You gave a nice summary of potential improvements that can help with block propagation times. Let me expand a bit on that.
- Reduced relay filtering. The apparent objective of minimizing the chance of a transaction being mined by refusing to relay it is simply not compatible with the aim of fast propagation of any consensus-valid transaction. This has been extensively debated here and elsewhere.
- Recent proposals for next-block-template synchronization (could help a lot)
- Faster block validation
- Faster UTXO, mempool state update, template creation – I suppose Cluster mempool can be beneficial here
Anything else worth mentioning?
I think it’s also worth mentioning that only a small proportion of Bitcoin nodes are mining nodes (i.e. nodes generating block templates used by miners), and only these few nodes are directly motivated to lower block propagation times. There are some things such a node can control, such as good peering, a large mempool, no filtering, etc. Most non-mining nodes are not much hindered by block delays (except, e.g., nodes that generate many transactions). However, global transaction propagation is an emergent property of the whole network, and requires cooperation from the non-mining nodes as well.
On a related note, more miner-generated templates (as opposed to pool-generated ones) help reduce mining centralization, but may have the drawback of negatively affecting block propagation times, because the set of block-creating nodes becomes larger and more diverse. (Most Ocean-mined blocks already use a custom template today, and with the long-awaited adoption of Stratum V2 this may become more widespread.)