Stats on compact block reconstructions

0xB10C · February 4, 2025, 2:03pm

I started recording the contents of inbound and outbound getblocktxn messages a week ago. This should allow for some insights into “are peers often missing the same transactions?” and “can we pre-fill the transactions we had to request ourselves?”. I haven’t taken a closer look at the data yet.

Also, I’ve changed one of my nodes to run with blockreconstructionextratxn=10000 and updated two nodes to a master that includes p2p: track and use all potential peers for orphan resolution #31397. Probably need to wait until the mempool fills up again to see the effects of this.

ajtowns · February 5, 2025, 4:51am

One other thing is that FIBRE is designed for UDP transmission to avoid delays due to retransmits; so redoing it over TCP via the existing p2p network would be a pretty big loss…

0xB10C · March 24, 2025, 7:20pm

I changed node alice to run with blockreconstructionextratxn=10000 in early February. This had a noticeable effect in the following days, with slightly higher scores starting 2025-02-06. During the increased mempool activity between 2025-02-21 and 2025-03-06 it performed significantly better than my other nodes.
Node charlie and node erin were switched to a branch that includes p2p: track and use all potential peers for orphan resolution #31397 at the same time in early February. I don’t see any immediate improvement for these two nodes.
Node ian was running Bitcoin Core v26.1 until I switched all nodes to run v29.0rc1 release candidate. ian clearly performed worse than the other nodes before the update, which is expected as e.g. mempoolfullrbf wasn’t default in v26.1.
Node mike doesn’t allow inbound connections (while the other nodes do and usually have full inbound slots). This is noticeable in the reconstruction performance. Having only eight peers that inform mike about transactions is probably worse than having close to 100 peers that inform you about new transactions.

The stats from alice, charlie, and erin could indicate that orphans aren’t the problem, but that conflicts, replacements, and policy-invalid transactions (i.e. extra pool txns) cause low performance during high mempool activity. However, I’m not sure if these three months of data are enough to be certain yet.

I’ve started to look at the data I’ve been recording. It seems that many of my peers I announced a compact block to end up requesting very different sets of transactions (and usually larger sets) than I request from my peers. I assume many of them might be non-listening nodes like mike or run with a non-default configuration. This needs more work, but I hope to post more stats on requested transactions here at some point.

I’ve also noticed that my listening nodes running with the default configuration often independently request similar sets of transactions. This seems promising with regard to predictably prefilling transactions in our compact block announcements. My assumption would be that if we prefill:

transactions we had to request
transactions we took from our extra pool
and prefilled transactions we didn’t have in our mempool (i.e. prefilled txns that were announced to us and ended up being useful)

we can improve the propagation time among nodes that accept inbound connections and use a “Bitcoin Core” default policy. This in turn should improve block propagation time of the complete network as now more nodes know about the block earlier. Additionally, useful prefilled transactions don’t end up wasting bandwidth, only transactions that a peer already knew about would waste bandwidth. These improvements would probably be most noticeable during high mempool activity: the (main) goal wouldn’t be to bring the days with 93% (of reconstructions not needing to request a transaction) to 98% but rather the days with 45% to something like 90% for well-connected nodes.
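The three categories above can be read as a set union over transaction positions in the block. The following is a minimal sketch of that idea, not the code from the 2025-03-prefill-compactblocks branch; the ReconstructionRecord struct and PrefillCandidates function are hypothetical names introduced only for illustration.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical record of how one received compact block was reconstructed.
// Each vector holds transaction positions (indexes) within the block.
struct ReconstructionRecord {
    std::vector<uint32_t> requested;        // had to be fetched via getblocktxn
    std::vector<uint32_t> from_extra_pool;  // found in our extra pool
    std::vector<uint32_t> useful_prefilled; // prefilled by the announcer and missing from our mempool
};

// Union of the three categories, sorted by block position and de-duplicated.
std::vector<uint32_t> PrefillCandidates(const ReconstructionRecord& rec)
{
    std::set<uint32_t> indexes;
    indexes.insert(rec.requested.begin(), rec.requested.end());
    indexes.insert(rec.from_extra_pool.begin(), rec.from_extra_pool.end());
    indexes.insert(rec.useful_prefilled.begin(), rec.useful_prefilled.end());
    return std::vector<uint32_t>(indexes.begin(), indexes.end());
}
```

A transaction that falls into more than one category is only counted once, so the candidate list never grows beyond the number of distinct positions involved.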

Since Bitcoin Core only high-bandwidth/fast announces compact blocks to peers that specifically requested it from us (because we quickly gave them new blocks in the past), non-listening nodes that are badly connected won’t start sending wasteful announcements with many prefilled, well-known transactions to their peers.

I’ve started implementing this in 2025-03-prefill-compactblocks, but it’s still work-in-progress:

limit the prefill amount to something like 10kB worth of transactions as per BIP152 implementation note #5. I think this is useful to avoid wasting too much bandwidth if a node does a high-bandwidth announcement but, for some reason, prefills a lot of well-known transactions in the announcement
cmpctblock debug logging on wasted bandwidth: log the number of bytes of transactions we already knew about when receiving a prefilled compact block. This can be tracked/monitored to determine if we’re wasting too much bandwidth by prefilling
since the positive effect on the network is only measurable with a wide(r) deployment of the prefilling patch, it’s probably worthwhile to do some Warnet simulations on this and test the improvement under different scenarios.
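The first two items can be sketched roughly as follows. This is an illustration under stated assumptions rather than the branch’s implementation: the Candidate and PrefilledTx structs, the greedy skip-if-too-large strategy, and the function names are all hypothetical, and the 10,000-byte budget only stands in for the “something like 10kB” mentioned above.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical prefill candidate: position in the block and serialized size.
struct Candidate {
    uint32_t index;
    size_t size_bytes;
};

// Walk the candidates in block order and keep every one that still fits
// into the remaining byte budget. Larger candidates are skipped, not fatal.
std::vector<uint32_t> SelectPrefill(const std::vector<Candidate>& candidates, size_t budget_bytes)
{
    std::vector<uint32_t> chosen;
    size_t used = 0; // invariant: used <= budget_bytes
    for (const Candidate& c : candidates) {
        if (c.size_bytes > budget_bytes - used) continue;
        used += c.size_bytes;
        chosen.push_back(c.index);
    }
    return chosen;
}

// Hypothetical view of a prefilled transaction on the receiving side.
struct PrefilledTx {
    size_t size_bytes;
    bool already_known; // true if it was already in our mempool
};

// Bytes of prefilled transactions that the receiver did not need.
size_t WastedPrefillBytes(const std::vector<PrefilledTx>& prefilled)
{
    size_t wasted = 0;
    for (const PrefilledTx& tx : prefilled) {
        if (tx.already_known) wasted += tx.size_bytes;
    }
    return wasted;
}
```

With a 10,000-byte budget and candidates of 4,000, 7,000, and 5,000 bytes, the sketch keeps the first and third and skips the second. Whether skipping or stopping at the first oversized transaction is preferable is a design choice the thread does not settle.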

andrewtoth · April 8, 2025, 1:50pm

I’m not sure I understand why we would want to also prefill txs from our extra pool. The logic for extra pool inclusion would be the same for all nodes. So if we consider that our peers would have the same txs in their mempool then logically we would consider that our peers would have the same txs in their extra pool, no?

0xB10C · April 8, 2025, 4:06pm

Yeah, good question. I don’t have data on this yet, but I think it makes sense to look at the extra_pools of nodes and see if they are similar or different. My assumption is that they aren’t too similar.

So if we consider that our peers would have the same txs in their mempool then logically we would consider that our peers would have the same txs in their extra pool, no?

A few arguments against extra pool similarity are:

the extra pool is quite small with only 100 transactions in it by default
mempool transactions are relayed with the hope that mempools converge, extra pool transactions are stopped at their first hop and aren’t relayed
you might have a peer that is sending a lot of transactions you’ll reject and put into your extra pool, but I might not have a connection to this peer - our extra pools will be quite different
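To illustrate the first and third arguments, here is a simplified stand-in for a small, fixed-capacity extra pool in which the oldest entry is overwritten first. It is a sketch only: the ExtraPool class and its string txids are hypothetical and much simpler than what a real node keeps.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified extra pool: fixed capacity, oldest entry overwritten first.
class ExtraPool {
public:
    explicit ExtraPool(size_t capacity) : m_capacity(capacity) { assert(capacity > 0); }

    void Add(const std::string& txid)
    {
        if (m_slots.size() < m_capacity) {
            m_slots.push_back(txid);  // still filling up
        } else {
            m_slots[m_next] = txid;   // overwrite the oldest slot
        }
        m_next = (m_next + 1) % m_capacity;
    }

    bool Contains(const std::string& txid) const
    {
        for (const std::string& slot : m_slots) {
            if (slot == txid) return true;
        }
        return false;
    }

private:
    size_t m_capacity;
    size_t m_next{0};
    std::vector<std::string> m_slots;
};
```

With a capacity of 3, adding a, b, c, and then d evicts a. A node that never saw d, for example because it has no connection to the peer that sent it, ends up with a different pool even though both nodes run the same default.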

andrewtoth · April 8, 2025, 4:43pm

But our peers will also have the same default.

RBF replaced txs are put into the extra pool, and the replacing tx is still relayed. So they should converge. If we are going to search the orphanage anyways, we can stop putting orphans into the extra pool.

Would it be likely a miner will mine these rejected txs though? Not sure.

Crypt-iQ · April 22, 2025, 6:47pm

One point brought up by sipa here in a semi-related thread ([WIP] p2p: Add random txn's from mempool to GETBLOCKTXN by davidgumberg · Pull Request #27086 · bitcoin/bitcoin · GitHub) is that the number of TCP packets sent over could increase if we’re making the CMPCTBLOCK message larger with prefilledtxns. I think that is maybe one downside to prefilling transactions. Perhaps it’s possible to prefill transactions up to a certain total message size limit specifically for compact blocks?

EDIT: His point was actually about the GETBLOCKTXN causing more round trips, but the same thing applies.

davidgumberg · May 21, 2025, 12:47am

0xB10C/2025-03-prefill-compactblocks is very interesting.

since the positive effect on the network is only measurable with a wide(r) deployment of the prefilling patch, it’s probably worthwhile to do some Warnet simulations on this and test the improvement under different scenarios.

I think one low effort way to perform a limited test of this patch on mainnet is to run a second node which only listens to CMPCTBLOCK announcements from manually-connected peers, and is manually connected to a 0xB10C/2025-03-prefill-compactblocks node. I’ve created a branch to try this: davidgumberg/5-20-25-cmpct-manual-only, I’ll try to run an experiment soon with two nodes.

My assumption would be that if we prefill:

transactions we had to request

transactions we took from our extra pool

prefilled transactions we didn’t have in our mempool (i.e. prefilled txns that were announced to us and ended up being useful)

I think the privacy concerns raised in bitcoin/bitcoin#27086 are relevant here. How can a node avoid:

1. Providing a unique fingerprint by revealing its exact mempool policy in CMPCTBLOCK announcements.
2. Revealing all of the non-standard transactions that belong to it by failing to include them in its prefill.

2. is more severe, and may be part of a class of problems (mempool’s special treatment of its own transactions) that is susceptible to a general fix outside of the scope of compact block prefill. Even if it’s impossible or infeasible to close all leaks of what’s in your mempool, it would be good to solve this.

One way of fixing this might be to add another instantiation of the mempool data structure (CTxMemPool), maybe called m_user_pool. Most of the code could go unchanged except for where it is desirable to give special treatment to user transactions, and these cases could be handled explicitly.

To solve 1., I wonder if there is a reasonably performant way to shift the prefills in the direction of prefilled transactions the node wouldn’t have included according to default mempool policy. This is not just for privacy, as I imagine this is the ideal set of transactions to include, with strict mempools prefilling too much and loose mempools prefilling too little.^[1] If this would be too expensive to compute on CMPCTBLOCK receipt, maybe a variation of m_user_pool is possible, where a node maintains another CTxMemPool instance for all the transactions which default mempool policy would have excluded, but user-supplied arguments have permitted. Or maybe the extra state is too expensive/complicated, and instead just performing an extra standardness check with the default policy on tx receipt and setting a flag on the tx (or keeping a map of flagged txs) is enough.

Maybe all of this is too complicated to implement proportional to its value here, but these could also be steps toward solving mempool fingerprinting more generally.^[2]

the number of TCP packets sent over could increase if we’re making the CMPCTBLOCK message larger with prefilledtxns.

I am not very knowledgeable about TCP, but as I understand RFC 5681, the issue is not a message growing to a size where it has to be split across multiple packets/segments, but a message that grows too big to fit in the receiver-advertised window (rwnd) and the congestion window (cwnd) specified by RFC 5681 (or another congestion control algorithm). The smaller of these two (cwnd and rwnd) is the largest amount of data that can be transmitted in a single TCP round trip. It should be possible to get the relevant metrics for this from the tcp_info structure on *nix systems^[3] doing something like:

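// needs <netinet/in.h>, <netinet/tcp.h> and <sys/socket.h>
struct tcp_info info;
socklen_t info_len = sizeof(info);
if (getsockopt(sockfd, IPPROTO_TCP, TCP_INFO, &info, &info_len) != 0) {
    // query failed: handle the error, e.g. by not prefilling for this peer
}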

// congestion send window (# of segments) * mss (max segment size)
uint32_t cwnd_bytes = info.tcpi_snd_cwnd * info.tcpi_snd_mss;
// our peer's advertised receive window in bytes
uint32_t peer_rwnd_bytes = info.tcpi_snd_wnd;
// get the smaller one
uint32_t max_bytes_per_round_trip = cwnd_bytes < peer_rwnd_bytes ? cwnd_bytes : peer_rwnd_bytes;

And the announcer could pack the prefill until it hits this limit. I am not sure how likely it is that constraining messages to this size would deter a second round trip from taking place, but it seems like a reasonable starting point.
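Packing up to this limit could look roughly like the sketch below, which turns the window values into a remaining byte budget for prefilled transactions. The PrefillBudget function and its parameters are hypothetical; the inputs are assumed to come from tcp_info fields like those in the snippet above, and the sketch ignores data already in flight on the connection, which would reduce the usable window further.

```cpp
#include <algorithm>
#include <cstdint>

// Bytes left for prefilled transactions after the base compact block
// message, if the whole announcement should fit into one round trip.
// Assumption: nothing else is currently in flight on this connection.
uint64_t PrefillBudget(uint32_t snd_cwnd_segments, uint32_t snd_mss,
                       uint32_t peer_rwnd_bytes, uint64_t base_msg_bytes)
{
    // Use 64-bit arithmetic so segments * mss cannot overflow.
    const uint64_t cwnd_bytes = uint64_t{snd_cwnd_segments} * snd_mss;
    const uint64_t window = std::min<uint64_t>(cwnd_bytes, peer_rwnd_bytes);
    return window > base_msg_bytes ? window - base_msg_bytes : 0;
}
```

For example, with a congestion window of 10 segments, an MSS of 1460 bytes, and a base message of 9,000 bytes, the budget is 5,600 bytes. If the base message alone already exceeds the window, the budget is zero and nothing would be prefilled.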

[1] For better or for worse, such an approach would disadvantage nodes with stricter-than-default mempools in compact block reconstruction.
[2] But maybe no general solution to mempool fingerprinting is possible, and nodes with non-default mempools shouldn’t have any expectation that they can’t be fingerprinted.
[3] Linux, Mac, FreeBSD. It seems something similar is possible on Windows with SIO_TCP_INFO.

gmaxwell · May 21, 2025, 8:21am

Prefilling is just a flawed part of the design, it was kinda tossed in because it was very easy to add and harmless if not used. After compact blocks were deployed I did a bunch of testing and was unable to make it do anything but harm.

The issues it has are several fold: it’s part of the compact block message so it blocks reception of the compact block in cases where it wasn’t needed. Peers also get compact blocks from multiple sources and so if they all use prefill then you waste N fold the bandwidth (or N-1 if one was indeed helpful). And then of course the extra data stuffs you further back into needing RTTs, thanks to window issues.
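The N-fold argument can be made concrete with a little arithmetic. The sketch below is only an illustration: the PrefillCost struct and RedundantPrefill function are hypothetical, and it assumes every announcer prefills the same number of bytes.

```cpp
#include <cstdint>

// Bytes a receiver downloads as prefill, and how many of them were redundant.
struct PrefillCost {
    uint64_t received_bytes;
    uint64_t wasted_bytes;
};

// Assumption: all announcers send the same prefill_bytes. If one copy was
// actually needed, the remaining N-1 copies are waste; otherwise all N are.
PrefillCost RedundantPrefill(uint32_t announcers, uint64_t prefill_bytes, bool one_was_helpful)
{
    const uint64_t received = uint64_t{announcers} * prefill_bytes;
    const uint64_t useful = (one_was_helpful && announcers > 0) ? prefill_bytes : 0;
    return {received, received - useful};
}
```

With three announcing peers each prefilling 10,000 bytes, the receiver downloads 30,000 bytes of prefill, of which 20,000 are wasted if one copy was helpful and all 30,000 if none was.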

Then of course you have the issue that many missed transactions are missed because they were too large, which makes all the above issues much worse.

Fiber being AGPL is a non-issue, parts could be re-licensed if needed. It has in it solutions to every one of the issues raised above-- including the ability for extra data to be sent that helps even if the prediction of what was missed wasn’t accurate, allowing data from multiple peers to all contribute, and so on.

The use of UDP, however, needed to get around the TCP window issues, would probably be challenging for widespread deployment due to the need for hole punching.

A lot of things have happened since then, core has minisketch merged (though unused), and using that kind of tool I was able to get blocks in consistently 800-ish bytes before. A big reduction in compact block size would leave a lot of room for data to fill in missing transactions.

But if miners are regularly including hundreds of kilobytes that were never relayed I’m a bit dubious that any scheme is going to result in particularly good performance except between peers with extremely high dedicated bandwidth that can do manual congestion management (e.g. a fiber like deployment of geographically dispersed data center nodes). Though the fact that it can help even if just some nodes run something faster is helpful-- it makes development of stuff more interesting even if there isn’t a serious deployment story.

davidgumberg · May 22, 2025, 2:55am

The issues it has are several fold: it’s part of the compact block message so it blocks reception of the compact block in cases where it wasn’t needed. Peers also get compact blocks from multiple sources and so if they all use prefill then you waste N fold the bandwidth (or N-1 if one was indeed helpful). And then of course the extra data stuffs you further back into needing RTTs, thanks to window issues.

Then of course you have the issue that many missed transactions are missed because they were too large, which makes all the above issues much worse.

I agree that in the extreme case, prefilling will not be helpful. But I’m optimistic that prefilling up to the TCP congestion window (no extra RTT) is not harmful. It seems reasonable to presume that, in general, a node’s operating system’s congestion control algorithm will reliably predict the maximum message that can be sent to a peer without incurring an extra round trip, and nodes with slow connections will tend to also have small windows, mitigating the redundant prefill cost. If it works as I understand, it seems like using the cwnd will scale nicely up and down with connection speeds, and offloads the engineering burden of this problem to kernel developers and the IETF.

It seems worth measuring what the typical sizes of compact block BLOCKTXN fulfillments are. I’ve made a branch that might help with this: (log: Additional compact block logging by davidgumberg · Pull Request #32582 · bitcoin/bitcoin · GitHub). It would also be useful to have some data on the congestion window sizes of Bitcoin nodes. If these are close to each other in size, compact block reconstruction failures won’t go away, but conservatively prefilling might make them less frequent while incurring little additional cost.

A lot of things have happened since then, core has minisketch merged (though unused), and using that kind of tool I was able to get blocks in consistently 800-ish bytes before. A big reduction in compact block size would leave a lot of room for data to fill in missing transactions.

Great idea, I see that on my node compact block messages hover around ~20kB, 800 bytes would leave a lot more overhead for prefills!

Crypt-iQ · May 30, 2025, 12:32pm

I am not sure whether the comment in PR 27086 I linked is referring to congestion issues or IPv4 fragmentation issues. I don’t have hard data, but I believe both contribute to latency issues here and sending data >> MTU (~1500 bytes) is going to lead to lots of fragmentation. Two links if you have the time:

I’m not really sure that pre-filling above MTU is worth it after reading the two above RFCs, but curious to hear thoughts.

EDIT: Sorry to cross-post, but I’ve TLDR’d the above two RFCs in a related Lightning conversation here: Latency and Privacy in Lightning - #13 by Crypt-iQ

I think I’ve actually conflated IP reassembly with TCP reassembly. I think maybe hard data would be nice to have here?

gmaxwell · May 31, 2025, 4:14pm

There is no IP fragmentation involved in TCP transmissions (well, assuming PMTUD did its thing)… indeed, you’re conflating IP reassembly with TCP reassembly.
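Put differently, TCP itself splits a message into segments of at most MSS bytes, so a message larger than the MTU simply becomes more segments rather than IP fragments. A rough sketch of that arithmetic follows; the SegmentsNeeded function is hypothetical, and the MSS of 1460 bytes used in the example is an assumption (a 1500-byte MTU over IPv4 without TCP options).

```cpp
#include <cassert>
#include <cstdint>

// Number of TCP segments needed to carry a message of the given size,
// i.e. ceil(message_bytes / mss).
uint64_t SegmentsNeeded(uint64_t message_bytes, uint32_t mss)
{
    assert(mss > 0);
    return (message_bytes + mss - 1) / mss;
}
```

Under that assumption, a compact block message of about 20,000 bytes takes 14 segments, while an 800-byte one fits into a single segment. Whether those segments fit into one round trip depends on the window sizes discussed earlier in the thread, not on the MTU.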