Spanning-forest cluster linearization

sipa · May 23, 2025, 6:19pm

Solved!

The idea is that equal-feerate chunks can be split using the existing algorithm, by perturbing the transaction feerates slightly. If we pick an arbitrary transaction in a chunk, and:

give it an infinitesimally higher feerate than it really has, then if a way to split the chunk exists such that this transaction is in the top part, it will be found using the normal split/merge algorithm
give a transaction an infinitesimally smaller feerate than it really has, then if a way to split the chunk exists such that this transaction is in the bottom part, it will be found using the normal split/merge algorithm.

If we try both the infinitesimally-higher and infinitesimally-smaller approaches for the same transaction, then if a split exists it must be found, as this transaction must appear in either the top or the bottom of the split.

We do not actually need to modify the transaction feerates to implement this. Instead, for increased feerate we can look for q=0 dependencies where the modified transaction is in the top part, as q(a \cup \epsilon, b) = q(a,b) + q(\epsilon,b) = q(\epsilon,b) > 0, where \epsilon is an imaginary transaction with very high feerate, but infinitesimal size. Similarly, for decreased feerate we can look for q=0 dependencies with the modified transaction in the bottom. If the chunks are already optimal otherwise, the resulting split components cannot merge with any other chunk, so we only need to attempt self-merges, and do not need a full merge sequence.

More concretely, given a chunk:

Pick an arbitrary transaction t in it.
Loop, trying to split the chunk with t in the top set:
- Find an active dependency d_1 in it with q \geq 0 (but note that q > 0 is not possible anymore if the normal optimization process has finished), and which has t in its top set. If no such dependency is found, stop.
- Deactivate d_1.
- Find an inactive dependency d_2 whose child is in d_1's top and whose parent is in d_2's bottom. If none can be found, recurse into minimizing the created top and bottom chunk.
- Activate d_2.
Loop, trying to split the chunk with t in the bottom set:
- Find an active dependency d_1 in it with q \geq 0 (but note that q > 0 is not possible anymore if the normal optimization process has finished), and which has t in its bottom set. If no such dependency is found, stop.
- Deactivate d_1.
- Find an inactive dependency d_2 whose child is in d_1's top and whose parent is in d_2's bottom. If none can be found, recurse into minimizing the created top and bottom chunk.
- Activate d_2.

I have implemented this in the Bitcoin Core PR for adding SFL.

When running this on the replayed 2023 data, it appears that out of 24693846 clusters with 2 or more transactions, 136709 of them (0.55%) gain one or more chunks by running this chunk minimization procedure (at least for some random seeds), adding on average 1.58 chunks per affected cluster, or 0.0087 chunks per cluster over the whole dataset.