Cluster processing overview
So far we’ve thought of cluster processing as:
- Run a linearization algorithm on the cluster (outputting a valid, and hopefully decent, topological ordering for its transactions).
- The linearization algorithm internally repeatedly invokes a “find high-feerate topologically-valid subset” algorithm (which could be ancestor set feerate based, or an exponential search algorithm), adding its results to the output.
- Then run an \mathcal{O}(n) chunking algorithm, which partitions the linearization into chunks that can be included in blocks all at once. Intuitively: start with every transaction in its own chunk, and then repeatedly merge subsequent chunks where the second one has a higher feerate than the first, until no such pairs are left. In more detail, the algorithm is:
  - Start with an empty list of chunks.
  - For each transaction tx in the linearization:
    - Create a new chunk at the end of the list with just tx.
    - While there are at least two chunks, and the last one has a higher feerate than the one before it:
      - Merge the two chunks.
Since at most n-1 merge operations can take place in total, and each transaction additionally triggers at most one failed comparison, the while condition is evaluated at most 2n-1 times, making this \mathcal{O}(n).
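The steps above can be sketched in Python. This is a minimal illustration (not the actual implementation); transactions are represented here as hypothetical (fee, size) pairs, and feerates are compared exactly using fractions to avoid rounding issues:

```python
from fractions import Fraction

def naive_chunking(linearization):
    """Partition a linearization into chunks with monotonically
    decreasing feerates. Transactions are (fee, size) pairs; the
    output is a list of chunks, each a list of such pairs."""
    chunks = []  # Each entry: [list_of_txs, total_fee, total_size].
    for fee, size in linearization:
        # Start with the new transaction in its own chunk.
        chunks.append([[(fee, size)], fee, size])
        # Merge while the last chunk has a higher feerate than its predecessor.
        while len(chunks) > 1 and (
            Fraction(chunks[-1][1], chunks[-1][2])
            > Fraction(chunks[-2][1], chunks[-2][2])
        ):
            txs, f, s = chunks.pop()
            chunks[-1][0] += txs
            chunks[-1][1] += f
            chunks[-1][2] += s
    return [c[0] for c in chunks]
```

Each transaction is appended once and each merge removes a chunk, so the total work is linear in the number of transactions.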
Chunking can go bad
In the context of per-chunk package RBF, it is becoming more important that the chunks that come out are “sane”. We’d expect that with a decent linearization algorithm choice, this would be the case, but it turns out it’s not too hard to find bizarre examples.
All transactions have the same size; the number after each name is its feerate:
```mermaid
graph BT
T2["A: 8"];
T4["B: 1"];
T1["C: 13"];
T3["D: 13"];
T0["E: 1"];
T0 --> T1 --> T4;
T0 --> T2;
T0 --> T3 --> T4;
```
Ancestor-set based linearization gives order [A,B,C,D,E]. The chunking algorithm turns this into [ABCD,E]. However, ABCD consists of two disconnected components (A and BCD). Of course the optimal order is [B,C,D,A,E] which would get chunked as [BCD,A,E], but ancestor-set linearization does not discover this.
In the context of RBF policy it is probably undesirable that A and BCD would get processed simultaneously.
A better chunking algorithm
To address this, it is possible to amend the chunking algorithm so that it guarantees that the resulting chunks are always connected. Instead of merging two chunks whenever the second one has a higher feerate, only do so if there is a dependency between the two chunks; if not, swap the two chunks. In more detail:
- Start with an empty list of chunks.
- For each transaction tx in the linearization:
  - Create a new chunk with just tx in it at the end of the list.
  - Make the variable work point to this newly created chunk.
  - While work is not the first chunk in the list, and work has a higher feerate than the chunk before it:
    - If work depends on its predecessor:
      - Merge work and its predecessor into one chunk.
      - Continue with work pointing to this merged chunk.
    - If work does not depend on its predecessor:
      - Swap the positions of the work chunk and its predecessor.
      - Continue with work pointing to the same (now moved) chunk.
This can perform up to \frac{n(n-1)}{2} swap operations, giving the algorithm \mathcal{O}(n^2) complexity. If applied to an already-optimal linearization, however, no swaps take place, making it \mathcal{O}(n). Fuzz-based searching suggests that the number of swaps can indeed be quadratic even for ancestor-set-based linearizations.
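As a sketch, the amended algorithm could look as follows in Python (again a hypothetical illustration, not the real implementation; `parents` maps each transaction name to its set of direct in-cluster parents, and `fee`/`size` map names to integers):

```python
from fractions import Fraction

def connected_chunking(linearization, parents, fee, size):
    """Chunking that only merges a chunk into its predecessor when a
    dependency links them; otherwise the two chunks are swapped.
    `linearization` must be a topologically valid list of tx names."""
    chunks = []  # Each entry: (set_of_txs, total_fee, total_size).
    for tx in linearization:
        chunks.append(({tx}, fee[tx], size[tx]))
        work = len(chunks) - 1
        while work > 0:
            wtxs, wfee, wsize = chunks[work]
            ptxs, pfee, psize = chunks[work - 1]
            if Fraction(wfee, wsize) <= Fraction(pfee, psize):
                break
            # A direct-parent check suffices here: swaps never reorder
            # dependent chunks, so the chunk sequence stays topologically
            # valid, and any ancestor of the work chunk inside its
            # predecessor is reachable through a direct parent link.
            if any(par in ptxs for t in wtxs for par in parents[t]):
                # Dependency found: merge work into its predecessor.
                chunks[work - 1] = (ptxs | wtxs, pfee + wfee, psize + wsize)
                del chunks[work]
            else:
                # No dependency: swap the two chunks.
                chunks[work - 1], chunks[work] = chunks[work], chunks[work - 1]
            work -= 1
    return chunks
```

On the five-transaction example above, this turns the ancestor-set order [A,B,C,D,E] into the chunks [BCD, A, E] rather than [ABCD, E].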
To see why this is correct, observe that the algorithm never merges chunks that have no dependency between them, which alone guarantees that the result consists of connected chunks. For it to be a valid chunking, it also needs to have monotonically decreasing feerates. This is a loop invariant: if we assume it holds before a new tx is processed, it is only ever violated at the work chunk, and that violation gets resolved before continuing with the next transaction.
This algorithm does more than just guaranteeing connected chunks, though; by swapping, it can make the resulting feerate diagram actually better. In fact, it either makes the feerate diagram strictly better everywhere, or doesn’t affect it at all. It never worsens the diagram, nor does it result in something incomparable. The cluster depicted above is also the simplest example (that I can find) for which this new algorithm improves the result.
Despite that, the quality of the input linearization still matters. For example:
```mermaid
graph BT
A["A: 1"];
B["B: 2"];
C["C: 9"];
B --> A;
C --> A;
```
The optimal chunking (and the one which ancestor-set based linearization plus naive chunking would find) is [AC,B]. However, if the (terrible) linearization [A,B,C] were to be given to this fancier algorithm, it would just result in the single chunk [ABC]. This is because after merging [AB] with [C], it doesn’t consider splitting off [B] again. As a result, we cannot use this algorithm as a replacement for linearization: doing so would sometimes result in worse chunkings than the ancestor-set based approach would produce.
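The numbers can be checked by hand (assuming equal sizes, as in the earlier example):

```python
from fractions import Fraction

# Trace of the improved algorithm on the bad linearization [A, B, C]:
#   [A(1)]            add A
#   [A(1), B(2)]      add B; higher feerate, and B depends on A -> merge
#   [AB(3/2)]
#   [AB(3/2), C(9)]   add C; higher feerate, and C depends on A -> merge
#   [ABC(4)]          a single chunk; B never gets split off again
assert Fraction(1 + 2, 2) == Fraction(3, 2)
assert Fraction(1 + 2 + 9, 3) == Fraction(4, 1)
```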
As a postprocessing step
This algorithm can be considered as an improved version of the chunking algorithm, but it can also be conceived of as a post-processing algorithm for linearization. In this case it would operate on a list of lists of transactions rather than a list of chunks, but otherwise it remains the same. This has the advantage of outputting a linearization that the naive chunking algorithm (with lower complexity) can turn into the correct chunks again. In some cases it may be useful to be able to perform operations on the linearization that do not need a full relinearization, but only a rechunking.
Given that even ancestor-set based linearization is quadratic, having a (fast) postprocessing step for it with the same complexity is probably not too bad. Furthermore, re-running this postprocessing step on its own output only requires linear time, so it may be desirable to remember the post-processed linearization.
When used in this manner, the algorithm also improves the linearization at a sub-chunk level, which may be useful when packing the very end of a block. There are other solutions too, though.