How to linearize your cluster

sipa · April 17, 2025, 8:33pm

I’m beginning to think that the spanning-forest linearization (SFL) algorithm is a better choice in general than the min-cut GGT algorithm, because while asymptotic complexity is worse (it can in theory run forever, though randomization makes this unlikely), it’s actually a lot more practical. It’s of course possible to combine the two, e.g., use GGT just for linearizing very hard clusters in a background thread, but it’ll practically be barely used I expect.

The reasons for choosing SFL over GGT and CSS (candidate set search, the old exponential algorithm which this thread was originally about) are:

Anytime algorithm: it is possible to interrupt SFL at any point, and get a linearization out. The longer it runs, the better the result gets. This is also true for CSS, but less so for GGT as any min-cut that has not been computed completely may not result in an improvement (and the entire runtime may consist of just a single cut).
Improving existing linearizations: it is possible to initialize SFL with an existing linearization as input, and any output it gives will be a linearization that’s better or equal. In some, but not all, practical cases it makes the algorithm also find the optimal faster when provided with a good input linearization already. This avoids the need to use the linearization merging algorithm to combine with existing linearization, or the LIMO algorithm to combine candidate finding with improvement of an existing linearization.
Load balancing: both GGT and CSS subdivide the problem in different ways. When trying to turn them into anytime algorithms, there is a question of how to subdivide the computation budget over these subproblems. In contrast, SFL always just operates on a single state which it keeps improving, so no such budgetting decisions are necessary.
Fairness: Related to the previous point, it is easy to make SFL spread its work over all transactions in the cluster in a more-or-less fair way, for example by trying going through the transactions in a round-robin fashion to find dependencies to improve. GGT works by subdividing the cluster in a divide-and-conquer manner, but choices need to be made about which part to attempt to subdivide next. CSS fundamentally works from highest-feerate to lowest-feerate chunks, so if time runs after only some part is done, the (transactions in) the first chunk will have had more effort on them. I think that’s concerning, because it risks an attacker attaching a very complex set of transactions to an honest CPFP or so, and have the algorithm spend all its time on the attacker’s transactions, running out before even considering the steps necessary to recognize the CPFP.
Runtime: when benchmarking using replayed mempool data of previous years (credit to @sdaftuar), my SFL implementation seems somewhat (~2x) faster than the GGT one.
Integer arithmetic: GGT needs to represent flows and feerates as approximate integers, which need to be scaled such that they are sufficiently precise. If F is the sum of the (absolute values of) fees in the graph and S the sum of the sizes in the graph, then GGT needs integers large enough to represent 4S^2F if exactly optimal solutions are desired. SFL and CSS only need integers large enough to represent 2SF and SF. Of course, perhaps we don’t care about exactness of solutions for huge clusters that pay enormous feerates, and in either case, this isn’t a very hard problem, just complicates the implementation somewhat.

The downsides of SFL compared to GGT are:

Complexity bounds: ~~there is no known bound for the runtime of SFL, and it may be infinite~~ SFL may in theory run forever, while GGT is proven to be \mathcal{O}(n^3). However, even that cubic runtime is probably too much in the worst case to run at transaction relay time. Also note that we don’t really need optimal linearization. They’re nice, because they’re obviously the most incentive-compatible ones, but in practice, the bar is really a “good enough for what people actually use”, which is hard to define, and interacts in difficult ways with adversial models (see fairness above).

The downsides of SFL compared to CSS are:

Mix with ancestor sort: CSS, when used through LIMO, can include arbitrary additional sources for finding good topological sets to move to the front. In the current implementation, optimal ancestor sets are used as additional inputs. There are no well-defined properties that this gives the linearization, but at least intuitively, this provides a “linearization always has at least ancestor-set like quality” property, regardless of how much time was spent in candidate finding. I don’t know how to do anything like this in SFL (nor in GGT). If we believe that this quality is important, despite not being well-defined, then switching to SFL in theory means we could lose it in clusters that cannot be linearized optimally in time. My thinking is that combined with the SFL fairness, and the fact that it is just so much faster overall, it is unlikely to be an issue. Further, if we really wanted ancestor-sort quality, it would be possible to compute, in addition to running SFL, an ancestor sort linearization from scratch, and then merge the two if SFL doesn’t find an optimal result. However, I suspect that the time this would take is better spent on performing more SFL iterations.
Minimal chunks: CSS finds a linearization by actively searching for good chunks to move to the front, and in doing so, it can prefer smaller chunks over bigger ones. This guarantees that (if it completes) it will find the linearization with minimal possible chunks among all feerate-diagram-optimal linearizations. SFL (and GGT) on the other hand really just find groups of connected transactions with equal chunk feerate, and thus can’t guarantee to pull equal-feerate chunks apart. EDIT: with an extra step, SFL can find the minimal chunks.
Complexity bounds: SFL may in theory run forever, while CSS is trivially bounded by \mathcal{O}(n \cdot 2^n), although I believe it may be \mathcal{O}(n \cdot \sqrt{2^n}) instead. These bounds are obviously impractical.

All 3 above issues are open questions, and solutions/improvements could be found to them.

In table form:

Trait	CSS	SFL	GGT
Proven worst-case	\mathcal{O}(n \cdot 2^n)	\mathcal{O}(\infty)	\mathcal{O}(n^3)
Conjectured worst complexity	\mathcal{O}(n \cdot \sqrt{2^n})	\mathcal{O}(\infty)	\mathcal{O}(n^3)
Historical avg. runtime (µs) (n \leq 64)	80	10	22
Historical worst runtime (µs) (n \leq 64)	31500	26	33
Extrapolated worst runtime (µs) (n \leq 64)	700,000,000	\infty	10,000
Anytime algorithm	Needs budgeting	Natively	May lose progress
Improving existing	Through LIMO	Natively	Merging afterwards
Fairness	Hard	Easy	Hard
Integer sizes	SF (\times,<)	2SF (\times,<,-)	4S^2F (\times,/,<,+,-)
Ancestor sort mix	Yes	No	No
Minimal chunks	Natively	Extra step	No