How to linearize your cluster

sipa · April 18, 2025, 12:39am

There are multiple ways of combining things, but that is not how it’s currently implemented. The important part is that in practice, we have a linearization for every cluster at all times, and want to exploit that knowledge. So the input linearization to LIMO/CSS is the earlier, existing linearization for the cluster, not a pure ancestor-sort-based linearization. This input linearization can be great (if it was a hard cluster that was linearized before, and only a trivial change was made to it), but it may also be terrible (say two clusters got merged, so we combine their linearizations naively, but the transactions interact a lot).

The ancestor sort mixing-in happens inside, which means that each candidate set search can be bootstrapped from it, and LIMO’d with it.

LIMO/CSS, as implemented now, is roughly:

Linearize(cluster, input_lin):
- While cluster is not empty:
  - Find anc, the best ancestor set in cluster
  - Find pre, the best prefix of input_lin in what remains of cluster.
  - Set best = whichever of anc and pre has the higher feerate
  - Update best = CSS(cluster, best), starting with best as initial candidate.
  - Update best to be the highest-feerate prefix of input_lin restricted to best, i.e. the transactions of best in input_lin order (this is the LIMO step, guaranteeing that the result will be at least as good as input_lin).
  - Output best, and remove it from cluster
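
The loop above can be sketched in Python as follows. This is only a toy illustration, not the actual implementation: all function names are made up for this example, transactions are `(fee, size)` tuples keyed by an id, and `css` is a budgeted brute-force stand-in for a real candidate set search. Emitting each chunk in `input_lin` order is also an assumption of the sketch.

```python
from fractions import Fraction
from itertools import combinations

def feerate(txs, s):
    """Aggregate feerate (total fee / total size) of a non-empty set of tx ids."""
    return Fraction(sum(txs[t][0] for t in s), sum(txs[t][1] for t in s))

def ancestors(parents, t, remaining):
    """t together with all of its ancestors that are still in remaining."""
    out, todo = set(), [t]
    while todo:
        x = todo.pop()
        if x in remaining and x not in out:
            out.add(x)
            todo.extend(parents[x])
    return out

def best_prefix(txs, order):
    """Highest-feerate non-empty prefix of order, returned as a set."""
    best = None
    for i in range(1, len(order) + 1):
        cand = set(order[:i])
        if best is None or feerate(txs, cand) > feerate(txs, best):
            best = cand
    return best

def css(txs, parents, remaining, best, budget=10_000):
    """Brute-force stand-in for candidate set search (an assumption of this
    sketch): try unions of ancestor sets until the budget runs out."""
    items = sorted(remaining)
    tried = 0
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            if tried >= budget:
                return best
            tried += 1
            cand = set().union(*(ancestors(parents, t, remaining) for t in combo))
            if feerate(txs, cand) > feerate(txs, best):
                best = cand
    return best

def linearize(txs, parents, input_lin):
    remaining = set(txs)
    lin = list(input_lin)  # input linearization of what remains
    out = []
    while remaining:
        # Best ancestor set and best prefix of the remaining input linearization.
        anc = max((ancestors(parents, t, remaining) for t in lin),
                  key=lambda s: feerate(txs, s))
        pre = best_prefix(txs, lin)
        best = anc if feerate(txs, anc) > feerate(txs, pre) else pre
        # Try to improve on it, starting from best as the initial candidate.
        best = css(txs, parents, remaining, best)
        # LIMO step: best prefix of lin restricted to best.
        best = best_prefix(txs, [t for t in lin if t in best])
        # Output the chunk in input order and remove it from the cluster.
        out.extend(t for t in lin if t in best)
        remaining -= best
        lin = [t for t in lin if t not in best]
    return out

# Toy cluster: B is a high-fee child of low-fee A, C is independent.
txs = {"A": (100, 100), "B": (900, 100), "C": (300, 100)}
parents = {"A": [], "B": ["A"], "C": []}
print(linearize(txs, parents, ["A", "C", "B"]))
```

In this toy cluster the input linearization A, C, B is topologically valid but poor, and the ancestor set of B (A and B together, feerate 5) beats every prefix of the input, so it becomes the first chunk.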

This mixing/bootstrapping with ancestor sets in addition to existing prefixes doesn’t actually guarantee anything (in particular, it does not guarantee that the output is as good as pure ancestor set sort, though in practice it’ll usually beat it easily), but it’s a relatively fast way to have a good initial thing to start with, in particular when the input linearization is terrible.

And it’s this incremental mixing in of various sources of candidate sets (whether that’s ancestor sets or CSS or something else) that I don’t know how to map to SFL. SFL can start with an input linearization, but I think it’s more important to use the existing cluster linearization for that. However, unless we really think that this ancestor-set mixing has some specifically useful properties, I don’t think we care. The normal SFL steps seem to improve the output linearization very quickly, and in practice reach optimal within tens of microseconds.