Merging incomparable linearizations

sipa · November 19, 2023, 4:12am

Introduction

While we have several ways for computing good linearizations for a cluster from scratch, sometimes we don’t start from scratch. We may have our own linearization already, but receive (through so far unspecified means) another linearization from a peer. If it’s strictly better, we could just switch to it. But what if our linearization is better in some places, and theirs is better in other places? Can we somehow combine the “smarts” that they’re based on, to construct an even better linearization?

As a reminder, we compare linearizations by computing all cumulative (size, fee) points after every chunk, and connecting them by straight lines. Each linearization has such a segmented line, which we call the fee-size diagram. If a linearization A has a diagram that is nowhere below that of linearization B, and at least in some places above it, we say A is strictly better than B. If the diagrams coincide everywhere, they’re equal. If one diagram is on top in some place(s) and the other is on top in others, we call them incomparable.

Due to the (~~so far unproven, but accepted~~ EDIT: now proven, thanks to this thread) property that every cluster has a well-defined non-empty set of optimal linearizations (which are all equal to each other, and all strictly better than all other linearizations), it must be the case that if two incomparable linearizations exist, there must exist at least one linearization that’s strictly better than both. This topic is about finding such combined linearizations.

Algorithms

Best-chunk merging

We’ve known a simple merging algorithm for a while:

Given two linearizations L₁ and L₂:
- While not all transactions have been processed:
  - Find highest-feerate prefix P₁ among all unprocessed transactions in L₁.
  - Find highest-feerate prefix P₂ among all unprocessed transactions in L₂.
  - Include the transactions from the higher-feerate of P₁ and P₂ in output linearization.
  - Mark the included transactions as processed

This algorithm can have \mathcal{O}(n^2) runtime in the number of transactions, because there can be n iterations, and each can require \mathcal{O}(n) work to find the highest-feerate prefix. In \mathcal{O}(n) time we can of course also compute the full chunking of L₁ and L₂ (rather than just the first chunk), and it may be possible to reuse part of that computation across iterations; that may enable a lower complexity, but it seems nontrivial. And given the fact that in every case we’ll probably want to run at least an ancestor-set based linearization from scratch ourselves (which is also \mathcal{O}(n^2)), it’s probably fine to target the same complexity for a merging algorithm.

Sadly, this algorithm doesn’t always produce an output that’s better or equal than both inputs. It will produce a linearization whose diagram is at no point below the lowest of the two input diagrams, but that’s not a particularly high bar. We could instead just stick with either one of the inputs instead to achieve that level of quality. In typical cases it’ll be better of course, but there are no guarantees. And it is easy to find examples where the result is just one of the inputs, and thus still incomparable to the other one.

See the example below (labels are fee/size):

graph BT
   T0["A: 1/3"];
   T1["B: 2/1"];
   T2["C: 2/1"] --> T0;
   T3["D: 7/1"] --> T0;
   T4["E: 6/1"] --> T0;
   T4 --> T1;
   T5["F: 7/3"] --> T1;

The first input linearization is the ancestor set sort: [B,F,A,D,E,C], which is chunked as [BFADE,C].
The second input is [B,A,E,C,D,F], which is chunked as [BAECD,F]. The BAECD chunk has higher feerate (18/7=2.571) than BFADE (23/9=2.556), but during the F part it is overtaken by BFADE.
The result of merging is just the second input again.
The optimal linearization would be [B,A,D,E,F,C], chunked as [BADE,F,C].

What we observe is that there is actually a common subset (BADE) of the two initial chunks that can be moved to the front, but the merging algorithm does not consider this.

Intersection merging

In an attempt to address that, let’s add a step to the merging algorithm to consider intersections:

Given two linearizations L₁ and L₂:
- While not all transactions have been processed:
  - Find highest-feerate prefix P₁ among all unprocessed transactions in L₁.
  - Find highest-feerate prefix P₂ among all unprocessed transactions in L₂.
  - Let P₃ be the intersection of P₁ and P₂. This is necessarily topologically valid.
  - Include the transactions from the highest-feerate of P₁, P₂, and P₃ in output linearization.
  - Mark the included transactions as processed

While it adds a step, the complexity is unchanged. Unfortunately, it still doesn’t always result in a better linearization:

graph BT
   T0["D: 1"] 
   T1["F: 11"];
   T2["G: 22"];
   T3["A: 1"];
   T4["E: 21"];
   T5["C: 20"];
   T6["B: 13"];
   T2 --> T0;
   T2 --> T4;
   T1 --> T5 --> T3;
   T4 --> T3;
   T4 --> T6;

The first input linearization is the ancestor set sort: [B,A,E,D,G,C,F], which is chunked as [B,AEDGC,F].
The second input is [B,A,C,F,E,D,G], which is chunked as [BACFE,DG]. The BACFE chunk has higher feerate (66/5=13.5) than B (13) initially, but gets overtaken by the AEDGC chunk.
The result of merging is equal to the second input.
The optimal linearization would be [B,A,C,E,D,G,F], chunked as [BACE,DG,F].

Again the crux is discovering an intersection (BACE), but this time between the BACFE chunk and not one but two chunks of the other input (B and AEDGC).

Prefix-intersection merging

The solution is to attempt more intersections. Observe that a linearization is really a way of constraining the search for subsets to just prefixes of the linearization. Given the intuition gained above that incomparabilities always seem to be due to a non-considered intersection between the two linearizations, it seems worthwhile try all intersections between prefixes of the first with prefixes of the second linearization. There can be a quadratic number of such intersections however, but maybe we can limit ourselves to just intersections that involve the best chunk of one of both linearizations at least:

Given two linearizations L₁ and L₂:
- While not all transactions have been processed:
  - Find highest-feerate prefix P₁ among all unprocessed transactions in L₁.
  - Find highest-feerate prefix P₂ among all unprocessed transactions in L₂.
  - Find the highest-feerate set among all these:
    - Intersections between P₁ and all prefixes of L₂.
    - Intersections between P₂ and all prefixes of L₁.
  - Include the transactions from that set in the output linearization.
  - Mark the included transactions as processed

The various intersections between P₁ and prefixes of L₂ can be computed incrementally (keep adding transactions from L₂ if they’re in P₁, and remember the best one), and similarly for P₂ with prefixes of L₁. This, like finding the P_i in the first place, can be done in \mathcal{O}(n) time. The result is still an \mathcal{O}(n^2) algorithm.

Surprisingly, this algorithm seems powerful enough to always find a linearization that’s strictly better than both inputs if they’re incomparable (and at least as good as the best of the two if they are comparable). This works regardless of the quality of the input linearizations (e.g. they don’t need to be ancestor sort or better), and does not require connected chunks (see linearization post-processing). No proof, though.

Update: simpler and proven merging

See the discussion further in this thread (thanks, @ajtowns).

It appears that it suffices to only consider the intersections between the higher-feerate out of P_1 and P_2, with all prefixes of the linearization of the other input:

Given two linearizations L_1 and L_2:
- While not all transactions have been processed:
  - Find highest-feerate prefix P₁ among all unprocessed transactions in L₁.
  - Find highest-feerate prefix P₂ among all unprocessed transactions in L₂.
  - If P_1 has lower feerate than P_2, swap P_1 with P_2 and L_1 with L_2.
  - Find the highest-feerate set among all intersections between P_1 and the prefixes of L_1.
  - Include the transactions from that set in the output linearization.
  - Mark the included transactions as processed

Instead of describing the resulting set as an intersection, it can also be seen as the highest-feerate prefix of P_2, reordered according to the order these transactions have in L_1.

A proof for this scheme can be found in this thread.

ajtowns · November 20, 2023, 9:46am

One way of characterising a chunk is by listing its childless-descendants; for BACFE, those would be E and F. But in this case F’s feerate alone is 11, while BACFE’s is 13.2; which gives you an easy clue that splitting that chunk into [BACE,F] would be an improvement.

I think you could extend this comparison to create a compatible total order just by saying “given two diagrams, d_1 and d_2, then if x_0 is the earliest point where d_1(x_0) \ne d_2(x_0), then d_1 > d_2 iff d_1(x_0) > d_2(x_0)” ?

I think “prefix-intersection merging” then guarantees to produce a linearisation L_3 such that L_3 \ge L_1 and L_3 \ge L_2 according to that total order. I don’t think it guarantees that L_3 will be comparable to L_1 or L_2 according to the original partial order, but I think you’d need a fairly complicated cluster for that to actually occur.

sipa · November 20, 2023, 10:46am

Prefix-intersection merging on L_1 and L_2 gives a linearization L_3 such that L_3 \geq L_1 and L_3 \geq L_2, according to the usual preorder on linearizations. If additionally L_1 and L_2 were incomparable (i.e., L_1 \ngeq L_2 and L_1 \nleq L_2), then L_3 > L_1 and L_3> L_2.

No proof, but I’ve spent a decent amount of CPU time (weeks) on trying to find counterexamples.

Under your total ordering it holds for any of these merge algorithms (including best chunk merging) that L_3 \geq L_1 and L_3 \geq L_2. The surprising part is that prefix-intersection merging seems powerful enough to achieve that under the preorder.

sdaftuar · November 20, 2023, 5:32pm

I haven’t processed this whole post yet, but just wanted to add an aside/reminder for the sake of our intuition: even if a linearization is strictly better than another one, it may still be the case that merging the two (even using our most naive algorithm) would produce a linearization better than either.

sipa · November 20, 2023, 9:45pm

Right, the childless-descendants in an optimal chunk are always higher feerate than the chunk itself (or more generally, any “bottom” subset (a subset that includes all its descendants) must have higher feerate than the chunk itself - if not, that subset could be removed without breaking topology, and doing so would increase the feerate).

This could be the basis for another general post-processing step (e.g. try all bottom subsets of 1 or 2 transactions, and if they have lower or equal feerate than the overall chunk, split them up). Similarly for top transactions with higher or equal feerate than the overall chunk. Of course, we could also hope to build linearization algorithms that just don’t give rise to such chunkings in the first place.

FWIW, I think that relaying the set of childless-descendants for each chunk to a peer is another hypothetical way of conveying knowledge of linearizations. It could be made Erlay-compatible as it has set semantics, and with that set one can reconstruct the same linearization (and likely gives a decent result even if the clusters don’t exactly match).

Yes, absolutely. I’ve changed the text to say “could” instead of “should”. I think we’ll want to run prefix-intersection merging (or some further iteration of this idea) anytime we have distinct linearizations of the same transactions. It even works when the input linearizations don’t cover the exact same transactions but there is some overlap, though I haven’t run tests for that.

sipa · November 23, 2023, 6:00pm

I’ve now learned that our comparison operation on linearizations (through comparing the diagram) is not a partial order but a preorder. The difference is that distinct elements can be equivalent under a preorder.

sipa · November 23, 2023, 9:59pm

Is it obvious that the following holds?

By moving a subset of transactions, whose combined feerate is S, to the front of a linearization whose first chunk has feerate <= S, while keeping the relative order within the moved and within the non-moved transactions the same, the feerate diagram will be >= the old one (obviously subject to that being topologically valid).

If so, I may be able to prove prefix-intersection merging always results in something >= both original linearizations.

ajtowns · November 24, 2023, 1:25am

I think you need slightly different terminology: given an ordered set of transactions T = t_1, t_2, .., t_n then Chunk(T) splits up that list, eg C_1 = t_1, t_2; C_2 = t_3; C_3 = t_4, .., t_n while maintaining its order (T = \sum Chunk(T)), and gives you a valid diagram (s(C_1) \ge s(C_2) \ge s(C_3)). Then if chunking a given ordering of txs results in a single chunk, ie Chunk(T) = C, then any reordering of T will result in a comparable diagram that is equal to or better than the diagram for T.

(That’s then obviously true because the diagram has the same start/end points, the first diagram for T is just a line, and the chunking inequality ensures the diagrams are concave)

ajtowns · November 24, 2023, 10:21am

Assume you have a function C(L) that takes a linearisation L and splits that into P_L, T_L where P_L is the first chunk, and T_L is the rest of the linearisation. If L is non-empty, P_L is non-empty also; however T_L may be empty.
Assume you can calculate L - X where L and X and the result is just L with the transactions in X removed
We say that A \le B by doing a feerate diagram comparison and noting B is has a higher or equal feerate at all points.

I think prefix-intersection merging, M(L_1, L_2) is equivalent to:

Take P_1, T_1 = C(L_1) and P_2, T_2 = C(L_2)
Reorder the transactions in P_1 to match the order they appear in L_2, call this P^\prime_1, and then calculate R_1, X_1 = C(P^\prime_1). Note that P_1 \le R_1.
Calculate R_2, X_2 in the same way from P_2.
Choose the highest feerate chunk from R_1, R_2:
- If R_1 is the highest, then: M(L_1, L_2) = R_1 + M(X_1 + T_1, L_2-R_1)
- Otherwise: M_{PI}(L_1, L_2) = R_2 + M(L_1-R_2, X_2 + T_2)

For the case where R_1 is highest, then M(L_1, L_2) = M(R_1 + X_1 + T_1, L_2) and L_1 \le R_1 + X_1 + T_1.

The question is whether L_2 \le R_1 + (L_2-R_1), noting that the transactions in R_1 that we’ve removed from L_2 maintain their order, and have an equal or higher feerate than the original best chunk.

Consider each chunk from L_2, call them c_1, c_2, c_3, \dots, c_n. Then define d_i = c_i - R_1, and e_i = c_i - d_i. In that case R_1 = e_1 + e_2 + e_3 + \dots + e_n (because R_1 is the first chunk of P^\prime_1 which had its transactions put in the same order as L_2).

Define:

\begin{align} \gamma_j &= \sum_{i=1}^j c_i \\ \delta_j &= \sum_{i=1}^j d_i \\ \epsilon_j &= \sum_{i=1}^j e_i \\ \zeta_j &= \sum_{i=j+1}^n e_i \\ \end{align}

We see L_2 = \gamma_n, and R_1 + (L_2 - R_1) = \epsilon_n + \delta_n and \epsilon_n = \epsilon_j + \zeta_j. Note that the feerate of \zeta_j is greater or equal to the feerate of R_1 = \epsilon_n in all cases – otherwise those transactions would not have been included in R_1 when calculating the chunk.

Consider total size of \gamma_j, S(\gamma_j) and its total fee F(\gamma_j). Note that the feerate diagram of L_2 is made up of the pairs S(\gamma_j), F(\gamma_j).

If we consider \epsilon_j + \delta_j then we simply have S(\epsilon_j + \delta_j) = S(\gamma_j) and F(\epsilon_j + \delta_j) = F(\gamma_j).

Now consider \epsilon_j + \delta_j + \zeta_j – for which the feerate diagram at position S(\gamma_j) is at least F(\epsilon_j + \delta_j), ie at least F(\gamma_j). But we know \zeta_j has a higher fee rate than any chunk in \epsilon_j + \delta_j, so the feerate diagram of \epsilon_j + \zeta_j + \delta_j is an improvement. That gives \gamma_j \le \epsilon_n + \delta_j, and choosing j=n gives L_2 = \gamma_j \le \epsilon_n + \delta_n = R_1 + (L_2 - R_1). QED?

sipa · November 24, 2023, 3:04pm

(Only digested part of the above so far)

Note that R_1 \leq P_1.

Why? Moving part of P_1 to the front could put a higher feerate new chunk up front.

M_{PI}(L_1,L_2)

What is PI?

FWIW, my (incomplete) proof sketch is as follows:

Assume that the following holds: moving transactions to the front of a linearization such that they form a new chunk, while leaving the internal ordering within moved and non-moved transactions the same, is a non-worsening of the diagram if the new chunk’s feerate is >= the original first chunk’s feerate. (~~no proof~~ EDIT: see proof here)
Also assume that reordering the transaction within one chunk can never worsen the diagram (worst case the chunk set remains the same, best case it splits in multiple parts).
The prefix-intersection merging algorithm can be seen as starting with two linearizations L_1 and L_2, and in each step moves some transactions to the front of both. Each step is a non-worsening of the diagram of both, and in the end, both linearizations are identical. That resulting linearization is thus better or equal than each of the inputs. To see why each step is a non-worsening:
- WLOG, assume the intersection found is a subset of L_1's first chunk, in the order of transactions of L_2. (swap the linearizations if not)
- The change applied to L_1 is just a reordering of its first chunk, which is fine.
- The change applied to L_2 is moving a new chunk to the front, of feerate at least that of its original first chunk (if not, that chunk would have been chosen instead), leaving the order within moved and non-moved transactions the same. This is fine.
- Afterwards, both linearizations start with the same transactions (at least one, so progress is made), so we can continue with just the distinct suffix.

ajtowns · November 24, 2023, 3:22pm

sigh, those are both typos. I mean P_1 \le R_1 [this is your “Also assume”] and I changed M_{PI} to M in editing - “PI” stood for prefix-intersection.

sipa · November 25, 2023, 3:29am

(working my way through further)

Interesting observation, I hadn’t realized this before: any suffix of a chunk’s linearization (which leaves at least one transaction off) must have higher feerate than the chunk itself. If it didn’t, the chunk without the suffix would have higher feerate and thus be the chunk instead.

ajtowns · November 25, 2023, 6:18am

yeah, chunking means that suffix-pays-for-prefix. (could be equal rather than higher, depending on whether chunking is maximal or minimal) [edit: i guess we always want minimal – as if we have two chunks of the same feerate, it’s easy to combine them when building a block, but much harder to split a chunk]

sipa · November 25, 2023, 12:42pm

Indeed. Smaller chunks are better in general, so if possible, you want not chunk equal feerate subsets together.

sipa · November 26, 2023, 3:33am

I don’t think I follow the last paragraph.

This \epsilon_j + \delta_j + \zeta_j diagram, is that considering this expression at different values of j? I’m confused how it can be evaluated at S(\gamma_j), or really how it forms a diagram at all.

Also generally, to prove a diagram is everywhere ≥ than another, you need to show both that all points of the first lie ≥ the second, but also that all points of the second lie ≤ the first.

ajtowns · November 26, 2023, 4:58am

(I’m finding it hard to work out a decent way of comparing diagrams that doesn’t get lost in details really quickly. Perhaps the problem is that I really only want to compare orderings of a given set of transactions after chunking them, not compare two orderings of potentially two different sets of transactions with arbitrary groupings)

Anyway, I guess the general case I’m going for here is

\epsilon_n + \delta_j \ge \gamma_j + \zeta_j

Those are comprised of the exact same txs, ie d_1, .., d_j and e_1, .., e_n, but in different orders. The QED step is setting j=n giving \zeta_j=\emptyset, and thus R_1 + (L_2 - R_1) = \epsilon_n + \delta_n \ge \gamma_n = L_2. \ge is a diagram comparison after chunking the txs, and +, \sum are concatenation.

The initial step for j=0 is easy: \delta_0 = \gamma_0 = \emptyset and \epsilon_n = \zeta_0 by construction.

The inductive step is to show \epsilon_n + \delta_{j+1} \ge \gamma_{j+1} + \zeta_{j+1}. To show that, we have:

\begin{align} \epsilon_n + \delta_{j+1} &= (\epsilon_n + \delta_j) + d_{j+1} \\ &\ge (\gamma_j + \zeta_j) + d_{j+1} \\ &= \gamma_j + (e_{j+1} + \zeta_{j+1}) + d_{j+1} \\ &\ge \gamma_j + e_{j+1} + (d_{j+1}) + \zeta_{j+1} \\ &\ge \gamma_j + (c_{j+1}) + \zeta_{j+1} \\ &= \gamma_{j+1} + \zeta_{j+1} \end{align}

Note that this chain is from best linearisation to worse. Steps are:

inductive hypothesis
moving d_{j+1} after \zeta_{j+1} improves things (observe that \zeta_{j+1} has a higher feerate than any chunk)
reordering the chunk c_{j+1} into e_{j+1} + d_{j+1} doesn’t make things worse

Not sure how to properly formalise those steps yet, but I think they’re okay.

sipa · November 26, 2023, 1:49pm

Ok, I’m with you. I didn’t realize before that + meant concatenation.

So you’re essentially relying on 3 steps which each individually don’t make the diagram worse, and composing them to show that moving a new higher-fee chunk to the front while leaving internal ordering the same, makes the overal thing not worse.

Those 3 steps are:

If a \geq b, then a + c \geq b + c. That seems reasonable, though I do want to think through what happens when c gets chunked together with some suffix of a or b (but not both).
Two chunks can be swapped if the second one has higher feerate than the (highest chunk feerate in) the first. That’s probably true under some conditions, but it’s not entirely clear to me what those conditions are. Does this also hold when both/either isn’t really a chunk (in the sense of being chunked togeter) but consists of multiple?
Reordering and splitting a chunk is fine. If it’s really a chunk, then this is obviously the case - it’s just reordering transactions within a chunk which at worst has no effect on the diagram - but is it obvious here that c_{j+1} is actually a single chunk, when considered in the linearization presented there?

In short, I think we need some theorems that relate diagram quality with transformations/reorderings of the linearization, even when those modify chunk bounaries.

ajtowns · November 26, 2023, 4:22pm

I think all these steps rely really heavily on how they’re setup via the prefix intersection algorithm – eg, c_{j+1} is a chunk in \gamma_j + c_{j+1}, because \gamma_j = c_1 + c_2 + .. + c_j and c_1, c_2, .., c_n is a correct chunking, and then the fact that you generate a chunking by merging tx sets in any order means that if you add stuff on the end of that, you can start off with that chunking, and then you’ll only potentially be merging those chunks from the tail, you’ll never need to split them up.

I’m starting to have a bit of luck formalising this in lean4, fwiw, but it’s slow progress : I can convert a list of txs into a chunking, and compare fee rate graphs (evaluating the diagram at each integer byte/weight with a total fee in \mathbb Q), but currently don’t have any theorems about any of that.

(If you have a better way of defining a diagram than as a function from \mathbb N \to \mathbb Q that’d be great. I started off trying to do line segments from (s_1,f_1) \to (s_2,f_2) but that got super painful super fast)

sipa · November 26, 2023, 4:50pm

I guess what we need is a slightly more general concept of a chunking, which is a sequence of transaction sets, without the requirement that their feerates are monotonically decreasing.

And we can define a diagram for such a general chunking too, it’s just not necessarily ~~convex~~ concave.

Let C\left((s_1,s_2,\ldots,s_n)\right) = (t_1,t_2,\ldots,t_m) be the chunking operation, which just merges adjacent sets whenever the later one is higher feerate than the earlier one.

Then there are some theorems that hold, I think, such as:

C(S) \geq S
A \geq B \implies C(A) \geq C(B)

ajtowns · November 26, 2023, 5:08pm

I think diagrams for chunked sets are concave not convex? (I looked it up after I said convex previously – Concave function - Wikipedia – outside of polygons, this terminology always confuses me)

Playing around at https://github.com/ajtowns/lean4-clump/tree/master/Clump . Copying the text of those files into https://live.lean-lang.org/ seems to work, without needing to install anything.