It can be checked beforehand: it’s just the feerate of the first chunk of L \setminus T is \leq f.

Hmm, indeed, that would be a better starting point, as it exactly matches the conditions in the actual prefix-intersection merging algorithm. And it’s not equivalent to mine (as your example at the end of your post shows; in that example the highest chunk feerate of bad transactions is 7.5 which exceeds f=5). I’ll think about generalizing it.

Indeed, something like this needs to be included in the proof. Also further, it’s currently unclear why it’s even allowed to move the c_t \cap T transactions to the front of c_t.

I don’t think you need an argument about what happens to c_t after rechunking. It’s sufficient that progress was made by moving a nonzero number of transactions towards the front. If you can keep doing that, all of them will end up at the front of L. And we prove progress will keep being made until they are all at the front of L, so that is indeed the end state.

Note that I started off by merging equal-feerate chunks, so there can be at most one chunk for any given feerate.

Right, this example does capture a situation we want the theorem to cover, but it currently doesn’t.

Sadly, the proof breaks entirely when chunks with feerate > f are permitted. It is possible, by following the “move good transactions to front of last chunk that has any” algorithm, to end up in a situation that’s better than the desired end state (all good transactions up front). The theorem is still likely true, but the proof technique of individual move steps to achieve it does not work, unless it can somehow be amended to never improve beyond the goal.

New attempt, with a fully “geometric” approach to show the new diagram is above the old one, but largely using insights about sets from @ajtowns’s proof sketch above.

Theorem

Given:

A linearization L

A topologically valid subset G of the transactions in L (“G” for “good” transactions).

Let L_G be the linearization of the transactions G, in the order they appear in L.

Let L_B be the linearization of the transactions not in G (“bad”), in the order they appear in L.

Let f be the highest chunk feerate of L.

L_G (considered in isolation) must form a single chunk, with feerate \geq f.

Then:

L_G + L_B is at least as good as L (better or equal feerate diagram, never worse or incomparable).

Let’s call this the gathering theorem (it matches what the VGATHER* AVX2 x86 CPU instructions do to bitsets).

Proof

Let S(x), F(x), R(x) respectively denote the size, fee, and feerate of the set x.

Let (c_1, c_2, \ldots, c_n) be the chunking of L (the c_i are transaction sets).

We know R(c_i) \leq f for i=1 \ldots n, because of input assumptions.

Let \gamma_j = \cup_{i=1}^{j} c_i (union of all chunks up to j).

Let e_i = c_i \cap G for all i=0 \ldots n (the good transactions from each chunk).

Let \zeta_j = \cup_{i=j+1}^{n} e_i (union of all good transactions in chunks after j).

Because \zeta_j is a suffix of L_G, which is a single chunk with feerate \geq f, we know R(\zeta_j) \geq f for all j=0 \ldots n-1.

Let P(x) = (S(x), F(x)) be the point in 2D space corresponding to the size and fee of set x.

Let D be the diagram connecting the points (P(\gamma_j))_{j=0}^{n}, corresponding to the prefixes of chunks of L. This is the diagram of L.

Let N be the diagram connecting the points (0,0) followed by (P(\gamma_j \cup \zeta_j))_{j=0}^n, corresponding to the prefixes of L_G + L_B consisting of all good transactions and the bad transactions up to chunk j.

N is not the diagram of L_G + L_B because that requires rechunking that linearization. However, rechunking can only improve the diagram, so Nis an underestimate for the diagram of L_G + L_B.

D is a concave function. A point lies strictly under a concave function iff it lies under every line segment that makes up D, extended to infinity. Thus, a point lies on or above D iff it lies on or above at least a single line that makes up D.

In the drawing below, the blue diagram formed using the \gamma_i+\zeta_i points (plus initial green \zeta_0 segment) is an underestimate for N, the diagram of L_G+L_B. The red diagram shows D, the diagram for L. Intuitively, the blue diagram lies above the red diagram because the slope of all the green lines is never less than that of any red line:

In what follows, we will show that every point on N lies on or above a line making up D. If that holds, it follows that N in its entirety lies on or above D:

The first diagram segment of N, from (0,0) to P(\gamma_0 \cup \zeta_0), is easy: it corresponds to L_G itself, which has feerate (slope) \geq f. All of these points lie on or above the first line of D, from P(\gamma_0) (= (0,0)) to P(\gamma_1), which has feerate (slope) f.

For all other diagram segments of N, we will compare line segments with those in D with the same j = 0 \ldots n-1. In other words, we want to show that the line segment from P(\gamma_j \cup \zeta_j) to P(\gamma_{j+1} \cup \zeta_{j+1}) lies on or above the line through P(\gamma_j) and P(\gamma_{j+1}).

P(\gamma_j \cup \zeta_j) lies on or above the line through P(\gamma_j) and P(\gamma_{j+1}) iff R(\zeta_j) \geq R(c_{j+1}). This follows from R(\zeta_j) \geq f and R(c_{j+1}) \leq f.

P(\gamma_{j+1} \cup \zeta_{j+1}) lies on or above the line through P(\gamma_j) and P(\gamma_{j+1}) iff R(c_{j+1} \cup \zeta_{j+1}) \geq R(c_{j+1}). This follows from the f relations too, except for j=n-1, but there the endpoints of the two graphs just coincide.

All points between P(\gamma_j \cup \zeta_j) and P(\gamma_{j+1} \cup \zeta_{j+1}) lie on or above the line through P(\gamma_j) and P(\gamma_{j+1}), given that the endpoints of the segment do.

Thus, all points on an underestimate of the diagram of L_G + L_B lie on or above the diagram of L, and we conclude that L_G + L_B is a better or equal linearization than L.

Generalization

It is easy to relax the requirement that L_G forms a single chunk. We can instead require that only the lowest chunk feerate of L_G is \geq f. This would add additional points to N corresponding to the chunks of L_G, but they would all lie above the existing first line segment of N, which cannot break the property that N lies on or above D.

Merging individually post-processed linearizations (thus, ones with connected chunks) may result in a linearization whose chunks are disconnected. This means post-processing the merged result can still be useful.

I think the intuition for this is that you’re constructing a (possibly non-optimal) feerate diagram for L_G+L_B by taking the subsets L_G + \sum_{i=1}^j d_i and showing this is a better diagram than that of L. Because you can’t say anything about d_i (the bad parts of each chunk), you’re instead taking the first j chunks as a whole, which have a feerate of f or lower, and then the suffix of L_G that was missed, which has a feerate of f or high, which kinks the diagram up. Kinking the diagram up means this subset has a better diagram than the chunks up to and including c_j.

FWIW, I think this was the argument I was trying to make in my earlier comment but couldn’t quite get a hold of.

I think the general point is just: L^* \ge L if (and only if?) for each \gamma_i (prefixes matching the optimal chunking of L) there exists a set of txs \zeta_i where \gamma_i \cup \zeta_i is a prefix of L^* and R(\zeta_i) \ge R(\gamma_i).

Ah! Actually, isn’t a better “diagram” to consider the ray from the origin to these P(\gamma_i) points in general, ie, for x=0...c_1 plot the y value as R(\gamma_1), for x=c_1..c_2 plot R(\gamma_2) etc. Then (I think) you still compare two diagrams by seeing if all the points on one are below all the points on the other; but have the result that for an optimal chunking, the line is monotonic decreasing, rather than concave.

Also, with that approach, to figure out the diagram for the optimal chunking, you just do the diagram for every tx chunked individually, it’s just f_{opt}(x) = \max(f(x^\prime), x^\prime \ge x).

I think that’s a question of formality. It’s of course obvious that if all points after each chunk lie above the old diagram, then the line from the origin to the first point does too. Yet, we actually do require the property that that line in its entirety lies above the old diagram too. Depending on how “obvious” something needs to be before you choose not to mention it, it could be considered necessary or not.

I don’t think this works. It is very much possible to construct a (fee) graph N that lies above graph D everywhere, but whose derivatives (feerates) aren’t.

EDIT: I see, you’re talking about average feerate for the entire set up to that point, not the feerate of the chunk/section itself. Hmm. Isn’t that just another way of saying the fee diagram is higher?

No, I mean that you can just choose the origin for your first point and go straight to \gamma_1 \cup \zeta_1 when constructing N, and run the same logic.

Yes, precisely – it is another way of saying the same thing, I think it’s just easier to reason about (horizontal line segments that decrease when you’ve found the optimal solution).

Translating the N is better than L proof becomes: look at the feerate at \gamma_i; now consider the feerate for \gamma_i + \zeta_i – it’s higher, because \zeta_i \ge f and the size of that set is greater, and an optimal chunking will extend that feerate leftwards, so N chunks better up to i, for each i, done. You don’t have to deal with line intersections.

Right, that works, we could just drop \gamma_0 \cup \zeta_0 from N. I don’t think it actually simplifies the proof though, because the reasoning for the first segment (now from (0,0) to P(\gamma_1 \cup \zeta_1)), would still be distinct from that for the other segments.

Consider the following:

3 transactions A (fee 0), B (fee 3), C (fee 1), all the same size.

L_1 = [A,B,C], chunked as [AB=3/2, C=1/1]

L_2 = [B,A,C], chunked as [B=3/1, AC=1/2]

L_2 is a strictly better linearization, the fee-size diagram is higher everywhere.

The feerate (using the R(\gamma_i) method) for the middle byte is 3/2 for L_1, but 4/3 for L_2. Would they be considered incomparable by the feerate-size diagram?

Prefix-intersection merging can be simplified to the following:

Given two linearizations L_1 and L_2

(Optionally) Output any prefix of both that’s identical.

Let P_i be the highest-feerate prefix of L_i, for i \in {1,2}.

Assume without loss of generality that R(P_1) \geq R(P_2) (swap the inputs if not).

Let C be the highest-feerate prefix of L_2 \cap P_1.

Output C, remove it from L_1 and L_2, and start over.

So we only need to find the best prefix of the intersection of the highest-feerate prefix with the other linearization, and not the other direction.

This does not break the “as good as both inputs” proof, and from fuzzing it appears it that this doesn’t worsen the result even (it could be that this results in worse results while still being at least as good as both inputs, but that seems not to be the case).

I don’t have a proof (yet) that the algorithms are equivalent, but given that it’s certainly correct still, it’s probably fine to just use this simpler version?

If you have L1=[5,3,1,8,0] (chunk feerates: 5, 4, 0) and L2=[0,8,1,3,5] (chunk feerates: 4, 3) then bestPi chooses C=L2 \cap [5] to start with, ending up as [5,8,3,1,0]. Whereas calculating and comparing C_1 and C_2 would produce [8,5,3,1,0].

But post-processing probably fixes that for most simple examples at least?

(Even if post-processing didn’t fix it; I think so long as merging produces something at least as good as both, that’s fine; quicker/cheaper is more important)

I think you can make post-processing fail to help for this example just by adding a small, low-fee parent transaction P that is a parent to both the 8-fee and 5-fee txs.

I think if you have five transactions, A,B,C,D,E, of 10kvB each at 50,30,10,80,0 sat/vb, and one transaction, P, at 100vB and 0 sat/vb that each of A,D spends an output of, with the same arrangement as above, then post-processing doesn’t fix it either?

Merging isn’t necessarily commutative either, I think, in the case where both linearisations have different (eg, non-overlapping) first chunks at equal feerates.

Hmm, indeed. This example shows that bestPi isn’t always as good as the original PiMerge description.

Given that, I lean towards sticking with the original, as I don’t think that the performance difference between the two is significant. If there was no observable difference at all, it’d make sense to pick the faster one, but that doesn’t seem to be the case now?

The linearizations that come out can differ depending on whether the transactions from L_1 or L_2 is chosen when the found subsets have the same feerate, but I believe this cannot affect the fee-size diagram. Generally I use “smaller size is better” as tie-breaker when comparing equal-feerate chunks, as inside linearization algorithms this sometimes avoids accidentally merging equal-feerate subsets, but I don’t believe it matters here what tie-breaker is chosen. Do you have a counter-example?

EDIT: here is a counterexample:

Transactions: A=1, B=2, C=3, D=2, E=1 (all same size)

L1: [B,E,C,A,D], chunked as [B:2, EC:2, AD:1.5]

L2: [D,B,A,C,E], chunked as [D:2, B:2, AC:2, E:1]

Merge(L1,L2): [B,C,D,E,A], chunked as [BC:2.5, D:2, E:1, A:1]

Merge(L2,L1): [D,B,C,A,E], chunked as [DBC:2.33, A:1, E:1]

Ok, with that counterexample in place I’m now again leaning the other direction: that the right approach is using the simplest algorithm that works.

My impression is that the non-commutativity and non-associativity of merging are effectively randomly stumbled upon through accidentally having/putting transactions in the right order. And these accidents can’t be prevented, and moreover trying more subsets will inevitably mean there is a small chance of finding something better still. However, that doesn’t mean it’s worth spending time on even if it’s a small cost. That time could be better spent on directly trying more things in the linearization algorithm.