Merging incomparable linearizations

ajtowns · November 24, 2023, 10:21am

Assume you have a function C(L) that takes a linearisation L and splits that into P_L, T_L where P_L is the first chunk, and T_L is the rest of the linearisation. If L is non-empty, P_L is non-empty also; however T_L may be empty.
Assume you can calculate L - X where L and X and the result is just L with the transactions in X removed
We say that A \le B by doing a feerate diagram comparison and noting B is has a higher or equal feerate at all points.

I think prefix-intersection merging, M(L_1, L_2) is equivalent to:

Take P_1, T_1 = C(L_1) and P_2, T_2 = C(L_2)
Reorder the transactions in P_1 to match the order they appear in L_2, call this P^\prime_1, and then calculate R_1, X_1 = C(P^\prime_1). Note that P_1 \le R_1.
Calculate R_2, X_2 in the same way from P_2.
Choose the highest feerate chunk from R_1, R_2:
- If R_1 is the highest, then: M(L_1, L_2) = R_1 + M(X_1 + T_1, L_2-R_1)
- Otherwise: M_{PI}(L_1, L_2) = R_2 + M(L_1-R_2, X_2 + T_2)

For the case where R_1 is highest, then M(L_1, L_2) = M(R_1 + X_1 + T_1, L_2) and L_1 \le R_1 + X_1 + T_1.

The question is whether L_2 \le R_1 + (L_2-R_1), noting that the transactions in R_1 that we’ve removed from L_2 maintain their order, and have an equal or higher feerate than the original best chunk.

Consider each chunk from L_2, call them c_1, c_2, c_3, \dots, c_n. Then define d_i = c_i - R_1, and e_i = c_i - d_i. In that case R_1 = e_1 + e_2 + e_3 + \dots + e_n (because R_1 is the first chunk of P^\prime_1 which had its transactions put in the same order as L_2).

Define:

\begin{align} \gamma_j &= \sum_{i=1}^j c_i \\ \delta_j &= \sum_{i=1}^j d_i \\ \epsilon_j &= \sum_{i=1}^j e_i \\ \zeta_j &= \sum_{i=j+1}^n e_i \\ \end{align}

We see L_2 = \gamma_n, and R_1 + (L_2 - R_1) = \epsilon_n + \delta_n and \epsilon_n = \epsilon_j + \zeta_j. Note that the feerate of \zeta_j is greater or equal to the feerate of R_1 = \epsilon_n in all cases – otherwise those transactions would not have been included in R_1 when calculating the chunk.

Consider total size of \gamma_j, S(\gamma_j) and its total fee F(\gamma_j). Note that the feerate diagram of L_2 is made up of the pairs S(\gamma_j), F(\gamma_j).

If we consider \epsilon_j + \delta_j then we simply have S(\epsilon_j + \delta_j) = S(\gamma_j) and F(\epsilon_j + \delta_j) = F(\gamma_j).

Now consider \epsilon_j + \delta_j + \zeta_j – for which the feerate diagram at position S(\gamma_j) is at least F(\epsilon_j + \delta_j), ie at least F(\gamma_j). But we know \zeta_j has a higher fee rate than any chunk in \epsilon_j + \delta_j, so the feerate diagram of \epsilon_j + \zeta_j + \delta_j is an improvement. That gives \gamma_j \le \epsilon_n + \delta_j, and choosing j=n gives L_2 = \gamma_j \le \epsilon_n + \delta_n = R_1 + (L_2 - R_1). QED?