Cluster mempool definitions & theory

sipa · May 6, 2024, 6:52pm

Given some of the insights from the LIMO thread, I think there is a simpler way to formalize things. Here is an attempt with just sketches. If people are interested I can expand on it.

The biggest changes are:

A new concept “guide” of a cluster is introduced, which covers both linearizations and set-linearizations.
Guides use a different representation: they’re not lists, but a set of topological subsets, each one being a subset of the next one.
The gathering theorem is replaced with a much simpler variant, and then used to prove a new “composition” theorem, which describes how given a set of topologically valid subsets one can produce a guide that is as good as all of them. Merging and LIMO can be seen as instances/variants of composition.

Graphs (unchanged):

Transactions graphs are defined as before: nodes that have a integer \operatorname{fee} and a strictly positive \operatorname{size}, and directed edges representing dependencies pointing from child to parent.
The functions \operatorname{fee}, \operatorname{size}, and \operatorname{feerate} are defined for sets of nodes as respectively the sum of the fees, the sum of the sizes, and the sum of the fees divided by the sum of the sizes.
The ancestors and descendants of a set are defined as before: \operatorname{anc}_G(S) is the transitive closure of S under the parent relation. Similarly, \operatorname{desc}_G(S) is the transitive closure of S under the child relation.
A subset S of the nodes of a graph G is said to be topological if \operatorname{anc}_G(S) = S.

Guides

A guide of a graph G is defined differently: it is a set of topological subsets of G, where each is either a subset or superset of every other, and which includes \varnothing and G itself as elements.
A guide is said to be full if it contains a subset for each size from 0 to the size of the graph inclusive (i.e., has n+1 elements for a graph with n transactions). If not, a guide is said to be partial. Guides are isomorphic with set-linearizations, and full guides are isomorphic with linearizations.
A guide a is said to be a weakening of guide b if a \subseteq b, and said to be a strengthening of b if a \supseteq b.
Full strengthening theorem: every guide has a full strengthening. For any two c_1 \subset c_2 \in L, topologically sort c_2 \setminus c_1 and for each include a new subset, in order.

Preordering of guides

The diagram \operatorname{diag}_L of a guide L of graph G is the real function defined for all inputs x \in [0,\operatorname{size}(G)]: \operatorname{diag}_L(x) is the maximum value of \operatorname{fee}(a) + (x - \operatorname{size}(a))\operatorname{feerate}(b \setminus a) over all a and b in L such that \operatorname{size}(a) \leq x and \operatorname{size}(b) \geq x. It is the minimal concave function through all the (\operatorname{size}(p), \operatorname{fee}(p))_{p \in L} points.
The chunking \operatorname{chunk}(L) of a guide L is the weakening (subset) of L containing those x for which \operatorname{diag}_L(\operatorname{size}(x)) = \operatorname{fee}(x). It consists of those graph subsets whose size/fee pairs make up the vertices of the convex hull shape that the diagram has, and is the minimal weakening with the same diagram as L.
A preorder \gtrsim is defined between guides of the same graph G as: L_1 \gtrsim L_2 iff for all x \in [0,\operatorname{size}(G)] it holds that \operatorname{diag}_{L_1}(x) \geq \operatorname{diag}_{L_2}(x). As chunking does not change diagrams, L_1 \gtrsim L_2 \iff \operatorname{chunk}(L_1) \gtrsim L_2 \iff L_1 \gtrsim \operatorname{chunk}(L_2).
Strengthening improvement theorem: if L_1 is a strengthening of L_2, then L_1 \gtrsim L_2.

Improving guides

Common subset theorem: given two guides L_1 and L_2 for a graph G, both of which have the same smallest non-empty element s. Let L_1' and L_2' be guides of G \setminus s obtained by removing s from all elements of L_1 resp. L_2. Then L_1' \gtrsim L_2' \implies L_1 \gtrsim L_2.
- Proof: Going from the diagrams of L_1' and L_2' to those of L_1 and L_2 is just shifting the convex hulls by s, and then adding a point (0,0) to both hulls.
Simple gathering theorem: given topological subsets s \subseteq G and c \subseteq G such that \operatorname{feerate}(s) \geq \operatorname{feerate}(c) (or s = \varnothing or c = \varnothing) and \operatorname{feerate}(s) \geq \operatorname{feerate}(s \cap c) (or s \cap c = \varnothing), it holds that \{\varnothing, s, c \cup s, G\} \gtrsim \{\varnothing, c, G\}.
- Proof: this is a simplification to 2 sets of the earlier set gathering theorem, plus covering the case where \operatorname{feerate}(c) < \operatorname{feerate}(G) which is trivial.
Supreme subset theorem: given a set C = \{c_1, c_2, \ldots, c_n\} of c_i that are all topological subsets of a non-empty graph G, there always exists a non-empty topological subset s of G such that \{\varnothing, s, c_i \cup s, G\} \gtrsim \{\varnothing, c_i, G\} for all i = 1 \ldots n.
- Proof: pick s to be the highest feerate non-empty intersection between any combination of c_i's, plus G, and apply the simple gathering theorem (no intersection between s and c_i can have higher feerate than s, because that intersection would have been picked as s instead).
- Note that this is not necessarily the only set which satisfies the \{\varnothing, s, c_i \cup s, G\} \gtrsim \{\varnothing, c_i, G\} conditions, and others may be computationally cheaper to find.
Composition algorithm: Define \operatorname{compose}(G, C) for a graph G and a set of topologically valid subsets C = \{c_1, c_2, \ldots, c_n\} for it as follows:
- If G = \varnothing, return \{\varnothing\}, the only valid guide for G.
- Otherwise, find a non-empty topological s such that \{\varnothing, s, c_i \cup s, G\} \gtrsim \{\varnothing, c_i, G\} for all i, which by the supreme subset theorem always exists. Let C' = \{c_1 \setminus s, c_2 \setminus s, \ldots, c_n \setminus s\}. Let L' = \operatorname{compose}(G \setminus s, C'). Return L = \{\varnothing\} \cup \{\forall x \in L' : x \cup s\}.
Composition theorem: Given C = \{c_1, c_2, \ldots, c_n\} of topological subsets of a graph G, \operatorname{compose}(G, C) \gtrsim \{\varnothing, c_i, G\} for all i = 1 \ldots n. In other words, \operatorname{diag}_{\operatorname{compose}(G, C)}(\operatorname{size}(c_i)) \geq \operatorname{fee}(c_i) for all i.
- Proof: we know by induction that L' \gtrsim \{\varnothing, c_i \setminus s, G \setminus s\} for all i. From the common subset theorem it follows that L \gtrsim \{\varnothing, s, c_i \cup s, G\}. Because of the properties of s we know that \{\varnothing, s, c_i \cup s, G\} \gtrsim \{\varnothing, c_i, G\}. Together it follows that L \gtrsim \{\varnothing, c_i, G\}.
- Merging of two guides L_1 and L_2 is now effectively applying composition to C = L_1 \cup L_2, with a specialized \mathcal{O}(n) complexity s-finding algorithm.
- LIMO can be seen as a variant of composition, with C = L \cup \{S_1, S_2, \ldots\}, but where the recursion doesn’t operate on exactly C', but the S_i get replaced with (hopefully better) subsets every iteration.