It’s about optimizing for the last few transactions in a block; the more chunks clusters are broken up in, the more puzzle pieces there are to use. I don’t think you can - without whole-mempool context - say whether the beginning or the end of a cluster is better to have small pieces, as you don’t know ahead of time where they’ll end up in a block. And I think the splitting itself is more important than the ordering.

Yeah. I don’t think we need to pick a well-defined unique optimal.

Yeah, that’s proving the existence of optimality through \operatorname{merge}. My thinking was to prove existence of optimality before introducing merging (because proving that “move best subset to the front, continue” is optimal seems more intuitive than through merging), but perhaps that’s an unnecessary detour.

An alternative way to prove that, I think, is as follows. Let M be the set of maximal elements of the set of valid linearizations for a given graph (maxima have no elements that compare strictly higher than them, but may have elements incomparable to them).

If M = \{\}: impossible in a finite set (start with any element and keep picking a strictly larger element, eventually you end up in a cycle, which is impossible as it implies elements strictly larger than themselves).

If M = \{m\}, then m is optimal. Again start with any element, and keep picking strictly larger elements. Each of these chains must end with m as it’s the only element with no larger ones. Thus, m > everything else.

Otherwise M consists of two or more elements: merge them, and end up with something better than both. This is in contradiction with them being maximal elements.