Hi sipa, thanks for your great work on this.
I’ve been thinking about this problem for some months now, alternating between looking for an algorithm and a reduction from some NP hard problem. It really is not obvious. However, DeepSeek R1 finally helped me find the surprising answer:
Finding a highest-feerate topologically-valid subset is possible in O(nm \log (n^2/m)) time, and this has been shown in 1989 by Gallo, Grigoriadis and Tarjan (“A FAST PARAMETRIC MAXIMUM FLOW ALGORITHM AND APPLICATIONS”, SIAM J. COMPUT. Vol. 18, No. 1, pp. 30-55, February 1989, you can find it on sci-hub). Actually, there have been even earlier algorithms for this, quoted in this article, though they are a little slower. For your reference, the problem is called maximum-ratio closure problem (p. 48), the only difference being the direction of the arrows in the graph.
It requires a rather involved algorithm, where the graph is modified into a flow network whose capacities depend on a parameter \lambda (standing for the target feerate) and a min-cut is calculated for several \lambda until the optimum is found. Gallo, Grigoriadis and Tarjan solve this in the same asymptotic time that a single min-cut would take by modifying the Goldberg-Tarjan push-relabel algorithm to keep on working on the next \lambda after finding the min-cut for the one before, and proving that under some conditions that hold here, there are only O(n) breakpoints at which we need to calculate the min-cut.
This seems great news to me. It should mean that we can accomodate much larger clusters. The question is, how do we implement this? I haven’t found an open source implementation of exactly this algorithm, but the repo at https://github.com/jonas-sauer/MonotoneParametricMinCut has several algorithms in C++ that solve very similar problems, are supposed to be even faster in practice, and might be modified for our purposes (MIT license). The alternative would be to write the push-relabel parameterized min-cut algorithm from scratch, but as I said, this is not trivial.