I think the intuition for this is that you’re constructing a (possibly non-optimal) feerate diagram for L_G+L_B by taking the subsets L_G + \sum_{i=1}^j d_i and showing this is a better diagram than that of L. Because you can’t say anything about d_i (the bad parts of each chunk), you’re instead taking the first j chunks as a whole, which have a feerate of f or lower, and then the suffix of L_G that was missed, which has a feerate of f or high, which kinks the diagram up. Kinking the diagram up means this subset has a better diagram than the chunks up to and including c_j.
FWIW, I think this was the argument I was trying to make in my earlier comment but couldn’t quite get a hold of.
I think the general point is just: L^* \ge L if (and only if?) for each \gamma_i (prefixes matching the optimal chunking of L) there exists a set of txs \zeta_i where \gamma_i \cup \zeta_i is a prefix of L^* and R(\zeta_i) \ge R(\gamma_i).
Ah! Actually, isn’t a better “diagram” to consider the ray from the origin to these P(\gamma_i) points in general, ie, for x=0...c_1 plot the y value as R(\gamma_1), for x=c_1..c_2 plot R(\gamma_2) etc. Then (I think) you still compare two diagrams by seeing if all the points on one are below all the points on the other; but have the result that for an optimal chunking, the line is monotonic decreasing, rather than concave.
Also, with that approach, to figure out the diagram for the optimal chunking, you just do the diagram for every tx chunked individually, it’s just f_{opt}(x) = \max(f(x^\prime), x^\prime \ge x).