Linearization post-processing (O(n^2) fancy chunking)

sipa · November 17, 2023, 8:28pm

When implemented as a post-processing to linearization, the following also holds:

It results in decent sub-chunk linearization (for use at the very end of a block when a full chunk no longer fits, though some amount of search, or skipping clusters to find ones with smaller chunks, is likely better).
When re-running post-processing on an already-post-processed linearization, it runs in \mathcal{O}(n) time (no swaps).

So if there are reasons to keep cluster in linearization format (rather than chunked format), I think it may be reasonable to actually do this post-processing at chunking time, but then also keep the result. This means chunkings always get the quality benefit, but sufficiently trivial re-chunkings stay linear in time.