Dust Expiry: Clean the UTXO set from spam

sipa · May 20, 2025, 1:37pm

Just to make sure we’re all talking about the same thing.

The observation that you get a log() scaling factor was made in the context of early commitment schemes, where the entire UTXO set is still kept, but organized in a Merkle tree so that proofs for it can be provided. This would indeed, within fully-validating nodes, be a significant extra cost, because every update to the UTXO set may now requiring updating \mathcal{O}(\log n) many internal nodes of the tree.

Utreexo is not that. It removes the UTXO set entirely from validation nodes, and instead lets them maintain just the commitment - however the network has to provide Merkle paths. The maintenance of the actual tree is (in theory) distributed over wallets, which maintain the paths for just the UTXOs they themselves care about. More realistically however, as it would be hard to switch over the entire ecosystem to Utreexo, it would involve bridge nodes that can translate proof-less transactions and blocks to proof-carrying one. Bridge nodes effectively maintain proofs for every UTXO, and they are the ones that now gain the \mathcal{O}(\log n) scaling factor (because every UTXO change may involve updating that many proofs). There may still be a gain, because bridge nodes aren’t quite on the critical path for validation, and can more easily be shared, but still: it introduces a critical component in the infrastructure that scales worse than full nodes today.

All this to say, I generally agree there is a tradeoff that’s unclear whether it’s worth making, but it isn’t one just inside validating nodes - I think pure Utreexo validation nodes would generally scale much better than today’s validation nodes, but at the cost of outsourcing an even worse factor elsewhere.

EDIT: I just realized you were probably talking about the \mathcal{O}(\log n) scaling factor for bandwidth? Utreexo has some tricks AFAIK to make it not that bad for block validation (lots of sharing between the paths) and transactions spending recent UTXOs, but fair.

That’s interesting, but I wonder if that is acceptable, why wouldn’t it be acceptable for everything? In a way it’s morally similar to Utreexo, in that it forces a (much weaker, but still some) responsibility onto wallets or bridging infrastructure to come up with proofs, reducing the responsibility validating nodes have. It has the advantage of not having expiring proofs (Utreexo proofs expire when the tree changes enough), but at the same time, it’s also only a minor gain to validation I think - still requiring them to maintain an indexed \mathcal{O}(n)-sized set, and significantly increased bandwidth (SPV proofs are larger than typical transactions).

It sort of falls in between Bram Cohen’s TXO bitfield idea (which is \mathcal{O}(n) in the size of the TXO set (not just unspent), but with an extremely small constant factor of 1 bit per entry), and Cory Field’s UHS idea where validation nodes store hashes of UTXOs, and the full UTXOs being spent are relayed along with their spending.