Thus, half of the UTXO set is likely spam-related. Since UTXO spam is permanent and unprunable, it is much worse than spam in the form of auxiliary transaction data.
Thus, we should think about ways to clean the UTXO set of these spam outputs.
Soft-Fork Proposal: Age-Based Expiration
Introduce a rule that makes very small, very old outputs unspendable. For example, let T be a UTXO’s age in years.
Once utxo_amount - T * dust_limit is negative, the UTXO expires and becomes unspendable. Expired UTXOs can then be pruned from the UTXO set.
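As a rough sketch of this expiration rule (the names and the dust-limit value are illustrative placeholders, not taken from Bitcoin Core; a real rule would pick per-script-type limits via consensus):

```python
# Illustrative sketch of the age-based expiration rule above.
# DUST_LIMIT is a placeholder value in satoshis.
DUST_LIMIT = 546

def is_expired(utxo_amount: int, age_years: float) -> bool:
    """A UTXO expires once utxo_amount - age_years * DUST_LIMIT goes negative."""
    return utxo_amount - age_years * DUST_LIMIT < 0
```

Under this rule a 546-sat output would expire after roughly one year, while a 100,000-sat output would survive for well over a century.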
Hi, author of the linked mempool research report on UTXO set here.
I don’t support this approach being applied retrospectively as it is confiscatory.
That said, it’s useful to know that 41.65% of the UTXO set consists of dust amounts precisely at the implied default Core policy dust limits for various script types.
Introduce a rule that makes very small, very old outputs prunable. However, retain the location of that UTXO—e.g., location = block_height/tx_index/output_index—indicating where in the blockchain the UTXO was created.
Users can still spend their UTXO if they include, in the annex of their spending transaction, an SPV proof (a Merkle inclusion proof of the transaction that created the UTXO, in a block in the chain).
This shifts the burden of storing the data from the global UTXO set to the user.
An expired UTXO’s location can be represented in less than 8 bytes. And since a set of locations has lower entropy, it is likely compressible to less than 8 bytes per expired UTXO.
This reduces spam by a factor of more than 8. In practice, that means that of those 5+ GB of spam, we could prune more than 4.3 GB. In particular, it would allow us to prune all of the bare multisig spam.
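As a sketch, such a location could be packed into 8 bytes with fixed-width fields (the field widths here are illustrative; a real encoding would need headroom for blocks with very many transactions or outputs):

```python
import struct

def encode_location(block_height: int, tx_index: int, output_index: int) -> bytes:
    # 4 bytes for block height, 2 for transaction index, 2 for output index.
    return struct.pack(">IHH", block_height, tx_index, output_index)

def decode_location(raw: bytes) -> tuple:
    return struct.unpack(">IHH", raw)
```

Because a set of such locations is dense and sorted in block height, delta-encoding it should compress well below 8 bytes per entry, as suggested above.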
Robin, I appreciate the effort to tackle UTXO bloat; it’s a legitimate scalability challenge. But the proposed method of expiring or pruning small, aged UTXOs crosses a line I think we should be very cautious about.
Introducing a rule that makes certain UTXOs unspendable based on arbitrary thresholds amounts to protocol-level confiscation. That’s not a minor technical tweak; it fundamentally alters the trust model of Bitcoin. People have always operated under the assumption that if they control the private key, their coins are safe, regardless of how small or old the UTXO is.
This change would undermine that assumption, possibly harming long-term holders, forgotten wallets, and low-income users disproportionately. It also opens the door to future proposals that may seek to invalidate UTXOs under different pretenses.
Instead, we should double down on solutions that preserve Bitcoin’s principles:
- Encourage UTXO consolidation through dynamic fee markets.
- Improve wallet UX to discourage spammy outputs.
- Explore fee policies that penalize dust creation without invalidating it.
Cleaning the UTXO set is important, but not at the cost of breaking the core promise that Bitcoin makes to its users: your coins are yours, no matter how small or how long you’ve held them.
This approach is reorg-unsafe. You probably want to keep “txid:vout”, because this is what you need to make a valid input anyway. You can strip “scriptPubKey” one way or another, or replace it with some hash and require it from the user, but still: you don’t want to make your node cryptographically weaker, because then it could potentially accept an invalid transaction.
Yes, but if you want to go in that direction, then it doesn’t make sense to simplify it only for low-value UTXOs. If you assume that reorgs cannot be deeper than N blocks, then you can simplify “txid:vout” for all UTXOs with more than N confirmations. And then, when you send transaction data, you can have the current format, where “txid:vout” is used, and some new, compressed format, used between new nodes, which would send everything in a more packed way.
The format of locations is block_height/TX_index/output_index.
So if you experience a reorg back to block_height H, you simply delete from the expired UTXO set all locations whose block_height > H, and you add back all locations which were spent in blocks that were reorged out.
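That reorg handling could be sketched as follows, with locations modeled as (block_height, tx_index, output_index) tuples (names illustrative):

```python
def handle_reorg(expired_locations: set, fork_height: int,
                 spent_in_reorged_blocks: set) -> set:
    # Drop locations of UTXOs created above the fork point: those
    # creating blocks are no longer part of the chain...
    kept = {loc for loc in expired_locations if loc[0] <= fork_height}
    # ...and restore locations whose spends were reorged out, since
    # those UTXOs are expired-but-unspent again.
    return kept | spent_in_reorged_blocks
```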
However, I like your idea that you can replace expired UTXOs with, e.g., a 20-byte hash. That simplifies the scheme significantly, makes it cheaper to spend expired UTXOs, and still prunes about 75% of all spam.
Edit: Oh, now I see your point. There could be a UTXO which expired on the previous fork, but was spent on the reorging fork shortly before it expired. Then you would have already deleted it from your utxo set and can’t simply restore it to verify that valid block.
However, you can restore that deleted UTXO from its ‘location’ as long as you still have the block at block_height, in which that UTXO was created. And if you don’t have that block you can request it from your peers.
Because if it were reorg-safe, then we could use it for all transactions. Maybe some unconfirmed ones could use “txid:vout” to indicate what you are referring to, but everything else could simply refer to indexes in that case.
And I guess, if you apply it to every single transaction, then you can test some edge cases and see whether things are really resistant to reorgs or not.
There have been a lot of older proposals along the utxo commitment scheme lines; they generally run into the problem that maintaining a commitment over the UTXO set is so expensive that it destroys the savings. Like, great, you made it 1/4 the size, but now it requires a factor log(n) more work to update… that’s not a big win. And/or they run into issues that the bottleneck for any particular user might be storage or it might be communication, and making it cost log(n) more work to spend isn’t a win.
But perhaps scoping this to outputs which are unlikely to be spent, even unlikely to be spendable, and making their spending cost just come out of their tx size (so no further resource inflation for nodes)-- actually solves both of those issues.
Obviously anything with confiscation is a non-flyer. The principle is important to uphold, and the fact that you floated it at all might outright sabotage the potentially useful idea.
I think there would also just be merit in setting a threshold value above which, no matter the age, an output wouldn’t be pruned-- I believe this can probably be set so that all ‘real wallets’ are almost entirely above it while almost all outputs which are very unlikely to get spent (unless bitcoin increases in value a lot) are below it. This would greatly increase the pool of users for which the change has no effect at all and who might otherwise oppose it because they don’t want any change at all.
If the rule against spending without a proof only applied to outputs in a range of heights, it would make it possible to adjust the thresholds both up and down with softforks. Except for a value of 0, it’s arguable that even 1 satoshi may be valuable enough in the future that there is no sense in doing this because it very likely will get spent. It’s a little less elegant, but the realpolitik of people being nervous about losing access to the coins they buried in the back yard (even though it likely wouldn’t be well founded) is worth keeping in mind. If it can optimize almost as well while leaving more people alone, that’s better. And asking people to reason about the far future is just going to result in less agreement, e.g. given enough time some people will figure 1 sat will buy a whole planet or something.
Invocation of the annex assumes a particular script type, but a lot of the dust stuff is various script types. It’s arguable that this proof data should be “super-prunable” – after all it’s entirely redundant if you have the entire chain. So e.g. it could cost weight like regular data but be in a separate witness that you could skip downloading when processing the entire chain (since if you’re processing the entire chain you could just construct it yourself). At the very least it should be designed so that there are no degrees of freedom in the serialization, so that even if it were in transactions it could be stripped and reconstructed to save sync bandwidth.
Does Utreexo have this issue? If done as a soft fork, it reduces the UTXO set storage requirement to about 1kb. It would make UTXO “spam” completely irrelevant. It has other benefits too, e.g. being able to validate the blockchain quickly with very little RAM.
This comes at the expense of making the input (witness?) bigger since the spender needs to prove that a coin exists.
In the very long run, e.g. 1000+ years, having every node store an ever growing unprunable UTXO set doesn’t seem ideal, with or without spam / dust.
It’s my view that it does today, yes. I’ll happily agree with you that in a long enough view the costs work themselves out – log() scaling things in particular because log() is essentially a constant beyond a certain size. But it’s no surprise that the optimal construction may differ at different parts of Bitcoin’s life.
Sync things aside (which have security model impacts)-- I don’t think Bitcoin is currently at a point where the utxo set size is dominating the costs of running a node: Consider how few nodes run node-limited (though I admit that’s a biased sample because of course it excludes anyone who just didn’t run it at all).
In any case, an insight to be gleaned here I think is that limiting commitments to coins that are unlikely to be spent might change the tradeoff surface-- because for those you get the space reduction without as much overhead.
I agree—that’s why I’m not proposing a UTXO-set commitment.
Instead, the spender provides the usual SPV proof (Merkle inclusion of the transaction in its block), which you simply check against the header chain.
So the blockchain itself acts as a TXO commitment.
The only new data structure is a lightweight list of 8-byte “locations”, representing the unspent expired UTXOs.
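To illustrate the spender’s side, here is a minimal sketch of checking such an SPV (Merkle inclusion) proof against a block header’s merkle root, using Bitcoin’s double-SHA256 but glossing over the byte-order conventions of real serialization; `branch` and `index` are assumed to come from the spender’s proof:

```python
import hashlib

def dsha256(data: bytes) -> bytes:
    """Bitcoin's double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def verify_merkle_proof(txid: bytes, merkle_root: bytes,
                        branch: list, index: int) -> bool:
    """Walk from the leaf to the root: `branch` holds the sibling hash at
    each level, and the bits of `index` say whether our node is the left
    or right child at that level."""
    h = txid
    for sibling in branch:
        if index & 1:
            h = dsha256(sibling + h)  # we are the right child
        else:
            h = dsha256(h + sibling)  # we are the left child
        index >>= 1
    return h == merkle_root
```

Note that such a proof is static: once the creating block is buried, the branch never changes, so it can be stored once alongside the wallet.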
Just to make sure we’re all talking about the same thing.
The observation that you get a log() scaling factor was made in the context of early commitment schemes, where the entire UTXO set is still kept, but organized in a Merkle tree so that proofs for it can be provided. This would indeed, within fully-validating nodes, be a significant extra cost, because every update to the UTXO set may now require updating \mathcal{O}(\log n) many internal nodes of the tree.
Utreexo is not that. It removes the UTXO set entirely from validation nodes, and instead lets them maintain just the commitment - however the network has to provide Merkle paths. The maintenance of the actual tree is (in theory) distributed over wallets, which maintain the paths for just the UTXOs they themselves care about. More realistically however, as it would be hard to switch over the entire ecosystem to Utreexo, it would involve bridge nodes that can translate proof-less transactions and blocks to proof-carrying ones. Bridge nodes effectively maintain proofs for every UTXO, and they are the ones that now gain the \mathcal{O}(\log n) scaling factor (because every UTXO change may involve updating that many proofs). There may still be a gain, because bridge nodes aren’t quite on the critical path for validation, and can more easily be shared, but still: it introduces a critical component in the infrastructure that scales worse than full nodes today.
All this to say, I generally agree there is a tradeoff that’s unclear whether it’s worth making, but it isn’t one just inside validating nodes - I think pure Utreexo validation nodes would generally scale much better than today’s validation nodes, but at the cost of outsourcing an even worse factor elsewhere.
EDIT: I just realized you were probably talking about the \mathcal{O}(\log n) scaling factor for bandwidth? Utreexo has some tricks AFAIK to make it not that bad for block validation (lots of sharing between the paths) and transactions spending recent UTXOs, but fair.
That’s interesting, but I wonder if that is acceptable, why wouldn’t it be acceptable for everything? In a way it’s morally similar to Utreexo, in that it forces a (much weaker, but still some) responsibility onto wallets or bridging infrastructure to come up with proofs, reducing the responsibility validating nodes have. It has the advantage of not having expiring proofs (Utreexo proofs expire when the tree changes enough), but at the same time, it’s also only a minor gain to validation I think - still requiring them to maintain an indexed \mathcal{O}(n)-sized set, and significantly increased bandwidth (SPV proofs are larger than typical transactions).
It sort of falls in between Bram Cohen’s TXO bitfield idea (which is \mathcal{O}(n) in the size of the TXO set (not just unspent), but with an extremely small constant factor of 1 bit per entry), and Cory Field’s UHS idea where validation nodes store hashes of UTXOs, and the full UTXOs being spent are relayed along with their spending.
At the very least because of my point-- any overhead costs (from the bandwidth, yes) don’t apply when the output is not actually spent because its actually unspendable or just uneconomic to spend. So for at least these outputs the tradeoff seems easier.
Aside, I was just ignoring the spentness encoding because it can probably be represented much more compactly. (Though the first scheme I had in mind required having the block handy to decode it, so no good.)
Alas, that is problematic given that a single input can require a megabyte txout proof.
But it’s not fatal, you can just remember with every block header an additional hash for a root for some output tree, and proofs are against that. Wallets can remember their static fragments in that tree. Less reuse of existing data structures, but at least it can be efficient and not have to present a potentially 1MB witness-stripped transaction. One could even trim up the tree a bit by omitting provably unspendable outputs.
I would guess that historical data shows that fewer than 1 in 100 dust outputs are ever spent. So even with the extra Merkle proof needed if spent, the total footprint is still far smaller overall – especially for the most problematic dust outputs like P2MS, where we see orders of magnitude more outputs created than ever spent.
Could miners commit to the set of UTXOs being expired in each block, instead of requiring an SPV proof from spenders?
I believe this would remove the burden on wallets to construct proofs, and allow pruned nodes to validate spends simply by tracking which UTXOs were previously expired. It also opens the door to decoupling pruning from spending fee penalties:
dust UTXOs could be pruned AND subject to a higher spending fee as a deterrent, while older UTXOs that are simply considered unlikely to be spent (such as suspected lost coins) could be pruned without any spending penalty.
There is no need for any kind of “deterrent” as a freestanding goal, particularly in this context where the proposal reduces the cost of utxo bloat to essentially whatever it cost to track the (un)spentness.
From the perspective of the consensus rules it mostly doesn’t matter who constructs the proofs. From a security and autonomy perspective it must be realistic for the owners of the coin to do so on their own.
This proposal has the advantage that the proofs are static, so they’re basically free to construct-- the Bitcoin wallet even used to store exactly the txo proofs this proposal assumes, though the behavior was removed because it wasn’t used for anything. (I point out above why sadly I think standard SPV proofs aren’t great, but if this were adopted using them, sufficiently old wallets even already have the required proofs, how cool is that?)
Of course you could also have other parties providing the proofs -- anyone who wants to; miners might well find it attractive since they collect the fees on the spend. But I think this question is mostly outside the scope of the consensus rules themselves.
Wouldn’t that require everyone to resync the chain, though?
I think simply using txout proofs is fine, given that it is extremely unlikely that someone created a dust UTXO in a 1MB transaction which they want to spend someday after it would expire.
Also, people would have plenty of time to consolidate such dust outputs before the proposal would activate, so I think it is fair to shift the responsibility to those users who want to keep using dust outputs which are a significant burden for the whole system as they are indistinguishable from the spam that makes up half of the UTXO set.
Pretty much every regular user wouldn’t notice the change at all, because even if your UTXO is worth only $5 today, it would remain untouched for 100+ years.