Great Consensus Cleanup Revival

ajtowns · September 4, 2024, 3:16am

I don’t think that’s strictly true? sha256 includes the size of its input when calculating the hash, so implicit in being provided a midstate is being provided the number of bytes that went into the midstate, in bitcoin that’s s (the midstate), buf (the remaining bytes that will build the next chunk) and bytes (the number of bytes that went into both those things). So a preimage for the txid of a very large coinbase tx can still be reduced to as few as 104 bytes. That requires you have some way of differentiating a “valid” 64 byte coinbase tx from an internal node of the tx merkle tree, but it’s easy to transmit the full coinbase tx in that case anyway.

Compared to the status quo, it improves the correctness of block inclusion proofs. Compared to including the coinbase in the proof, I think the reduction in complexity is more of an advantage tbh:

you need a merkle path, and also to check the witness-stripped tx isn’t 64 bytes
you need a combined merkle path to both the coinbase and the tx; you need to check they’re the same depth; you need to check the coinbase is valid; in order to minimise bandwidth you need to deal with sha256 midstates
you need to lookup the block’s merkle tree depth via (TBD), (except if the block height is prior to the activation height, in which case you’ll need to use this fixed table and hope there’s no absurdly large reorg) then check the merkle path is exactly that long

Perhaps worth noting that these merkle proofs will only give you the witness-stripped tx anyway – in order to verify the tx’s witness data you need the full coinbase in order to obtain the segwit commitment (which can be essentially anywhere in the coinbase tx), and then you need the matching merkle path from that to your non-stripped tx. So for a lite node, checking the stripped size of the tx is straight-forward – that’s all the data it’s likely to be able to verify anyway.

It occurs to me that most of the time, you’re probably more interested in whether an output is (was) in the utxo set, rather than whether a tx is in a block. But I think the concerns here are irrelevant for things like utreexo that have some new merkle tree: they can use a prefix to distinguish tree contents from internal nodes.