As mentioned [1], I think the worst case proof size now is actually ~1 MB. For example, imagine we have the following block (P2P block serialization):
| Bytes | Description |
| --- | --- |
| 80 | Header |
| 1 | Tx count |
| 999,826 | Coinbase tx |
| 93 | 1-in, 1-out P2TR (stripped size) |
A lite client performing whitepaper-style SPV can be tricked into accepting a fake transaction if the coinbase transaction is actually 64 bytes, since a 64-byte transaction is indistinguishable from a pair of concatenated hashes for an internal merkle node. To avoid that, it needs to learn the contents of the entire coinbase transaction in order to derive its txid and verify its depth in the merkle tree. That means, even with optimizations, a proof size of about 1 MB.
Obviously, very large coinbase transactions will be rare given that they reduce miners’ ability to include fee-paying transactions, but I think it’s worth noting in discussion and documentation that the worst case is ~1 MB. It should still be possible to validate a worst-case merkle proof with the coinbase in witness data (given other soft fork changes), but it would be ~2,000x more expensive than validating a merkle proof that didn’t require a copy of the entire coinbase transaction.
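For concreteness, a back-of-envelope calculation (my own illustration) using the example block above:

```python
# Back-of-envelope proof size for the example block above (sizes in bytes).
# The block has only two transactions, so the merkle tree has depth 1.
header        = 80
coinbase_tx   = 999_826   # the whole coinbase must be sent to derive its txid
target_tx     = 93        # the 1-in, 1-out P2TR tx being proven
merkle_branch = 1 * 32    # sibling hashes; redundant at depth 1, but grows
                          # by 32 bytes per level in larger trees
print(header + coinbase_tx + target_tx + merkle_branch)  # 1_000_031, ~1 MB
```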
@evoskuil mentioned an alternative potential soft fork: a commitment to tree depth, which could be done for any depth possible under the current consensus rules[1] using only 4 bits. He didn’t suggest where the commitment could be stored, but I think it’s clear that we’re probably never going to use all BIP8/9 versionbits, and miners currently seem satisfied with the 16 BIP320 version bits, meaning we could probably put the commitment in the block header version. That wouldn’t require any extra bandwidth for SPV.
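As a purely hypothetical sketch of how compact that commitment could be (the bit positions below are arbitrary choices for illustration; none have actually been proposed):

```python
# Hypothetical encoding: store (depth - 1) in 4 otherwise-unused header
# version bits. DEPTH_SHIFT is arbitrary; only the 4-bit width matters.
DEPTH_SHIFT = 13
DEPTH_MASK = 0xF << DEPTH_SHIFT

def encode_depth(nversion: int, depth: int) -> int:
    assert 1 <= depth <= 16  # 4 bits cover every depth possible under current rules
    return (nversion & ~DEPTH_MASK) | ((depth - 1) << DEPTH_SHIFT)

def decode_depth(nversion: int) -> int:
    return ((nversion >> DEPTH_SHIFT) & 0xF) + 1
```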
I think the two cons of that approach are:

- We might want to give more (eventually all) of the available versionbits to miners in the future as hashrate increases; that way individual hardware doesn’t need to create coinbase extranonces or use hacks like nTime rolling.
- It still makes SPV more complicated than described in the whitepaper, although only slightly more so.
[1] 4 bits can express a maximum depth of 16 and a tree of depth 16 can have up to 65,536 transactions. However, the minimum possible transaction size is 60 bytes and a 999,919-byte block (excluding header and tx count) can only fit a maximum of 16,665 transactions of that size.
I don’t think that’s strictly true? sha256 includes the size of its input when calculating the hash, so implicit in being provided a midstate is being provided the number of bytes that went into it. In Bitcoin that’s s (the midstate), buf (the remaining bytes that will build the next chunk), and bytes (the number of bytes that went into both of those things). So a preimage for the txid of a very large coinbase tx can still be reduced to as few as 104 bytes. That requires you to have some way of differentiating a “valid” 64-byte coinbase tx from an internal node of the tx merkle tree, but it’s easy to transmit the full coinbase tx in that case anyway.
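To make the length-commitment point concrete, here is a small sketch (my own illustration) of SHA-256’s finalization padding, which is where the total input size gets baked into the digest:

```python
def sha256_final_padding(total_len: int) -> bytes:
    # SHA-256 appends 0x80, enough zero bytes to reach 56 mod 64, then the
    # total input length in bits as a 64-bit big-endian integer. Two inputs
    # of different lengths therefore always finish with different final blocks.
    zeros = (55 - total_len) % 64
    return b"\x80" + b"\x00" * zeros + (8 * total_len).to_bytes(8, "big")

# The serialized midstate described above: s (32) + buf (up to 64) + bytes (8),
# i.e. at most 104 bytes no matter how large the coinbase tx is.
MAX_MIDSTATE_PREIMAGE = 32 + 64 + 8
```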
Compared to the status quo, it improves the correctness of block inclusion proofs. Compared to including the coinbase in the proof, I think the reduction in complexity is more of an advantage tbh:
- (64-byte txs invalid) you need a merkle path, and also to check the witness-stripped tx isn’t 64 bytes
- (coinbase in proof) you need a combined merkle path to both the coinbase and the tx; you need to check they’re the same depth; you need to check the coinbase is valid; in order to minimise bandwidth you need to deal with sha256 midstates
- (depth commitment) you need to look up the block’s merkle tree depth via (TBD) (except if the block height is prior to the activation height, in which case you’ll need to use this fixed table and hope there’s no absurdly large reorg), then check the merkle path is exactly that long
Perhaps worth noting that these merkle proofs will only give you the witness-stripped tx anyway – in order to verify the tx’s witness data you need the full coinbase to obtain the segwit commitment (which can be essentially anywhere in the coinbase tx), and then you need the matching merkle path from that to your non-stripped tx. So for a lite node, checking the stripped size of the tx is straightforward – that’s all the data it’s likely to be able to verify anyway.
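For concreteness, a minimal sketch of locating that commitment, following the scriptPubKey pattern BIP141 specifies:

```python
# BIP141: the witness commitment lives in a coinbase output whose scriptPubKey
# begins OP_RETURN (0x6a), push-36 (0x24), and the 4-byte tag 0xaa21a9ed.
WITNESS_COMMITMENT_HEADER = bytes.fromhex("6a24aa21a9ed")

def find_witness_commitment(output_scripts: list[bytes]) -> bytes | None:
    # If more than one output matches the pattern, BIP141 says the one with
    # the highest output index is the commitment, hence the reversed() scan.
    for spk in reversed(output_scripts):
        if len(spk) >= 38 and spk.startswith(WITNESS_COMMITMENT_HEADER):
            return spk[6:38]  # the 32-byte commitment hash
    return None
```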
It occurs to me that most of the time, you’re probably more interested in whether an output is (was) in the utxo set, rather than whether a tx is in a block. But I think the concerns here are irrelevant for things like utreexo that have some new merkle tree: they can use a prefix to distinguish tree contents from internal nodes.
It took me a while to figure out how knowing the number of bytes used to create a digest was an alternative to validating the structure of a coinbase transaction. Do I accurately understand that you propose the following:
1. Prover provides the coinbase transaction midstate (including size) for all but the final chunk of the first of the double SHA256 hashes, plus the final chunk.
2. Prover provides a partial merkle branch for the coinbase transaction.
3. Verifier uses the midstate (including size) and the final chunk to generate the coinbase txid.
4. Verifier checks that all provided nodes in the partial merkle branch are on the right side (i.e., that the branch is for the first transaction in a block, i.e. the coinbase transaction).
5. Verifier inserts its generated coinbase txid into the partial merkle branch and verifies up the tree until it connects to the merkle root in the block header.
After the above process, the verifier can be certain of the following:

- The number of bytes used to generate the digest (txid).
  - If the number of bytes is not equal to 64, then the digested data must have been a transaction (or the block was invalid).
  - If the number of bytes is equal to 64, then the digested data may have been either a transaction or concatenated digests for an internal merkle node. The verifier can distinguish between these by checking whether the 64-byte chunk provided to it matches a coinbase transaction pattern.
- The verified transaction is a coinbase transaction, given the inability to use CVE-2017-12842 to forge coinbase transactions and the position of the transaction in the merkle tree (or the block was invalid).
- The depth of the coinbase transaction, which is the same depth as all other transactions in the tree.
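If I’ve understood correctly, the merkle portion of that procedure (steps 2, 4, and 5) reduces to something like the following sketch, my own illustration rather than anyone’s published code; it assumes the coinbase txid has already been produced from the midstate in step 3:

```python
import hashlib

def dsha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def verify_coinbase_branch(coinbase_txid: bytes, branch: list[bytes],
                           merkle_root: bytes) -> int:
    # The coinbase is always the leftmost leaf, so every sibling in its branch
    # must be concatenated on the right; always hashing node+sibling in that
    # order enforces steps 4 and 5 together. Returns the tree depth, which is
    # shared by every transaction in the block.
    node = coinbase_txid
    for sibling in branch:
        node = dsha256(node + sibling)
    if node != merkle_root:
        raise ValueError("branch does not connect to the header merkle root")
    return len(branch)
```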
I agree, as the average absolute bandwidth is trivial, can be easily reduced by another 36 bytes, and can be amortized over an entire block of transactions.
And as the above implies, worst case should not be a consideration given that it is reasonably bounded.
Given that bandwidth is not a material issue, adding sha256 midstates is unnecessary complexity.
Also, by simply compacting out the excess 36 bytes as mentioned above, there is no need to validate the coinbase. Reconstituting it with the presumed null point is sufficient (e.g. defaulting a tx deserialization with no input point to a null input point).
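A sketch of that defaulting (the offsets assume the null point was elided immediately after the 4-byte version and 1-byte input count, which is just one possible encoding):

```python
# A coinbase input's outpoint is always the "null point": an all-zero
# previous-txid plus an index of 0xffffffff.
NULL_POINT = b"\x00" * 32 + b"\xff" * 4

def reconstitute_coinbase(compact_tx: bytes) -> bytes:
    # Re-insert the implied null outpoint before deserializing; assumes the
    # 36 bytes were compacted out right after nVersion (4) + input count (1).
    return compact_tx[:5] + NULL_POINT + compact_tx[5:]
```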
Consequently the complexity comparison becomes:
Ensure that the stripped transaction is not 64 bytes, and ensure that its Merkle path is valid.
vs:
Ensure that the (reconstituted) coinbase Merkle path is valid, and ensure the same valid Merkle depth for any tx in the block.
Validating the coinbase Merkle path is of course code reuse, so the new logic is checking the Merkle path size. This is the tradeoff in the SPV wallet. The tradeoff in the full node is mandatory checking of the stripped size of all transactions (and soft fork) vs. doing nothing (possibly “compressing” null points for SPV clients).
Not important to your point, but executing this attack using the coinbase transaction requires brute-forcing 40 more bits in the first stage, which makes it fairly impractical.
To add to this list:
- This is a roundabout way of closing the fake inclusion attack vector instead of addressing the root issue: that 64-byte transactions are indistinguishable from inner nodes.
- This requires miners to update their software stack, whereas making 64-byte transactions invalid doesn’t require them to do anything (if they aren’t running a non-standard Bitcoin Core).
Even then, for an application which checks the SPV proofs of transactions for received payments, this would never be an issue, as it would only check transactions paying to a normal scriptPubKey in the first place, which would implicitly ensure the transaction is never 64 bytes.
After re-reading through the arguments presented in this thread, I’m leaning toward including the invalidity of 64-byte transactions as part of the consensus cleanup proposal. I appreciate Eric’s pushback against some of the advertised benefits (my caching assumptions being incorrect and the bandwidth reduction in absolute terms being modest). Still, I think the actual benefits, mostly in terms of reduced complexity for SPV applications, outweigh the cost.
It recently occurred to me this would require mandating the coinbase input to have its nSequence field be SEQUENCE_FINAL, as otherwise the coinbase transaction would fail the IsFinalTx() check (a transaction whose locktime is set to height H is only valid in a block at height H+1 or later).
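For reference, a Python sketch mirroring the logic of Bitcoin Core’s IsFinalTx(), showing why a coinbase whose nLockTime equals the current height is only final if its input’s nSequence is SEQUENCE_FINAL:

```python
SEQUENCE_FINAL = 0xFFFFFFFF
LOCKTIME_THRESHOLD = 500_000_000  # below: block height; at or above: unix time

def is_final_tx(nlocktime: int, sequences: list[int],
                block_height: int, block_time: int) -> bool:
    # Mirrors Bitcoin Core's IsFinalTx(): a locktime equal to the current
    # height is NOT strictly less than it, so the tx is non-final unless
    # every input's nSequence is SEQUENCE_FINAL.
    if nlocktime == 0:
        return True
    threshold = block_height if nlocktime < LOCKTIME_THRESHOLD else block_time
    if nlocktime < threshold:
        return True
    return all(seq == SEQUENCE_FINAL for seq in sequences)
```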
I just figured the off-by-one would be unnecessarily confusing to anyone trying to use this. But I don’t have a strong opinion, and we might favor the small confusion over restricting the coinbase’s nSequence.
The height could also be encoded in the coinbase’s nVersion, and if we think making the block height available in a simpler manner than parsing the scriptSig isn’t a goal, then I think just requiring the coinbase transaction’s nVersion to be anything but 1 would be sufficient.