Regarding block serialization, it is true that Libbitcoin must compose outgoing blocks. However, the store and composition are significantly faster than the network and require very little CPU. Consequently, response time is not impacted.
If one doubts this, consider that when downloading we deserialize a block off the wire, hash all txs (twice for segwits), merkle hash (twice for segwits), perform block checks, and serialize the block to the store, with full indexation. When doing this on 64 threads concurrently (without shani) the store still outpaces the 2.3Gbps network and completes in under an hour. Deserialization of blocks is not an issue.
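As a rough illustration of the merkle work mentioned above, here is a minimal sketch (not libbitcoin's implementation) of the fold that runs once over txids and again over wtxids for segwit blocks; combine stands in for double-SHA256 of the concatenated pair, and all names are illustrative.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

using hash256 = std::array<uint8_t, 32>;

// Fold a list of leaf hashes into a merkle root, duplicating the last
// element when a level is odd (bitcoin's rule). 'combine' stands in for
// double-SHA256 over the 64-byte concatenation of the two child hashes.
hash256 merkle_root(std::vector<hash256> level,
    const std::function<hash256(const hash256&, const hash256&)>& combine)
{
    if (level.empty())
        return {};

    while (level.size() > 1)
    {
        if (level.size() % 2 != 0)
            level.push_back(level.back());

        std::vector<hash256> parents;
        parents.reserve(level.size() / 2);
        for (size_t i = 0; i < level.size(); i += 2)
            parents.push_back(combine(level[i], level[i + 1]));

        level = std::move(parents);
    }

    return level.front();
}
```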
Performance at the chain “tip” (we refer to “top”, since the linearization of the block tree is a stack) is a reasonable consideration. However, this is really about compact blocks. Interestingly, the previous comments about block serialization work in reverse here.
Core stores whole confirmed blocks, so it must gather all transactions from its memory pool, validate them, serialize the block, and update the utxo store. This is an extra cost imposed by its data model, in a time-critical phase.
Libbitcoin does not have to construct or serialize a block, and does not have a utxo store to update. That alone gives it a large advantage.
But also, unconfirmed txs are validated and stored in the same table as block txs. So under compact blocks there is no need to validate or store any known txs. The block is constructed from unconfirmed tx metadata only, retained in memory. Validation of the block from this tx metadata is trivial. So this is likely to be significantly faster than Core, not despite the storage model, but because of it.
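For illustration only, a minimal sketch of that reconstruction idea, with hypothetical types (tx_link and tx_metadata are not libbitcoin's): each announced short id is matched against metadata retained for already-validated unconfirmed txs, so only unknown txs need to be requested, and nothing is re-validated or re-stored for the known ones.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical surrogate key into the shared tx table (confirmed and
// unconfirmed txs live in the same table).
using tx_link = uint64_t;

// Metadata retained in memory for unconfirmed (already-validated) txs,
// keyed by the compact block short transaction id.
struct tx_metadata
{
    tx_link link;   // position in the tx table
    uint64_t fee;   // example of retained validation state
};

// Match announced short ids against known unconfirmed txs. Returns the
// indexes that must still be requested; known txs contribute their
// metadata to block construction without re-validation or storage.
std::vector<size_t> reconstruct(
    const std::vector<uint64_t>& short_ids,
    const std::unordered_map<uint64_t, tx_metadata>& unconfirmed,
    std::vector<tx_metadata>& out)
{
    std::vector<size_t> missing;
    out.clear();
    out.reserve(short_ids.size());

    for (size_t index = 0; index < short_ids.size(); ++index)
    {
        const auto it = unconfirmed.find(short_ids[index]);
        if (it == unconfirmed.end())
        {
            missing.push_back(index);
            out.push_back({});  // placeholder until the tx arrives
        }
        else
        {
            out.push_back(it->second);
        }
    }

    return missing;
}
```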
Finally, this data model effectively eliminates the “mempool” overflow issues with which Core continues to struggle.
Libbitcoin adds all block input points (actually references to points) to a hashmap while checking for existence on the emplace. Constant time per input. It’s done during block.check, so it’s coincident with download, which is fully concurrent.
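A minimal sketch of that check under assumed types (point and point_hasher are illustrative, not libbitcoin's): every input point is emplaced into a hash set, and a failed emplace means a duplicate point, i.e. an internal double spend.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unordered_set>
#include <vector>

using hash256 = std::array<uint8_t, 32>;

// Hypothetical input point: previous tx hash plus output index.
struct point
{
    hash256 hash;
    uint32_t index;

    bool operator==(const point& other) const
    {
        return index == other.index && hash == other.hash;
    }
};

struct point_hasher
{
    size_t operator()(const point& value) const
    {
        // Tx hashes are already uniformly distributed, so a prefix of the
        // hash combined with the index makes an adequate table key.
        size_t prefix = 0;
        std::memcpy(&prefix, value.hash.data(), sizeof(prefix));
        return prefix ^ value.index;
    }
};

// True if no two inputs in the block spend the same previous output.
// Constant time per input; no chain context or utxo store required.
bool no_internal_double_spend(const std::vector<point>& block_points)
{
    std::unordered_set<point, point_hasher> spent(block_points.size());
    for (const auto& value : block_points)
        if (!spent.emplace(value).second)
            return false;

    return true;
}
```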
I don’t understand why inputs are being fetched from a cache at all. Only one block at a time is being validated, and the block had to have just been parsed. Why not just enumerate the inputs and fire off the script validation jobs?
The block has been parsed, but we need to check whether all inputs are unspent. That cannot happen in parallel, since it depends on the order of transactions included in the block. The inputs of one transaction can spend outputs from a previous transaction included in the block. So, each transaction must look up and remove each input’s prevout from the utxo set, failing if not found. Then each output must be inserted into the utxo set before the next transaction is checked.
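For contrast, a minimal sketch (with an opaque point_id standing in for a full outpoint) of the ordered, utxo-set-based processing described here: each tx must consume its prevouts before its own outputs are inserted, which ties the check to transaction order within the block and block order across the chain.

```cpp
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// For brevity an output point is reduced to an opaque 64-bit id here;
// a real implementation keys on the full (tx hash, index) outpoint.
using point_id = uint64_t;
struct utxo { /* amount, script, ... */ };

struct simple_tx
{
    std::vector<point_id> spends;                    // prevouts consumed
    std::vector<std::pair<point_id, utxo>> creates;  // outputs created
};

// Apply one block to an explicit utxo set, in transaction order. A later
// tx may spend an output created by an earlier tx in the same block, so
// this loop cannot be reordered or parallelized without extra machinery.
bool apply_block(std::unordered_map<point_id, utxo>& utxos,
    const std::vector<simple_tx>& txs)
{
    for (const auto& tx : txs)
    {
        // Every prevout must exist and is consumed (no double spend).
        for (const auto prevout : tx.spends)
            if (utxos.erase(prevout) == 0)
                return false;

        // Newly created outputs become spendable immediately.
        for (const auto& created : tx.creates)
            utxos.emplace(created.first, created.second);
    }

    return true;
}
```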
These are two distinct block rules. (1) no forward references, (2) no double spends. Rule 1 is pointless, likely just an implementation artifact. Rule 2 is just a check for duplicate points within a block.
These are context free; they do not require a utxo store or any chain context. Rule 1 requires order, rule 2 does not. Libbitcoin splits these two rules and evaluates them independently. This allows the second to be evaluated using concurrency.
However, concurrency for such a fast algorithm with a bounded set size proved counterproductive - especially since it is part of block.check and is therefore evaluated concurrently across blocks.
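To make the split concrete, a sketch of rule (1) in isolation, using simplified illustrative types; rule (2) is just the emplace check sketched earlier, and neither consults a utxo store or any chain context.

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Simplified tx: its own id plus the tx ids referenced by its inputs
// (the output index within the prevout tx is irrelevant to this rule).
struct flat_tx
{
    uint64_t txid;
    std::vector<uint64_t> prevout_txids;
};

// Rule (1): no input may reference an output of a tx that appears later
// in the same block. Requires order within the block, nothing else.
bool no_forward_references(const std::vector<flat_tx>& txs)
{
    // Tx ids of every tx in the block.
    std::unordered_set<uint64_t> in_block;
    for (const auto& tx : txs)
        in_block.insert(tx.txid);

    // Walk in order; a reference to an in-block tx not yet seen is a
    // forward reference.
    std::unordered_set<uint64_t> seen;
    for (const auto& tx : txs)
    {
        for (const auto prevout_txid : tx.prevout_txids)
            if (in_block.count(prevout_txid) != 0 &&
                seen.count(prevout_txid) == 0)
                return false;

        seen.insert(tx.txid);
    }

    return true;
}
```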
Given the need to populate and depopulate a utxo store, these checks become tied to it. However, you are referring to lookup of previous outputs, not inputs. I was asking why there is an input fetch.
I think you mean output (previous output of the input) here as well. If the prevout doesn’t exist then the input script will fail to validate. That’s just a query for the output during block.connect, which requires no order and is performed concurrently across blocks.
In other words, there are other checks: (1) no forward reference, (2) no [internal] double spend, (3) prevout exists and executes successfully. There is also (4) prevout is in the current or previously-confirmed block, (5) prevout is not spent in any previously-confirmed block [not double spent in current block is #2].
I don’t see how this requires no order though. If the transaction which contains the prevout has not been stored yet then this check will fail even if it does exist.
It only requires that the previous output has been downloaded. That’s a partial ordering requirement. That’s why #3 cannot be performed in block.check.
The important observation is that it does not require a total ordering by block. That constraint is only imposed by the need to populate a utxo store.
I should clarify that rules 4 and 5 (as described above) do require a total ordering by block. It’s just that validating scripts (input->output) does not, that’s a partial ordering (3). And the forward reference (1) and internal double spend (2) checks impose no ordering requirement.
The trick to obtaining maximal concurrency is factoring necessary constraints such that no unnecessary order is imposed.
Existence and spentness are all about block order, we call that “confirmability” and perform these checks in a “confirmation” phase. Checks such as script execution, and the related prevout queries, run concurrently across blocks in the “validation” phase. Anything that is block-context free (header context is always available) is performed in the “download” phase. The three phases run concurrently.
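A compact way to express that factoring, with illustrative names rather than libbitcoin's API:

```cpp
// Illustrative only: how the numbered checks above map onto the three
// concurrent phases. Header context (height, bits, timestamps) is always
// available, so header-dependent rules can still run during download.
enum class check
{
    forward_reference,      // (1)
    internal_double_spend,  // (2)
    prevout_script,         // (3) prevout exists and executes successfully
    prevout_confirmed,      // (4) prevout in current or prior confirmed block
    prevout_unspent         // (5) prevout not spent in a prior confirmed block
};

enum class phase { download, validation, confirmation };

constexpr phase phase_of(check value)
{
    switch (value)
    {
        // Block-context free: evaluated during download (block.check),
        // concurrently across blocks.
        case check::forward_reference:
        case check::internal_double_spend:
            return phase::download;

        // Partial order only (the prevout must have been downloaded):
        // evaluated during block.connect, concurrently across blocks.
        case check::prevout_script:
            return phase::validation;

        // Total order by block: evaluated in the confirmation phase.
        case check::prevout_confirmed:
        case check::prevout_unspent:
        default:
            return phase::confirmation;
    }
}
```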
Yes, and I believe any check that requires total or partial order is skipped if below the milestone block. This is what allows a sync in an hour. So, I think it’s still accurate to say that not looking up input prevouts is where the main speedup is.
Looking at a flamegraph of Bitcoin Core IBD, the majority of time is spent looking up input prevouts. That’s why parallelizing that provides a nice speedup. Skipping it entirely though is not possible since we would not have a utxo set at the end, which is not a constraint that libbitcoin has.
I think a weaker equivalent of the partial-ordering idea can be implemented in an “explicit UTXO set” model, though only prior to the assumevalid point (or anywhere it’s known that script validation won’t happen).
If an input is encountered which doesn’t exist in the UTXO set, instead add a “negative UTXO” entry, reflecting the spend, but not the creation. When the UTXO is later added, they cancel out. If negative UTXOs remain at the end, the chain is invalid (but it would require proper care to assign these failures to the spending, not the creation). This would pretty much allow processing blocks in any order.
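A minimal sketch of that bookkeeping, with an opaque point_id standing in for a full outpoint: the signed count for an outpoint may go negative when its spend arrives before its creation, and the chain is invalid if any negative entries survive once every block has been applied.

```cpp
#include <cstdint>
#include <unordered_map>

// Opaque outpoint id for brevity; a real set keys on (tx hash, index).
using point_id = uint64_t;

// Signed occurrence count per outpoint: +1 when created, -1 when spent.
// With blocks applied in arbitrary order a spend may be seen first, which
// leaves a temporary "negative utxo" entry that a later creation cancels.
using delta_set = std::unordered_map<point_id, int>;

void create_output(delta_set& set, point_id id)
{
    if (++set[id] == 0)
        set.erase(id);  // a pending (negative) spend is now matched
}

void spend_output(delta_set& set, point_id id)
{
    if (--set[id] == 0)
        set.erase(id);  // spend matched an already-seen creation
}

// After all blocks (in any order): entries at +1 form the final utxo set;
// any negative entry means an output was spent but never created, so the
// chain is invalid (attributing the failure to the spending block rather
// than the creating one needs the care noted above).
bool no_negative_entries(const delta_set& set)
{
    for (const auto& entry : set)
        if (entry.second < 0)
            return false;

    return true;
}
```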
When script validation is enabled, this isn’t possible because the state required to run script validation is pretty much the entire spending transaction, not just a small piece of UTXO data.
The previous discussion is about full validation. Libbitcoin validates blocks (prevout lookup and script execution) concurrently, which is a significant performance advantage. However, in milestone runs there is no need to look up prevouts.
You might believe that it’s the lack of this lookup that saves the time, but it’s not. It’s the lack of a utxo store imposing a total order on block commitment. This allows us to download, check, store and index blocks concurrently across the entire chain.
I’m not sure I follow this explanation, but the conclusion is correct. The approach wouldn’t work under a candidate header branch above the milestone (assume valid block) because blocks above the block being validated are updating prevout existence and spentness. So existence in the utxo store no longer represents order. Under milestone, validity of the branch is assumed so this doesn’t matter, the objective is just to populate the utxo store up to the milestone.
A material cost of this approach (under milestone) is that the utxo store must not only be thread safe, but locks must persist across each output being searched, compared and written. Given that there are billions of outputs this is a non-trivial consideration.
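A sketch of why, with illustrative names: the search, the comparison and the write for a given output have to sit inside one critical section, since another block's thread may be populating the same output concurrently.

```cpp
#include <cstdint>
#include <mutex>
#include <unordered_map>

using point_id = uint64_t;
struct utxo { uint64_t amount; /* script, ... */ };

// Minimal concurrent set: the lock must persist across the search, the
// comparison and the write. A single mutex is shown for brevity; per-
// bucket locking reduces contention but does not remove the requirement.
class concurrent_utxo_set
{
public:
    bool add_or_verify(point_id id, const utxo& value)
    {
        std::lock_guard<std::mutex> guard(mutex_);

        const auto it = map_.find(id);                     // search
        if (it != map_.end())
            return it->second.amount == value.amount;      // compare

        map_.emplace(id, value);                           // write
        return true;
    }

private:
    std::mutex mutex_;
    std::unordered_map<point_id, utxo> map_;
};
```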
Ultimately this can be resolved by just downloading the full utxo set under assumption as well, which is already implemented.
I’ve been trying to find a good way to show this with data. I think that our construction of the ElectrumX address index table along with our point compression is a pretty good proxy (both are optional). These are not ordered but in combination impose the same number and type of table reads/writes as a utxo store population would under assume valid.
The address index requires the mapping of the sha256 hash of every output to the surrogate key of the output. This is comparable to the write of the output to the utxo store (the surrogate key allows direct recovery of the corresponding output, with navigation to its spending input).
We don’t do deletes, but the spend of the output can be approximated by the search for each input in the point hash table, and the write of an entry to that table in the case that it’s missing. This optional search compresses out redundant point hashes, saving about 50GB (making the fully-indexed store materially smaller than the Core store).
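A sketch of that compression under assumed types (not libbitcoin's): each input's point hash is looked up in a hash table and written only when missing, so repeated references to the same tx hash collapse to one stored copy addressed by a small surrogate key.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unordered_map>

using hash256 = std::array<uint8_t, 32>;
using point_key = uint64_t;  // surrogate key into the point hash table

struct hash_hasher
{
    size_t operator()(const hash256& value) const
    {
        // Tx hashes are uniformly distributed; a prefix suffices.
        size_t prefix = 0;
        std::memcpy(&prefix, value.data(), sizeof(prefix));
        return prefix;
    }
};

// Search the table for the input's point hash; write it only if missing.
// Either way the input records just the small surrogate key, so repeated
// spends of outputs from the same tx share one stored copy of its hash.
point_key compress_point(
    std::unordered_map<hash256, point_key, hash_hasher>& table,
    const hash256& point_hash)
{
    const auto next_key = static_cast<point_key>(table.size());
    return table.emplace(point_hash, next_key).first->second;
}
```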
Point compression is neutral on total sync time, because the resulting table is so much smaller (similar to the reduction caused by removing spent outputs in a utxo table). The address index adds about 30 minutes to a 1 hour sync. This is on a machine that takes Core 15 hours to sync under assume valid.
I think this makes the case as closely as possible that lack of utxo store population (reads/writes) is not what powers milestone sync, it’s the lack of imposition of total block ordering by a utxo store.
However, one might also consider the extraordinary amount of non-optional indexing. BN always indexes all transactions by hash, all confirmed txs to their containing block, all outputs to their spending input(s), all inputs to their point hash (optionally compressible), all headers by hash, and all headers to their associated txs (if associated). Each of these is a hash table.
I think you can do the same optimisation even in that case, if you augment the spending block with its corresponding revNNN.dat data. You can’t authenticate the revNNN.dat, so you still need to track negative entries, and in addition you also need to check that the rev data was correct (compare it to the utxo when you find a match), and you need to be able to handle falling back to in-order processing (in case you’re given bad rev data which says a tx is invalid, you need to recheck when you get the actual utxo data, if it’s different).
(I was thinking of that technique in the context of taking an assumeutxo set and validating the chain both forwards and backwards simultaneously, to avoid the need to maintain two utxo sets, provided you didn’t care about optimising for the case where the assumeutxo point was actually invalid)