A few weeks ago @RubenSomsen proposed speeding up the IBD phase in Bitcoin Core by reducing chainstate operations with the aid of pre-generated hints. Right now, the main optimization is to skip script verification up to the `-assumevalid` block (enabled by default, updated with each release), but all other block validation checks are still active. In particular, UTXO set lookups and removals can be quite expensive: they put constant pressure on the coins cache and cause expensive LevelDB disk I/O.
The idea is to only ever add coins to the UTXO set if we know that they will still be unspent at a certain block height N. All the other coins we don’t have to add or remove during IBD in the first place, since we already know that they end up being spent along the way. The historical hints data consists of one boolean flag for each transaction output ever created (up to and including block N), answering the question: “Is this transaction output ending up in the UTXO set at block N?”.
If the answer is yes, we obviously add it (as we would normally do), but if the answer is no, we ignore it. Consequently, we don’t have to do any UTXO set lookups and removals during the IBD phase anymore; the UTXO set only grows, and the order in which blocks are validated doesn’t matter anymore, i.e. block validation could be done in parallel. The obvious way to store the hints data is bit-encoded, leading to a rough size of (number_of_outputs_created / 8) bytes, which is in the hundreds of megabytes for mainnet (e.g. ~348 MiB for block 850900).
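A minimal sketch of such a bit encoding, assuming outputs are numbered in creation order (block by block, transaction by transaction, output by output) and packed least-significant-bit first. The bit order and the helper names `encode_hints`, `hint_at` and `hints_size_mib` are hypothetical; the proof-of-concept's actual file layout may differ.

```python
import math

def encode_hints(flags):
    """Pack one bool per created output (creation order) into bytes, LSB-first (assumed)."""
    out = bytearray(math.ceil(len(flags) / 8))
    for i, unspent_at_n in enumerate(flags):
        if unspent_at_n:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)

def hint_at(hints, index):
    """True if the output with this creation index is still unspent at block N."""
    return ((hints[index // 8] >> (index % 8)) & 1) == 1

def hints_size_mib(number_of_outputs_created):
    """Rough hints size in MiB: number_of_outputs_created / 8 bytes."""
    return number_of_outputs_created / 8 / 2**20

flags = [True, False, False, True, False, False, False, False, True]
hints = encode_hints(flags)
print(hints.hex())                                              # 0901
print([hint_at(hints, i) for i in range(len(flags))] == flags)  # True
```

Inverting the size formula, ~348 MiB corresponds to roughly 2.9 billion outputs created up to block 850900.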
As a rough overview, the following checks in ConnectBlock are done in each mode:
validation step | regular operation | assumevalid | IBD Booster |
---|---|---|---|
Consensus::CheckTxInputs | ✓ | ✓ | ✗ |
BIP68 lock checks | ✓ | ✓ | ✗ |
SigOp limit checks | ✓ | ✓ | ✗ |
input script checks | ✓ | ✗ | ✗ |
update UTXO set | ✓ | ✓ | grow-only, with MuHash check (see below) |
block reward check | ✓ | ✓ | ✗ |
To give some assurance that the given hints data is correct, the state of the spent coins is tracked in a MuHash instance, where elements can be added and deleted in any order. If we encounter a transaction output for which our hints data says “false” (meaning it doesn’t end up in the final UTXO set at block N and will be spent on the way there), we add its outpoint (txid+vout serialization) to the MuHash. For every spend in a transaction, we then remove its outpoint (again the txid+vout serialization) from the MuHash. After block N, we can verify that all the coins we didn’t add to the UTXO set have indeed been spent (implying that the given hints data was correct) by checking that the MuHash represents an empty set at this point (internally, this is done by checking that the numerator and denominator have the same value).
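To illustrate why the insertion/removal order doesn't matter, here is a toy multiplicative set hash. It is explicitly not Bitcoin Core's MuHash3072 (which works modulo 2^3072 - 1103717 and maps elements via ChaCha20); the Mersenne prime 2^127 - 1, the SHA-256 element mapping and the class name `ToyMuHash` are stand-ins chosen only to keep the sketch short.

```python
import hashlib

# Toy multiplicative set hash, NOT Bitcoin Core's MuHash3072 (which works modulo
# 2^3072 - 1103717 and maps elements via ChaCha20). The Mersenne prime 2^127 - 1
# and SHA-256 are stand-ins chosen only to keep the sketch short.
P = 2**127 - 1

def to_element(data: bytes) -> int:
    """Map serialized data to a nonzero residue modulo P."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % (P - 1) + 1

class ToyMuHash:
    def __init__(self):
        self.numerator = 1    # product of all inserted elements
        self.denominator = 1  # product of all removed elements

    def insert(self, data: bytes) -> None:
        self.numerator = self.numerator * to_element(data) % P

    def remove(self, data: bytes) -> None:
        self.denominator = self.denominator * to_element(data) % P

    def is_empty(self) -> bool:
        # Equal products mean every inserted element was removed again
        # (barring a negligible-probability collision); order is irrelevant.
        return self.numerator == self.denominator

def serialize_outpoint(txid: bytes, vout: int) -> bytes:
    """32-byte txid followed by vout as 4-byte little-endian (COutPoint layout)."""
    return txid + vout.to_bytes(4, "little")

acc = ToyMuHash()
a = serialize_outpoint(bytes(32), 0)
b = serialize_outpoint(bytes([1]) * 32, 1)
acc.insert(a)            # output flagged "false" in the hints
acc.insert(b)            # another "false" output
acc.remove(b)            # spends may arrive in any order
print(acc.is_empty())    # False: a is still unspent
acc.remove(a)
print(acc.is_empty())    # True
```

A wrong hint in either direction unbalances the two products: a “false” output that is never spent leaves an extra factor in the numerator, while spending an output that was flagged “true” adds a factor to the denominator only.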
I’ve implemented a proof-of-concept of this proposal, called “IBD Booster”, which consists of two parts:
- A Python script, ibd-booster-hints-gen, which builds on py-bitcoinkernel, takes a datadir and a UTXO set in SQLite format (as created by the utxo_to_sqlite.py script) as input, and outputs the bit-encoded hints file: GitHub - theStack/ibd-booster-hints-gen
- A Bitcoin Core branch which supports reading in the hints file via an `-ibdboosterfile` parameter and using it for the optimization described above: GitHub - theStack/bitcoin at ibd_booster_v0
The main logic is implemented in a function called UpdateCoinsIBDBooster, which is called as a drop-in replacement for UpdateCoins if the IBD Booster is active for the current block; compare the two functions in bitcoin/src/validation.cpp at commit ce4d1aa5ac4e2a08172bcbc7e80f9b5675b20983 in theStack/bitcoin on GitHub.
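The per-block logic can be sketched roughly as follows. This is a hypothetical Python model, not the actual UpdateCoinsIBDBooster code: `Tx`, `BoosterState`, `update_coins_booster` and `hints_verified` are illustrative names, a `collections.Counter` multiset stands in for the MuHash accumulator, and skipping OP_RETURN outputs is an assumption made so that never-spendable outputs don't break the final check.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Tx:
    txid: bytes
    prevouts: list  # (txid, vout) pairs spent by this tx; empty for a coinbase
    outputs: list   # scriptPubKeys of the created outputs

@dataclass
class BoosterState:
    hints: list                                         # one bool per created output
    next_output_index: int = 0                          # position in the hints stream
    utxos: dict = field(default_factory=dict)           # grow-only until block N
    pending: Counter = field(default_factory=Counter)   # stand-in for the MuHash

def update_coins_booster(state: BoosterState, block_txs: list) -> None:
    """Hypothetical per-block update in booster mode: no lookups, no removals."""
    for tx in block_txs:
        for outpoint in tx.prevouts:
            # A spend only touches the accumulator, never the UTXO set.
            state.pending[outpoint] -= 1
        for vout, script in enumerate(tx.outputs):
            outpoint = (tx.txid, vout)
            keep = state.hints[state.next_output_index]
            state.next_output_index += 1
            if keep:
                state.utxos[outpoint] = script
            elif not script.startswith(b"\x6a"):
                # Assumption: OP_RETURN outputs are never spent, so they are not
                # tracked (otherwise the final check could never succeed).
                state.pending[outpoint] += 1

def hints_verified(state: BoosterState) -> bool:
    """Analogue of the "MuHash is the empty set" check after block N."""
    return all(count == 0 for count in state.pending.values())

state = BoosterState(hints=[False, True, True])
update_coins_booster(state, [Tx(b"a", [], [b"\x51", b"\x51"])])   # block 1: coinbase
update_coins_booster(state, [Tx(b"b", [(b"a", 0)], [b"\x51"])])   # block 2: spends (a, 0)
print(hints_verified(state), sorted(state.utxos))  # True [(b'a', 1), (b'b', 0)]
```

This sketch processes blocks sequentially only to keep track of the position in the hints stream; the UTXO map itself is never read or shrunk.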
The following is a guide for trying out the proof-of-concept implementation:
Generate the hints data
- Run a non-pruned node and sync it at least to the desired booster target height (886000 in this example)
- Dump the UTXO set at the target height and convert it to an SQLite3 database
$ ./build/bin/bitcoin-cli -rpcclienttimeout=0 -named dumptxoutset $PWD/utxo-886000.dat rollback=886000
$ ./contrib/utxo-tools/utxo_to_sqlite.py ./utxo-886000.dat ./utxo-886000.sqlite
- Stop Bitcoin Core
$ ./build/bin/bitcoin-cli stop
- Fetch the IBD Booster branch (it contains the booster hints generation script in `./contrib`)
$ git clone -b ibd_booster_v0 https://github.com/theStack/bitcoin
$ cd bitcoin
- Create the IBD Booster file containing the hints bitmap (the first start might take a while, as the py-bitcoinkernel dependency has to be built in the background first [1])
$ pushd ./contrib/ibd-booster-hints-gen
$ uv run booster-gen.py -v ~/.bitcoin ../../../utxo-886000.sqlite ~/booster-886000.bin
$ popd
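Conceptually, the generation step just walks all outputs created up to block N and marks those that appear in the UTXO dump. The sketch below is not the actual booster-gen.py: the real tool reads blocks from the datadir via py-bitcoinkernel, whereas here blocks are plain Python lists, the `utxos(txid, vout)` table layout is an assumed simplification of the SQLite dump, and `load_unspent_outpoints` and `generate_hints` are hypothetical names. Holding the whole UTXO set in a Python set is only realistic for toy data.

```python
import os
import sqlite3
import tempfile

def load_unspent_outpoints(db_path):
    """Set of (txid, vout) pairs unspent at the target height.

    Assumes a table `utxos` with `txid` (hex text) and `vout` (integer) columns.
    """
    conn = sqlite3.connect(db_path)
    try:
        return {(txid, vout) for txid, vout in conn.execute("SELECT txid, vout FROM utxos")}
    finally:
        conn.close()

def generate_hints(blocks, unspent):
    """blocks: blocks 0..N in order, each a list of (txid, number_of_outputs)."""
    bits = bytearray()
    index = 0
    for block in blocks:
        for txid, n_outputs in block:
            for vout in range(n_outputs):
                if index % 8 == 0:
                    bits.append(0)                # start a new byte every 8 outputs
                if (txid, vout) in unspent:
                    bits[-1] |= 1 << (index % 8)  # LSB-first (assumed bit order)
                index += 1
    return bytes(bits)

# Toy UTXO database and chain, standing in for the real dump and datadir.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "utxo.sqlite")
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE utxos(txid TEXT, vout INT)")
    conn.executemany("INSERT INTO utxos VALUES (?, ?)", [("aa", 1), ("bb", 0)])
    conn.commit()
    conn.close()
    unspent = load_unspent_outpoints(path)

blocks = [[("aa", 2)], [("bb", 1), ("cc", 3)]]
print(generate_hints(blocks, unspent).hex())  # 06
```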
Set up a fresh node using IBD Booster
Instructions:
- Build and run the IBD-Booster branch (checked out above already)
$ <build as usual>
$ ./build/bin/bitcoind -ibdboosterfile=~/booster-886000.bin
- Wait and enjoy. If everything goes well, you should see the following debug message after the booster target block:
*** IBD Booster: MuHash check at block height 886000 succeeded. ***
First benchmark results
On a rented large-storage, low-performance VPS, I observed a ~2.24x speedup (40h 54min vs. 18h 14min) when running on mainnet with `-reindex-chainstate` (up to block 850900 [2]):
$ time ./build/bin/bitcoind -datadir=./datadir -reindex-chainstate -server=0 -stopatheight=850901
real 2454m37.039s
user 9660m20.123s
sys 380m30.070s
$ time ./build/bin/bitcoind -datadir=./datadir -reindex-chainstate -server=0 -stopatheight=850901 -ibdboosterfile=./booster850900.bin
real 1094m31.100s
user 1132m53.000s
sys 46m45.212s
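As a quick cross-check, the quoted figures can be recomputed from the `time` output above. The CPU-time ratio (user + sys) is simply derived from the same numbers and is shown only for comparison; `minutes` is a throwaway helper.

```python
def minutes(m, s):
    """Convert a `time` reading of m minutes and s seconds to minutes."""
    return m + s / 60

# Values copied from the `time` output above.
base_real, boost_real = minutes(2454, 37.039), minutes(1094, 31.100)
base_cpu = minutes(9660, 20.123) + minutes(380, 30.070)   # user + sys
boost_cpu = minutes(1132, 53.000) + minutes(46, 45.212)   # user + sys

print(f"wall-clock speedup: {base_real / boost_real:.2f}x")     # 2.24x
print(f"CPU-time ratio: {base_cpu / boost_cpu:.2f}x")           # 8.51x
print(divmod(int(base_real), 60), divmod(int(boost_real), 60))  # (40, 54) (18, 14)
```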
The generated hints file for block 850900 can be downloaded here: https://github.com/theStack/bitcoin/raw/refs/heads/booster_data_850900/booster850900.bin.xz (note that this file has to be unpacked first using the unxz command)
One major drawback of this proposal is that we can’t create undo data (the rev*.dat files) anymore, as we obviously don’t have the prevout information available in the UTXO set during the IBD Booster phase. So the node we end up with is unfortunately limited in functionality, as some indexes (e.g. coinstatsindex and blockfilterindex) and RPC calls (e.g. getblock with a higher verbosity level that shows prevout data) rely on these files to create or deliver the full data. My hope is that this proof-of-concept still serves as a nice starting point for further research into similar ideas, or maybe is useful for projects like benchcoin.
Potential improvements:
- investigate further speedups (e.g. we don’t really need the full coins cache functionality up to the target block)
- don’t load all hints data into RAM at once
- refine file format to include metadata and magic bytes for easy detection
- research potential compression of the hints data (probably not worth the complexity imho, but an interesting topic for sure)
- implement parallel block validation
- investigate including the coin heights in the hints data to enable more checks (leads to significantly larger hints data though)
- hints generation tool speedups?
- …
Thanks go to @RubenSomsen, @davidgumberg, @fjahr and @l0rinc (sorry if I forgot anyone) for discussing this idea more intensively a few weeks ago, and to @stickies-v for providing the great py-bitcoinkernel project (I considered using rust-bitcoinkernel instead for performance reasons, but my Rust knowledge is unfortunately still too limited). Suggestions and feedback in any form, or more diverse benchmark results (or suggestions on how best to benchmark this), would be highly appreciated.
Cheers, Sebastian
[1] Note that so far I have only tried running the script with uv, since it just worked ™ flawlessly from the start without any headaches. Other Python package management tools might also work for building and running this script, but I haven’t tested them.
[2] Block 850900 is well before the current assumevalid block, but I had to stop there as I ran out of disk space on the benchmark machine.