Binary Fuse filters as an alternative to BIP 158 GCS

Hi Everyone,

While at Chaincode BOSS 2026, I started looking into whether we can do better than the Golomb-Coded Sets (GCS) used in BIP158. Binary Fuse filters didn’t exist when BIP158 was drafted, and I suspected they might offer a better CPU/bandwidth trade-off for light clients.

I built a benchmarking framework using Core’s infrastructure to see if modern set-membership algorithms brings something to the table or it is just theory.

The framework processes ~50,000 mainnet blocks and tests various wallet sizes across x86 and ARM (Raspberry Pi 5). It validates everything against ground truth to ensure I’m not wrong.

You can find the code and full methodology here: Light Client Block Filter Research (Binary Fuse Filters)

Shortly, the 16-bit Binary Fuse variant (Fuse16) is a strong candidate for a drop-in replacement for GCS. To be honest, the CPU win was more than I expected:

  • Client CPU work: ~9×–80× reduction on desktop; ~6×–45× on ARM.
  • Bandwidth: A negligible increase (~0–3%) in the scenarios I tested.

The 16-bit filter is essentially “flat” here. It’s a simple replacement of GCS to Fuse16.

I also have some ideas for hierarchical constructions to shave down the bandwidth further, but that introduces more complexity and potential privacy leaks that need a closer look.

A few questions before I go deeper:

  • Is a massive CPU gain worth a ~2-3% bandwidth hit? Is this tiny bandwidth increase a significant concern for the network, or is the dramatic CPU efficiency more critical?
  • Any “gotchas” with Binary Fuse I might be missing?
  • I’m not a security expert, so if you see any potential threats or security trade-off I didn’t think of, please let me know.
  • Is the benchmarking framework itself worth polishing? Personally, I’m okay with the current “research-grade” code, as it served my purpose. I’d be happy to make it more modular so others can drop in their own algorithms and compare against BIP158 on real mainnet data — if people find that useful.

A personal note: I’ve been working in corporate environment my whole carrier, and this is my first open-source post. Also, this is my first bitcoin-post. Honestly, I’m a bit nervous to put this out there…

Any feedback is appreciated.

Cheers, Csaba

5 Likes

First off, thanks for taking a look at new filter types. Great to see others are interested in this research. A few comments:

The benchmark compares bandwidth but the false positive rates do not match, GCS has a false positive rate of 1/784931 whereas BF with 16-bit fingerprints is 1/65536. To reach parody with the GCS false positive would be 20-bits. As a follow up, it could be interesting to build the index and measure a wallet rescan using varying fingerprint sizes. I might take this on.

One simple “gotcha” for BF is how values are seeded and how the filter is constructed. To mitigate targeted hash collisions we would likely want to use the block hash as a seed/entropy. There is a possibility the algorithm fails in the case of BF, in the case where there is a cycle in the graph. I wonder how often this occurs in practice.

Some general feedback I’ve gotten on using GCS for mobile is that the battery draw can make CBF unappealing for users. The reported numbers on CPU are promising. I can imagine a class of user that is willing to trade off higher bandwidth for faster queries and lower CPU expenditure, perhaps a mobile user on WiFi or high throughput data plan. For those running Bitcoin Core, I can see potential for speeding up wallet rescans with these as well.

3 Likes

Thank you for the review! I appreciate it a lot.

I think the measured false-positive counts do match the expected rates, let me explain why.

I also believe that the measures on 10 different wallet scenarios supports that Fuse16 is providing better bandwidth and CPU load than Fuse20, even though Fuse20’s FP rate is closer to GCS. It seems strange, but I believe my 50k mainnet dataset with 10 wallet scenarios do support this statement.

I agree with you, that F20 is closer to GCS on per query FP rate. But the benchmark is measuring something more practical: filter bandwidth plus block downloads caused by false positives for multiple scripts. This is what really matters because this is the real bandwidth we are using. Fuse20 has better FP rate, but also has a bigger filter size, so eventually Fuse16 is providing better practical results on my 50k blocks mainnet dataset.

I think that the measured false positive numbers do match the excepted rates. Here is why.

Here you can find the actual false positive ratios of all the measured filters:

bitcoin_research_01/light_client_research/results/raspberry_pi5_wallet_benchmark_results_50000.txt at 9f4b6947c94cbf67d57b1136f0b8ea2a497414ac · purszki/bitcoin_research_01 · GitHub

For example, for the simple_user wallet (24 scripts): GCS: matches=1037, FP=3 F16: matches=1053, FP=19 F18: matches=1041, FP=7 F20: matches=1035, FP=1

The ground truth has 1034 real matching blocks. For Fuse filters, 55 blocks are currently skipped because I have not handled the tiny-filter corner cases yet, so the number of negative blocks tested is: 50000 - 55 - 1034 = 48911

What is the probability of a false positive block in case of a Fuse16 filter? It requires some calculations. The probability of a FP query of a single script: 1/65536. The probability that a query is non-false positive: 1-1/65536 = 65535/65536. We have 24 scripts in the simple_wallet test case, so the probability of a non-false positive block query for the wallet: (65535/65536)^24. The probability of a false positive block query: 1-(65535/65536)^24 = 0.00036614. We had 48911 investigated blocks, so the expected value of the false positive hits is 48911*0.00036614=17.9. We measured 19. I think it’s pretty close.

It’s also an interesting question that 50k block mainnet datasets and the example wallets are conclusive enough to tell that Fuse16 is better than Fuse20… I feel the dataset is huge enough, however I’m not sure about the wallet examples. I am not super-certain that the current wallet examples cover enough realistic wallet shapes.

100% I agree, this is one of the open points in my write-up.

My current intuition is that this should be solvable, but I didn’t work on the solution yet.

Are you able to elaborate on what these corner cases are exactly? Was it the case that the construction algorithm failed and would need to be re-seeded?

Fuse filters require more than two elements in a set. For example, if a block contains only one element, an upper layer on top of the fuse filter must handle that case.

This wasn’t critical for the benchmarking, so I just fidnt implement this logic yet. It’s an easy fix.

Regarding re-seeding: as far as I recall, it didn’t occur in the 50k mainnet dataset, but give me some time, I’ll need to verify once I have access to my laptop.

1 Like

First of all, thanks to Mike for bringing this topic to a wider audience on the Bitcoin Optech podcast, also thanks for the guys for reading the documents and for all the feedback.

It seems that people are usually more concerned about the bandwidth usage of light clients than the CPU usage. My understanding is that this was also Rob’s earlier view, and also one of the points raised on the podcast.

Binary Fuse filters are optimized for CPU speed, and I still think they are interesting because they are extremely fast and elegant.

Based on the community feedback, I started to test other possible plug-and-play replacements for GCS. Early results for Ribbon and BuRR look promising.

These are still very preliminary results from a quick prototype over the same 50k-block mainnet dataset. I will publish proper measurements before drawing strong conclusions.

For now, the raw filter bandwidth looks like this:

Filter Bandwidth over 50k blocks
GCS 972.15 MB
Binary Fuse 16 955.89 MB
Ribbon 16 779.75 MB
BuRR 16 771.38 MB

So, for raw filter bandwidth, Ribbon-16 and BuRR-16 are about 20% smaller than GCS.

This does not directly mean 20% lower total bandwidth for a wallet, because total bandwidth also depends on block downloads. For larger wallets, block downloads dominate the total cost. It’s also a question for me, if BIP158 does matter for huge wallets at all.

Here are the false positive rates: Here are the false-positive counts for three example wallet sizes:

Wallet size Scripts GCS Binary Fuse 16 Ribbon 16 BuRR 16
Small 24 3 19 16 22
Medium 120 3 79 81 77
Large 480 9 136 137 147

For huge wallets, the total download bandwidth is still better than with GCS. However, for large wallets, true-positive and false-positive block downloads dominate over the filter size.

Data coming soon.

I’d not bury Binary Fuse filters yet, because they are still very fast. BuRR and Ribbon are not as fast as Fuse filters, but definitely faster than GCS. I didn’t do the correct benchmarks yet.

Thanks again for the feedback. Trying to show detailed results asap.

I’d like to share my thoughts on the problem we are attempting to solve and some statistics I computed. Compact block filter clients are of course block-based sources of information for wallets. That would imply, even if the user is not opening the wallet, the amount of data they have to download is constantly linearly increasing with time. I have recommended to wallet developers that they sync block filters on a regular cadence, say once per week, via a background sync (when the user is on battery and wifi). During this sync, if the user has not opened the wallet, they probably won’t be expecting a payment, but we still have to check each block for a small handful of scripts to be sure. A low-bandwidth filter, with higher false positives, would allow app developers to conclude either: 1. we definitely don’t need to check the block range on next app open 2. we have to download X filters on startup. Minimizing the size of filters makes this check more practical, as the OS is less likely to kill the process for it’s running time and bandwidth use.

Pieter has a great post on how to minimize the size of a GCS filter given a desired false positive rate. I used this to compute the potential index size for the block range of 480,000 to 940,000. Here are the results for a few parameters:

M: 1533, P: 10, size: 5.891254116 GB
M: 6132, P: 12, size: 6.868467422 GB
M: 49058, P: 15, size: 8.334467794 GB
M: 784931, P: 19, size: 10.289143823 GB

After running this analysis I think a new “bandwidth limited” GCS filter could be interesting for users that know they only have one or a few scripts to query. The script I used to compute the index sizes is available here

Hi Rob,

When I started looking into this topic for BOSS-2026, the first idea was exactly this: finding a GCS parameter set that would work over approx 1000 blocks as a prefilter. I think it was based on your notes if I can recall.

At that time, however, I was optimizing for CPU rather than bandwidth, and I couldn’t find a GCS parameter set that would be faster than BIP158. I also hadn’t read Pieter’s post back then.

I can actually dig out the code and re-check, this time for bandwidth, using your findings. Let’s see whether real-life measurements support the theory.

Always the optimist, I recently measured that BuRR and Ribbon filters also look promising for small wallets. (xor filters, binary fuse filters, BuRR and Ribbon filters - they are super similar but using different construction ideas. I hope I haven’t missed any others!). Let me try the following steps:

  1. Check 1000-block GCS prefilter with parameterization based on your notes.
  2. Check the same using BuRR and Ribbon filters.

Is the topic of hierarchical filters discussed somewhere? Could you briefly describe what is the basic idea? Is it Fuse-specific?

A simple idea is to have ‘large’ filters for a range of blocks, e.g. one for every 2016 blocks, instead of one ‘small’ filter per block (as we have currently). A client first downloads the large filters, and if it finds a match, only then it downloads the individual filters from that range.

However, a ‘large’ filter has to include all the elements from the sum of the ‘small’ filters, so the total size is not much smaller. The large filter has roughly the same information as the sum of the small filters.

I did some measurements for the size of the large filter vs. the total size of the small filters (for a range of 2016 simulated blocks), and I got only 0.12% size saving, which is minimal. Probably too small to justify the extra complexity.