Can block validation benefit from CUDA?

ss01x · November 28, 2023, 3:17am

One of the biggest bottlenecks for running a full node is validating each transaction in a block. If I’m not mistaken, the current Bitcoin Core implementation uses all available threads on the processor.

I wonder if this process could benefit from introducing parallelism via CUDA computing (or another way of making use of the GPU).

ZmnSCPxj · November 29, 2023, 9:43am

My naive understanding is largely no.

At least part of the validation involves looking up unspent transaction outputs in a database. This is cached in memory (IIRC) but I think random access to general memory would not fit most GPUs, which are generally targeted more for vector computations (i.e. some kind of uniform memory access).

Many CPUs have SHA256 hardware implementations and I imagine using that would be faster than pushing bytes into the GPU and then pushing the hashes back. Not to mention that the most consensus-important SHA256-use would be SIGHASH calculation, which would involve branching (i.e. the differences between SIGHASH_ALL, SIGHASH_NONE, and SIGHASH_SINGLE, as well as the absence/presence of SIGHASH_ANYONECANPAY changes the sequence of bytes that are fed into the SHA256 machine).

Maybe the best bet would be to focus on the “pure math” SECP256K1 signature validations (i.e. have the SIGHASHes come from the CPU and then push them into the GPU for validation). However, again, GPUs like consistent single-instruction-multiple-data vectorized calculations. And I believe SECP256K1 Schnorr signatures as used in Taproot can be batch validated, meaning you only do a single calculation to validate multiple transactions that use the keyspend path. Since only a single calculation is needed, there seems to be no advantage to using a GPU in that case.

Note: I have not read recent bitcoin core code, so my knowledge may be out-of-date.

real-or-random · March 7, 2024, 10:55am

This may indeed work, and it has been discussed before:

github.com/bitcoin-core/secp256k1

OpenCL support

opened 02:25AM - 25 Feb 23 UTC

cryptoquick

I've been looking for versions of secp256k1 that can run on GPUs. I've found thi…s project, which seems like a fork: https://github.com/ilaychen/ECDSA-OpenCL Maybe it would make sense to upstream some of these changes, to better support GPU acceleration? Am willing to donate 7900 XTX to Sipa. I saw this comment: https://github.com/bitcoin-core/secp256k1/issues/502#issuecomment-357767815 I'm not sure what that meant other than it's possible, and GPUs certainly have improved since that paper was written 10 years ago. It might make sense to revisit this.

But I’m not aware that anyone is seriously looking into it, at least not currently.

Yes, but batch verification does not mean that the work of verifying n Schnorr signatures is the same (or similar) to the work of verifying a single Schnorr signature. Batch verification of n signatures will yield nice speed-ups as compared to verifying the n signatures individually, but it’s at most a factor of ~2 in practice.

CubicEarth · March 12, 2024, 12:08am

I wonder if there could be an expanded set of data that accompanied each block that could make processing faster. The extra data would be preprocessed in some way. It would be large in size, perhaps 10% or even 150% of the size of the blockchain, but it would be much faster to verify. For users who had very fast and unlimited internet, they could prefer to download this extra data so that syncing could happen faster.

vostrnad · March 12, 2024, 4:08am

You should take a look at utreexo, which works exactly how you’re describing. Its main goal is different, reducing the size of the stored chainstate (to ~1 kB), but a direct consequence of that is fewer (~none) disk operations so it could also speed up block validation.

real-or-random · April 15, 2024, 1:12pm

Something to watch out for is the ZPrice competition:

They’re currently running a competition seeking for implementation improvements of ZK proofs and related tech, also using GPUs. See for example GitHub - td-kwj-zp2023/webgpu-msm-bls12-377: WebGPU MSM Implementation (BLS12-377 Curve) for ZPrize 2023 for a submission to the competition that uses WebGPU to speed up multi-scalar multiplication (also known as multi-exponentiation), which is the most important building block in batch verification of ZK proofs, but also in batch verification of signatures. Though they work on a different elliptic curve, their work shows what’s possible: WebGPU Performance - Google Sheets

If I read this correctly, that’s a multi-scalar multiplication of size 2^20 in 1277 milliseconds…

The winners of the competition should be announced soon.

edit: Okay, maybe this example is not all that impressive. That’s 1.2 usec / scalar, and libsecp256k1 can do 4.3 usec / scalar on my laptop CPU (on size 2^15, I haven’t tried 2^20). Anyway, it will be interesting to see the results of the competition…