64-bit arithmetic soft fork

I think it would be a (potentially big) waste of chainspace to move from minimally encoded numbers to fixed-length 64 bit numbers

I don’t think this is the case. First off, literals (OP_0, OP_1, OP_2…) can simply be re-interpreted based on sig version. That means the space they consume in a script remains 1 byte; however, when they are pushed onto the stack they will no longer be 1 byte, but rather 8 bytes. This increases memory consumption, not disk space.
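
As a minimal sketch of what that re-interpretation could look like (the “new sig version” flag and the function itself are assumptions for illustration, not code from the proposal):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch: what an OP_1..OP_16 literal could push under a new
// sig version. The literal still costs 1 byte of script space; only the
// in-memory stack element grows from 1 byte to 8 bytes.
std::vector<unsigned char> PushSmallLiteral(int n, bool new_sigversion)
{
    if (new_sigversion) {
        std::vector<unsigned char> le64(8);         // fixed-width LE64
        for (int i = 0; i < 8; ++i)
            le64[i] = (static_cast<uint64_t>(n) >> (8 * i)) & 0xff;
        return le64;                                // 8 bytes of memory
    }
    return {static_cast<unsigned char>(n)};         // legacy: 1 byte (n in 1..16)
}
```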

EDIT: Found a bug in my results of historical scanning of the blockchain, will come back and update this section after my bug is fixed and my results are accurate.

We can speculate about what future blockchain usage patterns will look like, but let’s be honest: that is just speculation.

but they’re there for a reason!

Yes! They are. But not for the reason you mentioned.

Let’s examine the ill-fated BIP62. BIP62 was intended to solve transaction malleability, not to optimize for disk space.

Regarding script numbers:

Zero-padded number pushes: Any time a script opcode consumes a stack value that is interpreted as a number, it must be encoded in its shortest possible form. ‘Negative zero’ is not allowed. See reference: Numbers.

This rule is meant to solve a malleability issue that was occurring on the network and causing bitcoin businesses to lose track of withdrawals (IIUC, MtGox suffered from this - again, why I’m suspicious of my results).

If we examine the commit that introduced the fRequireMinimal flag in the interpreter, it specifically cites BIP62 rules #3 and #4 as the reasoning for introducing it. It does not cite disk space usage.
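
For context, the check that fRequireMinimal enforces is tiny; this is a paraphrase of the logic in Bitcoin Core’s CScriptNum constructor, pulled out into a standalone function:

```cpp
#include <vector>

// Paraphrase of the fRequireMinimal check in CScriptNum's constructor: the
// most significant byte may be 0x00 or 0x80 (a pure padding/sign byte) only
// when the byte below it needs its 0x80 bit for the sign.
bool IsMinimallyEncoded(const std::vector<unsigned char>& vch)
{
    if (vch.empty()) return true;                 // empty vector encodes zero
    if ((vch.back() & 0x7f) == 0) {               // top byte is 0x00 or 0x80
        // e.g. {0xff, 0x00} is a legal +255, since {0xff} alone means -127,
        // but {0x01, 0x00} is a non-minimal (malleable) encoding of 1
        if (vch.size() <= 1 || (vch[vch.size() - 2] & 0x80) == 0)
            return false;
    }
    return true;
}
```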

As a byproduct of this proposed change, I believe we could potentially simplify interpreter.cpp for future soft forks by removing the need for SCRIPT_VERIFY_MINIMALDATA - how great is that?! Admittedly, this proposal does not cover encodings for push operations. I’m going to research expanding this proposal to cover those, to see what effects it would have.

I’m also suspicious that the only reason we have CScriptNum is that it wrapped the OpenSSL number implementation. I’ve tried to track this down in the GitHub history to verify, but I’m unable to. Perhaps an OG can comment on whether this is correct. Here is as far as I got in my GitHub archaeology.

The only thing worse than dealing with one weird encoding is having to implement another one alongside it.

Yes, so why are you advocating for keeping the 2nd (arguably 3rd, if you count CompactSize ints) format? I’m advocating for consolidating to a single format. As it exists today, we have two number encodings in the bitcoin protocol: the traditional int64_t one (which satoshi values are encoded in, and which is of interest for a lot of BIP proposals) and our exotic one.

It is my understanding that we were given the 2nd format by Satoshi (I haven’t gotten to the bottom of this myself to confirm it), and we have had to modify it multiple times to remove malleability.
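
To make the contrast concrete, here is a sketch of the two serializations side by side; the first mirrors the logic of CScriptNum::serialize (variable-length, sign-magnitude), the second is the fixed-width little-endian int64_t used for satoshi amounts:

```cpp
#include <cstdint>
#include <vector>

// Variable-length, sign-magnitude encoding (mirrors CScriptNum::serialize).
std::vector<unsigned char> SerializeScriptNum(int64_t value)
{
    std::vector<unsigned char> result;
    if (value == 0) return result;                 // zero is the empty vector
    const bool neg = value < 0;
    uint64_t absvalue = neg ? ~static_cast<uint64_t>(value) + 1
                            : static_cast<uint64_t>(value);
    while (absvalue) {
        result.push_back(absvalue & 0xff);
        absvalue >>= 8;
    }
    // If the top byte already uses the 0x80 bit, append a padding byte so it
    // is not misread as a sign bit; otherwise set the sign bit directly.
    if (result.back() & 0x80)
        result.push_back(neg ? 0x80 : 0x00);
    else if (neg)
        result.back() |= 0x80;
    return result;
}

// Fixed-width 8-byte little-endian encoding, as used for satoshi amounts.
std::vector<unsigned char> SerializeLE64(int64_t value)
{
    std::vector<unsigned char> result(8);
    for (int i = 0; i < 8; ++i)
        result[i] = (static_cast<uint64_t>(value) >> (8 * i)) & 0xff;
    return result;
}

// SerializeScriptNum(255) == {0xff, 0x00}                  (2 bytes)
// SerializeLE64(255)      == {0xff, 0, 0, 0, 0, 0, 0, 0}   (8 bytes)
```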

CScriptNum parsing will probably always be a part of wallet software, if not just to validate legacy scripts.

I think ‘validate’ is a bit of a strong word, as I’m not aware of many external implementations of interpreter.cpp (although there are a few! bitcoin-s has one), but wallets definitely need to be able to build Scripts to receive/withdraw funds.

I don’t think this proposal would affect wallets that do standard signing logic (p2pkh, p2wpkh, p2tr key path). I believe this also holds true for p2sh, p2wsh, and p2tr script path if your Script does not use OP_PUSHDATAx, although I would like others to think about this to make sure my mental model is correct. OP_PUSHDATAx is a relatively infrequent opcode, so I suspect there isn’t wide usage of it (although I haven’t looked much into the NFT world, where it might make sense to use OP_PUSHDATAx).

My view is that we should move away from exotic things like this in the bitcoin protocol in favor of the standard encodings used in the rest of the software ecosystem. While there will be an adjustment period, in 10 years people will look back on these changes and say ‘remember when bitcoin’s programming language had a numbering system that was different from other protocol number representations? What a nightmare! Glad someone decided to consolidate that.’

Let’s relegate this numbering system to a fun historical fact best talked about over a :beer: to establish your OG cred :slightly_smiling_face:.

The interpreter needs to support all the historical scripts, so I doubt very much it will be possible to remove such code. At most this flag will be consensus-enforced.

Yes, which is why I said future soft forks. A better way to state this would be:

Future Script programmers would no longer need to consider SCRIPT_VERIFY_MINIMALDATA when writing their Scripts. “Legacy” Script programmers will always need to consider it for policy reasons - unless for some reason policy changes (and I don’t see why it would).

Perhaps that was worded a bit clunkily, apologies.

I’m still exploring what the scope of this proposal could include while trying to adhere to my rule about soft forks :-). @jamesob got me thinking about the 2 paths forward via a DM.

I’m trying to decide where to draw the line between requiring all existing op codes to take 8-byte inputs (in a backwards-compatible way via a SigVersion), or just adding the new arithmetic op codes and allowing people to use the conversion op codes (OP_LE64TOSCRIPTNUM, OP_SCRIPTNUMTOLE64, OP_LE32TOLE64) to ‘cast’ stack tops to the appropriate input size for op codes that pre-date this soft fork proposal.

This proposal currently does the latter. I would like to hear others’ input on whether the juice is worth the squeeze with requiring all inputs to existing op codes (i.e. OP_CLTV, OP_CSV, OP_WITHIN, OP_DEPTH…) to be 8 bytes. A sketch of the cast-opcode route follows.
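
To illustrate that route, here is a minimal standalone sketch of an OP_SCRIPTNUMTOLE64-style conversion, loosely modeled on Elements’ tapscript conversion opcodes; the function shape and failure handling are simplifications, not the proposal’s final semantics:

```cpp
#include <cstdint>
#include <vector>

// Decode a minimally encoded ScriptNum (variable-length, sign-magnitude,
// little-endian) and re-encode it as a fixed-width 8-byte LE value.
// Returns false where the real opcode would fail the script.
bool ScriptNumToLE64(const std::vector<unsigned char>& scriptnum,
                     std::vector<unsigned char>& le64_out)
{
    if (scriptnum.size() > 8) return false;        // out of int64_t range
    uint64_t abs = 0;
    for (size_t i = 0; i < scriptnum.size(); ++i)
        abs |= static_cast<uint64_t>(scriptnum[i]) << (8 * i);
    int64_t v;
    if (!scriptnum.empty() && (scriptnum.back() & 0x80)) {
        // clear the sign bit in the top byte, then negate
        abs &= ~(static_cast<uint64_t>(0x80) << (8 * (scriptnum.size() - 1)));
        v = -static_cast<int64_t>(abs);
    } else {
        v = static_cast<int64_t>(abs);
    }
    le64_out.assign(8, 0);
    for (int i = 0; i < 8; ++i)
        le64_out[i] = (static_cast<uint64_t>(v) >> (8 * i)) & 0xff;
    return true;
}
```

A matching OP_LE64TOSCRIPTNUM would invert this, letting opcodes that pre-date the soft fork (OP_CLTV, OP_CSV, OP_WITHIN, …) keep their existing input format while the new arithmetic operates on LE64 values.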

This comment is also a bit confusing, as of course legacy Scripts will not need to be rewritten (p2sh, p2wsh, p2tr script path).

If you want to upgrade to use this newly proposed soft fork, which requires 8-byte inputs for operations such as OP_CLTV, Script programmers would have to upgrade their Scripts to use the soft fork.

If we don’t require 8-byte inputs for anything besides the new numeric op codes (OP_ADD64, OP_SUB64, …), the upgrade story is pretty easy, but we retain the 2nd encoding.

I liked the ideas in that article. Dealing only with non-negative numbers might simplify some things.

Also, removing “normalization” from LSHIFT and RSHIFT is the right thing - the Elements implementation removes leading zeroes, and coupled with a sign bit that is not actually considered after the shift, this makes modelling these opcodes with Z3 needlessly complex. The normalization can also introduce malleability: 5 LSHIFT can take 0x01, 0x0100, or 0x010000 with the same result.

Relying on __uint128_t support in GCC and clang is a bit iffy… Would that mean that other compilers would somehow be discouraged?

I think there’s still a need for a way to easily convert to fixed-width numbers. Maybe a generic opcode that takes a number of bytes, and zero-pads or truncates as necessary? Probably truncation should succeed only if the removed bytes are all zero, and the script should fail otherwise.

For example, 4 FIXNUM would make an LE32 from the argument, 8 FIXNUM would make an LE64, and 32 FIXNUM would make a 256-bit number.
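
A minimal sketch of that semantics, assuming FIXNUM takes a little-endian blob and a target width, zero-pads when growing, and fails the script on lossy truncation (the opcode and its exact behavior are this comment’s suggestion, not part of the current proposal):

```cpp
#include <vector>

// Hypothetical FIXNUM: resize a little-endian byte blob to exactly `width`
// bytes. Zero-pad when growing; when shrinking, succeed only if every
// removed (high-order) byte is zero, otherwise the script should fail.
bool FixNum(std::vector<unsigned char>& blob, size_t width)
{
    if (blob.size() <= width) {
        blob.resize(width, 0x00);                // zero-pad the high bytes
        return true;
    }
    for (size_t i = width; i < blob.size(); ++i)
        if (blob[i] != 0x00) return false;       // lossy truncation: fail
    blob.resize(width);
    return true;
}

// FixNum({0x2a}, 8)        -> {0x2a,0,0,0,0,0,0,0}  (8 FIXNUM: LE64 of 42)
// FixNum({0x2a,0,0,0}, 1)  -> {0x2a}                (removed bytes all zero)
// FixNum({0x2a,0x01}, 1)   -> fails                 (would drop 0x01)
```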

I think it would be a (potentially big) waste of chainspace…

OK, I finally have some results from this.

The proposal to shift from minimal encodings to 8-byte encodings for ScriptNums would result in a blockchain roughly 1,176,417,034 bytes (~1GB) larger. My current local blockchain size is:

du -sh ~/.bitcoin/blocks
576G    /home/chris/.bitcoin/blocks

This means this proposal would have increased the blockchain size by 0.17% since the genesis block.

I can share my JSON file used to calculate these; it’s large (13GB). Here is an example of a few txs, along with a link to my source code for calculating these JSON values:

{"txIdBE":"3ef4fe026fbea7160f6bc55288db14df2889f088f5a5654d21acaff3e1bb7197","scriptConstants":["04ec091a","18","05","1d"],"sizeIncrease":25,"comment":"ASM"},
{"txIdBE":"91d7d2c1cae734481e621f33c226f23bad257ea28f47cfb45f076f44c732887e","scriptConstants":["04ec091a","17","05"],"sizeIncrease":18,"comment":"ASM"},
{"txIdBE":"cfc7b512ec29507ea5f576ff6185f580047f72e3edfef30f783a6d3fec236d03","scriptConstants":["456c6967697573","74657374","b630"],"sizeIncrease":11,"comment":"ASM"},
{"txIdBE":"fc61cbe788267b631448d001c98bdb45ac70bf8bfd0a1df2b634e4255fcd6664","scriptConstants":["456c6967697573","74657374","d6fd00"],"sizeIncrease":10,"comment":"ASM"},
{"txIdBE":"e3853d6faa369b20c31369526af6c6c2d25686260805185d2b5afa8dbd98602a","scriptConstants":["456c6967697573","fe","bea800"],"sizeIncrease":13,"comment":"ASM"},

I’m working on testnet3 results; I’ll edit those in when I’m done with them.

I think this is only true for the numbers 0-16. The moment you want to push > 16 onto the stack, you have to add the full 8-byte representation to your script (e.g. pushing 17 costs two bytes today, 0x01 0x11, but nine bytes as a fixed-width push: 0x08 followed by eight bytes).

This leads me towards thinking we should keep variable-length encoding as the default, as we could then move to the new format without incurring extra cost.

That being said, having worked with RISC-V emulation in Bitcoin Script lately (see Eltrace), I see a real need for the ability to get the 32/64-bit LE representation of a number during script execution. That we could easily add independently of the underlying number format (OP_SCRIPTNUM[TO/FROM]LE64), and it would perhaps give us all we need without introducing full arithmetic support for another format.

I think this is an important point. The reason we want 64-bit arithmetic in the first place is to enforce values on the next transaction, and the interesting values are indeed (AFAIK) encoded using fixed-length LE (a notable exception is the number of inputs/outputs).

How to handle this comes down to the introspection opcodes themselves, as they can be made to put a ScriptNum on the stack if that’s what we want.

Thanks for this writeup, Rusty! I agree moving to unsigned-only values could simplify a lot of things, especially if the format is backwards compatible with the existing ScriptNum representation for positive numbers.

By bumping the leaf version you could also re-use all existing opcodes (no OP_ADDV etc)?

@Chris_Stewart_5

Continuing my question from the BIP.

The BIP states: “If the operation results in an overflow, push false onto the stack”.

I would propose altering this so that the result and the overflow amount are pushed onto the stack.

Case 1, no overflow:

1, 1, OP_ADD64 → 2, 0

where 2 is the result and 0 is the overflow amount.

Case 2, overflow:

2^64 - 1, 5, OP_ADD64 → 4, 1

where 4 is the result (mod 2^64), and 1 is the overflow amount. You can think of these two stack values as representing the 128-bit number 2^64 + 4 broken into two 64-bit chunks.

Pushing values onto the stack in this fashion makes it simple to use these 64-bit opcodes to do math on numbers larger than 64 bits by chunking them into 64-bit stack elements. The overflow amount tells you how much to carry into the next chunk.

You’d still get the benefit of having a flag to check: if the overflow amount is not 0, overflow occurred.

I think the BIP as written lets you add numbers larger than 64 bits using a similar chunking approach, but it is less straightforward and requires IF statements and subtraction operations. A sketch of the carry-returning variant follows.
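
Here is a minimal sketch of that semantics (the struct, names, and the use of a GCC/clang builtin are illustrative assumptions, not from the BIP):

```cpp
#include <cstdint>

// Hypothetical OP_ADD64 with a carry output: return the wrapped 64-bit sum
// plus a separate overflow amount, so multi-word addition can feed the
// carry straight into the next 64-bit chunk without IF branches.
struct Add64Result {
    uint64_t sum;    // (a + b) mod 2^64
    uint64_t carry;  // 1 if the addition wrapped, else 0
};

Add64Result Add64(uint64_t a, uint64_t b)
{
    Add64Result r;
    // __builtin_add_overflow is a GCC/clang builtin; a portable variant
    // could do the wrapping add and then test (r.sum < a) instead.
    r.carry = __builtin_add_overflow(a, b, &r.sum) ? 1 : 0;
    return r;
}

// Add64(1, 1)          -> {2, 0}   (case 1: no overflow)
// Add64(UINT64_MAX, 5) -> {4, 1}   (case 2: 2^64 + 4 split into two chunks)
```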

The overflow amount would be LE64, but then the zero in the ‘no overflow’ case would also need to be LE64 (otherwise more-than-64-bit calculations would still require branching), and that would complicate the common case of checking for ‘no overflow’, because you cannot just do NOT VERIFY - NOT expects a scriptnum.

If more-than-64-bit arithmetic is desired, then I would say rustyrussell’s idea of using only-positive variable-length integers where larger lengths are allowed is more elegant. Coupled with a FIXNUM (or maybe TOFIXNUM/FROMFIXNUM) opcode to convert between variable and fixed encodings, and BYTEREV to convert between little- and big-endian, that would cover a large range of practical needs.

If more-than-64-bit arithmetic is desired, then I would say rustyrussell’s idea of using only-positive variable-length integers where larger lengths are allowed is more elegant,

I agree variable length would be more elegant. My worry was about fee pricing since, depending on the math operations allowed, really big nums require more computation than small numbers. You can put big numbers on the stack with single-byte opcodes like HASH256. 64-bit chunks provide a very nice way to capture that increase in cost, since you need more opcodes to perform operations on bigger numbers.

The more I think about my fee cost argument here, the more convinced I am that it doesn’t matter and I am wrong. It seems perfectly fine to have operations on very large numbers in Bitcoin, because if one wanted to maximize computational resources spent while minimizing fees paid, there are much better opcodes than arithmetic.

The computational cost of performing arithmetic on even very large numbers, say 520-byte numbers, is probably much smaller than that of many single-opcode instructions like HASH160 or CHECKSIG. I don’t have actual performance numbers on this though.

I think there’s a few choices here:

  • what happens with “overflows”?
    • they’re not possible, everything happens modulo 2^n
    • if the inputs are “too large”, the script aborts
    • you get an overflow indicator in every result
  • what is 2^n, or what constitutes “too large”?
    • BTC’s max supply is about 2^{51} satoshis, so 64bit is a fine minimum
    • 2^{256} would let you manipulate scalars for secp256k1 which might be useful
    • 2^{4160} would let you do numbers of up to 520 bytes, which matches current stack entry limits
  • unsigned only, or signed
    • if we’re doing modular arithmetic, unsigned seems easier
    • signed maths probably makes some contracts a bunch easier though
  • what serialization format?
    • if signed, use a sign-bit or 2’s complement?
    • fixed length or variable length – if fixed length, that constrains the precision we could use
    • if variable length, does every integer have a unique serialization, or can you have “0000” and “0” and “-0” all as different representations of 0?

I guess 64-bit unsigned modular arithmetic would be easiest (uint64_t already works), but 4160-bit signed numbers (or some similarly large value that benchmarks fast enough) with abort-on-overflow behaviour, stored/serialized like the CScriptNum format, might be more appropriate?

Abort-on-overflow seems kind of appealing as far as writing contracts goes: we’ve already had an “overflow allows printing money” bug in bitcoin proper, so immediately aborting if that happens in script seems like a wise protection. If people want to do modular arithmetic, they could perhaps manually calculate ((x % k) * (y % k)) % k provided they pick k \le \sqrt{2^{n}} or similar.
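To spell out why that bound works: if k \le \sqrt{2^{n}}, both residues are below k, so their product is below k^2 \le 2^n and can never trip the overflow abort. A tiny sketch for n = 64 (my example, not from the post above):

```cpp
#include <cstdint>

// Modular multiply that cannot overflow 64 bits, provided k <= 2^32
// (i.e. k <= sqrt(2^64)): each residue is < k, so the product is
// < k^2 <= 2^64 and fits in a uint64_t without wrapping.
uint64_t MulMod(uint64_t x, uint64_t y, uint64_t k)
{
    return ((x % k) * (y % k)) % k;
}
```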

I thought that this is not an option, since it introduces malleability that MINIMALDATA was added to eliminate

If your goal is to allow varying-size bignums so that you can do maths in the secp scalar field, you don’t want to enforce minimaldata – otherwise if you take the hash of a block header to generate a scalar, and naturally end up with leading zeroes, you have to strip those zeroes before you can do any maths on that hash. Stripping an arbitrary number of zeroes is then also awkward with minimaldata rules, though not impossible.

(I think a minimaldata variant that only applies to values pulled directly from the witness stack, but not to program pushes or the results of operations, would be interesting)

If there is a FROMFIXNUM opcode that takes the size of the fixed-size integer as one argument and a byte blob of that size as another argument, and turns it into a variable-size integer, that will be easy (along with BYTEREV to deal with endianness)


If anyone is interested in seeing how this set of proposals could be used by other op codes, please see my new post about OP_INOUT_AMOUNT

Here is a direct link to a few test scripts in the python test framework

Nice!

How would the IN_OUT opcode change if one decided to go with 64-bit ScriptNum type encoding instead of the LE variant?

I guess it’s not a large change, since you would just push a ScriptNum-encoded integer onto the stack instead?

  // Convert the funding amount (an int64_t satoshi value) into a CScriptNum
  // and push its variable-length serialization onto the stack.
  CScriptNum bn(fundingAmount);
  stack.push_back(bn.getvch());

Another metric to look at is who’s paying for those extra bytes. So for a set of typical transactions (those in existence now and those expected in the future), how much do they increase?

I’m going to code this up to confirm the ergonomics, so mistakes are likely in this post - call them out if you see them. Here is my understanding without actually having written the code yet.

If we were to continue with CScriptNum, as my OP_INOUT_AMOUNT implementation currently does:

  1. Read the int64_t representing satoshis from BaseTransactionSignatureChecker.GetTransactionData()
  2. Convert the int64_t into a minimally encoded CScriptNum. I don’t think this necessarily has to be done by an op code; it could be done in the impl of OP_INOUT_AMOUNT itself
  3. Call the CScriptNum constructor, modifying the nMaxNumSize parameter to support 8 bytes
  4. Push the CScriptNum onto the stack
  5. Wherever the satoshi value on the stack top is consumed by another op code, figure out how to allow nMaxNumSize to be 8 bytes

As an example for step 5, let’s assume we are using the old (pre-64-bit) numeric op codes.

You see, we interpret the stack top as a CScriptNum, but that CScriptNum has nMaxNumSize=4 rather than 8 bytes. This leads to an overflow exception being thrown by CScriptNum. The same problem applies to any opcode (another example is OP_WITHIN) that uses CScriptNum to interpret the stack top.
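
Concretely, the failure mode looks something like this (a sketch against Bitcoin Core’s CScriptNum interface, not the actual OP_INOUT_AMOUNT diff; nDefaultMaxNumSize is 4 bytes):

```cpp
// Pushing works fine: OP_INOUT_AMOUNT-style code serializes the amount...
CScriptNum bn(fundingAmount);                      // int64_t satoshis
stack.push_back(bn.getvch());                      // up to ~8 bytes on the stack

// ...but a legacy consumer such as OP_ADD decodes with the 4-byte default,
//     CScriptNum(stacktop(-1), fRequireMinimal);  // nMaxNumSize = 4
// which throws scriptnum_error ("script number overflow") for larger values.
// An upgraded consumer has to opt into the wider limit explicitly:
CScriptNum wide(stacktop(-1), fRequireMinimal, 8); // nMaxNumSize = 8
```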

Hi everyone, we have a Bitcoin Core PR review club tomorrow (3/20/2024) at 17:00 UTC for this implementation.

Happy to answer any questions about the PR, and to talk through the ‘Design Questions’ section, which I think still has open questions. Hope to see some of y’all tomorrow at 17:00 UTC.