64 bit arithmetic soft fork

Chris_Stewart_5 · January 13, 2024, 2:51pm

Dovetailing this, if a new developer comes to the project and wants to do some simple numerical computations in Script they currently have to learn about policy (SCRIPT_VERIFY_MINIMALDATA). How great would it be to be able to remove policies (reduce complexity, imo) with this change.

Perhaps this could also include removing the SCRIPT_VERIFY_MINIMALIF as well.

‘Removing’ is probably a strong word, but no longer having to take these policy flags into account for future soft forks.

Chris_Stewart_5 · January 13, 2024, 2:59pm

I’ve added another post to talk about altering interpretation of op codes based on leaf version. My suggestion would be to alter the interpretation of OP_0,OP_1 etc to push 8 byte values onto the stack (instead of minimal encodings) if a specific leaf version is found.

This would maintain our invariant that numeric values are always 8 bytes when they are pushed onto the stack, but not necessarily 8 bytes (the exception is our literals, OP_0,OP_1..) when in the script.

dgpv · January 13, 2024, 3:00pm

MINIMALIF is already consensus-enforced for tapscript

Chris_Stewart_5 · January 13, 2024, 3:03pm

Could this check be modified to take into account leaf versions or would that be a hard fork? I’m still learning about what consensus rules are possible to modify with tapscript leaf versions

github.com

bitcoin/bitcoin/blob/3ba8de1b704d590fa4e1975620bd21d830d11666/src/script/interpreter.cpp#L613


      
          case OP_NOTIF:
          {
              // <expression> if [statements] [else [statements]] endif
              bool fValue = false;
              if (fExec)
              {
                  if (stack.size() < 1)
                      return set_error(serror, SCRIPT_ERR_UNBALANCED_CONDITIONAL);
                  valtype& vch = stacktop(-1);
                  // Tapscript requires minimal IF/NOTIF inputs as a consensus rule.
                  if (sigversion == SigVersion::TAPSCRIPT) {
                      // The input argument to the OP_IF and OP_NOTIF opcodes must be either
                      // exactly 0 (the empty vector) or exactly 1 (the one-byte vector with value 1).
                      if (vch.size() > 1 || (vch.size() == 1 && vch[0] != 1)) {
                          return set_error(serror, SCRIPT_ERR_TAPSCRIPT_MINIMALIF);
                      }
                  }
                  // Under witness v0 rules it is only a policy rule, enabled through SCRIPT_VERIFY_MINIMALIF.
                  if (sigversion == SigVersion::WITNESS_V0 && (flags & SCRIPT_VERIFY_MINIMALIF)) {
                      if (vch.size() > 1)
                          return set_error(serror, SCRIPT_ERR_MINIMALIF);

ajtowns · January 15, 2024, 4:22am

I don’t understand this comment – CScriptNum is little endian in the first place, isn’t it? Both my understanding of the code and definition suggests it is, and the wiki seems to agree, at least.

The differences between CScriptNum and this proposal aiui are fixed vs variable length encoding, and using twos-complement vs a sign bit to handle negative numbers.

jamesob · January 16, 2024, 5:43pm

I think it would be a (potentially big) waste of chainspace to move from minimally encoded numbers to fixed-length 64 bit numbers. Minimal encodings may be somewhat cumbersome to deal with, but they’re there for a reason!
The only thing worse than dealing with one weird encoding is having to implement another one alongside it. CScriptNum parsing will probably always be a part of wallet software, if not just to validate legacy scripts.

Davidson · January 17, 2024, 10:31pm

I wonder how unfeasible is it to bring back the 256 bits arithmetic. Now that we have taproot, we can have some fancy crypto inside script, but that wouldn’t go onchain most of the time due to the happy-path aspect of taproot.

Imagine if we could make some convoluted sigma protocol onchain, but it’s only there in case of disputes. We can make those cost some extra sigops to discourage abusing that.

Chris_Stewart_5 · January 19, 2024, 7:59pm

I think it would be a (potentially big) waste of chainspace to move from minimally encoded numbers to fixed-length 64 bit numbers

I don’t think this is the case. First off - literals (OP_0,OP_1,OP_2…) can just be re-interpreted based on sig version. That means the space they consume will remain at 1 byte, however when they are pushed onto the stack they will no longer be 1 byte - rather 8 bytes. This increases memory consumption, not disk space.

EDIT: Found a bug in my results of historical scanning of the blockchain, will come back and update this section after my bug is fixed and my results are accurate.

We can speculate what future blockchain usage patterns will look like, but lets be honest that is just speculation.

but they’re there for a reason!

Yes! They are. But not for the reason you mentioned.

Lets examine the ill fated BIP62. BIP62 was intended to solve transaction malleability, not optimizing for disk space.

Wrt to script numbers:

Zero-padded number pushes Any time a script opcode consumes a stack value that is interpreted as a number, it must be encoded in its shortest possible form. ‘Negative zero’ is not allowed. See reference: Numbers.

This is meant to solve malleability that was occurring on the network that would cause bitcoin businesses to lose track of withdrawals from their business (IIUC - MtGox suffered from this, again why i’m suspicious of my results).

If we examine the commit that introduced the fRequireMinimal flag in the interpreter, it specifically cites BIP62 rule #3 and #4 as for the reasoning for introducing it. It does not cite disk space usage.

As a by product of this proposed change, I actually believe we could potentially simplify interpreter.cpp for future soft forks by removing the need for SCRIPT_VERIFY_MINIMALDATA - how great is that?! Admittedly, this proposal does not cover encoding for push operations. I’m going to research expanding this proposal to cover these to see what effects it would have.

I’m also suspicious that the only reason we have CScriptNum is because it wrapped the openSSL number implementation. I’ve tried to track this down on github history to verify, but i’m unable to. Perhaps an OG can comment on this to see if this is correct. Here is as far as I got in github archaeology

The only thing worse than dealing with one weird encoding is having to implement another one alongside it.

Yes, so why are you advocating for keeping the 2nd (arguably 3rd, if you count CompactSize ints) format? I’m advocating for consolidating to a single format. As it exists today, we have two number encodings in the bitcoin protocol. The traditional int64_t one (which satoshi values are encoded in, and is of interest for a lot of BIP proposals) and our exotic one.

It is my understanding we were given the 2nd format by satoshi (I haven’t got to the bottom of this myself to confirm this) and have had to modify it multiple times to remove malleability.

CScriptNum parsing will probably always be a part of wallet software, if not just to validate legacy scripts.

I think ‘validate’ is a bit of a strong word as I’m not aware of a lot of external implementations of interpreter.cpp (although there are a few! bitcoin-s has one), but they definitely need to be able to build Scripts for their wallet software to receive/withdraw funds.

I don’t think this proposal would affect wallets that do standard signing logic (p2pkh, p2wpkh, p2trkp). I believe if your Script does not use OP_PUSHDATAx this holds true for p2sh,p2wsh, p2trsp, although I would like others to think about this make sure my mental model is correct. OP_PUSHDATAx is a relatively infrequent opcode, so I suspect that there isn’t wide usage of these (although I haven’t looked too much into the NFT world, where it might make sense to use OP_PUSHDATAx)

My view is that we should move away from exotic things like this in the bitcoin protocol in favor of standard encodings in the rest of the software ecosystem. While there will be an adjustment period, in 10 years people would look back on these changes say ‘remember when you had a numbering system in bitcoin’s programming language that was different than other protocol number representations? What a nightmare! Glad someone decided to consolidate that.’

Lets relegate this numbering system to a fun historical fact best talked about over a to establish your OG cred .

dgpv · January 20, 2024, 5:01am

The interpreter needs to support all the historical scripts, so I doubt very much it will be possible removing such code parts. At most this flag will be consensus-enforced.

Chris_Stewart_5 · January 20, 2024, 12:12pm

Yes, which is why I said future soft forks. A better way to state this would be:

Future Script programmers would no longer need to consider SCRIPT_VERIFY_MINIMAL_DATA when writing their Scripts. “Legacy” Script programmers will always need to consider them for policy reasons - unless for some reason policy changes (and I don’t see why it would).

Perhaps that was worded a bit clunky, apologies.

Chris_Stewart_5 · January 20, 2024, 1:02pm

I’m still exploring what the scope of this proposal could include while trying to adhere to my rule about soft forks :-). @jamesob got me thinking about the 2 paths forward via a DM.

I’m trying to decide where to draw the line between requiring all existing op codes to take 8 byte inputs (in a backwards compatible way via a SigVersion), or just adding these arithmetic op codes and allowing people to use the conversion op codes to ‘cast’ stack tops to the appropriate input size (OP_LE64TOSCRIPTNUM, OP_SCRIPTNUMTOLE64, OP_LE32TOLE64) for op codes that pre-date this soft fork proposal.

This proposal currently does the latter, would like to hear others input on this to see if the juice is worth the squeeze with requiring all inputs to be 8 bytes to existing op codes (i.e. OP_CLTV, OP_CSV, OP_WITHIN, OP_DEPTH…)

This comment also is a bit confusing as of course legacy Scripts will not need to be rewritten (p2sh, p2wsh, p2trsp).

If you want to upgrade to use this new proposed soft fork that require 8 byte inputs for operations such as OP_CLTV, this would require Script programmers to upgrade their Scripts to use the soft fork.

If we don’t require 8 byte inputs for anything besides the new numeric op codes (OP_ADD64, OP_SUB64, …) the upgrade story is pretty easy, but we retain the 2nd encoding.

dgpv · January 20, 2024, 2:34pm

I liked the ideas in that article. Dealing only with non-negative numbers might simplify some things.

Also removing “normalization” from LSHIFT and RSHIFT is the right thing - Elements implementations remove leading zeroes, and coupled with sign bit that is not actually considered after the shift, it makes modelling these opcodes with Z3 needlessly complex, and also normalization can introduce malleability: 5 LSHIFT can take 0x01, 0x0100, 0x010000 with the same result.

Relying on __uint128_t support in GCC and clang is a bit iffy… Would that mean that other compilers will be somehow discouraged ?

I think there’s still need for a way to easily convert to fixed-with numbers. Maybe a generic opcode that would take number of bytes, and zero-pads or truncates if necessary ? Pobably truncation should succeed only if removed bytes are all zero, and script should fail otherwise

For example, 4 FIXNUM would make LE32 from the argument, 8 FIXNUM will make LE64, 32 FIXNUM will make a 256-bit number

Chris_Stewart_5 · January 23, 2024, 4:23pm

I think it would be a (potentially big) waste of chainspace…

Ok finally have some results from this.

The proposal to shift from minimal encodings to 8 byte encodings for ScriptNums would result in roughly 1,176,417,034 (~1GB) larger blockchain. My current local blockchain size is

du -sh ~/.bitcoin/blocks
576G    /home/chris/.bitcoin/blocks

This means this proposal would increase the blockchain size by is 0.17% since the genesis block.

I can share my json file used to calculate these, its large (13GB). Here is an example of a few txs, along with a link to my source code for calculating these json values

{"txIdBE":"3ef4fe026fbea7160f6bc55288db14df2889f088f5a5654d21acaff3e1bb7197","scriptConstants":["04ec091a","18","05","1d"],"sizeIncrease":25,"comment":"ASM"},
{"txIdBE":"91d7d2c1cae734481e621f33c226f23bad257ea28f47cfb45f076f44c732887e","scriptConstants":["04ec091a","17","05"],"sizeIncrease":18,"comment":"ASM"},
{"txIdBE":"cfc7b512ec29507ea5f576ff6185f580047f72e3edfef30f783a6d3fec236d03","scriptConstants":["456c6967697573","74657374","b630"],"sizeIncrease":11,"comment":"ASM"},
{"txIdBE":"fc61cbe788267b631448d001c98bdb45ac70bf8bfd0a1df2b634e4255fcd6664","scriptConstants":["456c6967697573","74657374","d6fd00"],"sizeIncrease":10,"comment":"ASM"},
{"txIdBE":"e3853d6faa369b20c31369526af6c6c2d25686260805185d2b5afa8dbd98602a","scriptConstants":["456c6967697573","fe","bea800"],"sizeIncrease":13,"comment":"ASM"},

I’m working on testnet3 results, i’ll edit those in when I’m done with them.

halseth · January 23, 2024, 8:36pm

I think this is only true for the 0-16 numbers. The moment you want to push > 16 onto the stack, you have to add the full 8-byte representation to your script.

This leads me towards thinking we should keep variable length encoding the default, as we could move to the new format without incurring extra cost.

That being said, having worked with RISC-V emulation in Bitcoin Script lately (see Eltrace), I see a real need for the ability to get the 32/64 bit LE representation of a number during script execution. That we could easily add independently of the underlying number format (OP_SCRIPTNUM[TO/FROM]LE64), and perhaps give us all what we need without introducing full arithmetic support for another format.

I think this is an important point. The reason we want 64-bit arithmetics in the first place is to enforce values on the next transaction, and the interesting values are indeed (AFAIK) encoded using fixed-length LE (notable exception is number of inputs/outputs).

How to handle this comes down to the introspection opcodes themselves, as they can be made to put ScriptNum on the stack if that’s what we want.

Thanks for this writeup, Rusty! I agree moving to unsigned-only values could simplify a lot of things, especially if the format is backwards compatible with the existing ScriptNum representation for positive numbers.

By bumping the leaf version you could also re-use all existing opcodes (no OP_ADDV etc)?

EthanHeilman · February 1, 2024, 10:21pm

@Chris_Stewart_5

Continuing my question from BIP.

The BIP states: “If the operation results in an overflow, push false onto the stack”.

I would propose altering this to so that the result and the overflow amount are pushed onto the stack.

Case 1, no overflow:

1, 1, OP_ADD64 → 2, 0

where 2 is the result and 0 is the overflow amount.

Case 2, overflow:

2^64 - 1, 5, OP_ADD64 → 4, 1

where 4 is the result (mod 2^64), and 1 is the overflow amount. You can think of these two stack values as representing the 128 bit number 2^64+4 broken into two 64 bit chunks.

Push values on the stack in this fashion makes it esimple use these 64 opcodes to do math on numbers larger than 64 bits by chunking them into 64-bit stack elements. The overflow amount tells you how much to carry into the next chunk.

You’d still get the benefit of having a flag to check, if overflow amount is not 0, overflow occurred.

I think the BIP as written lets you add numbers larger than 64-bits using a similar chunking approach, but it is less straight forward and requires IF statements and substructions operations.

dgpv · February 2, 2024, 5:25am

The overflow amount would be LE64, but then the zero in case of ‘no overflow’ would also need to be LE64 (otherwise more-than-64bit calculations would still require branching), and that would complicate the common case of checking for ‘no overflow’, because you cannot just do NOT VERIFY - NOT expects a scriptnum

dgpv · February 2, 2024, 4:49pm

If more-than-64bit arithmetic is desired, then I would say rustyrussell’s idea of using only-positive variable-length integers where larger lengths are allowed is more elegant, and coupled with FIXNUM (or maybe TOFIXNUM/FROMFIXNUM) opcode to convert between variable and fixed encodings, and BYTEREV to convert between low/big endian, that would cover a large range of practical needs

EthanHeilman · February 2, 2024, 6:44pm

If more-than-64bit arithmetic is desired, then I would say rustyrussell’s idea of using only-positive variable-length integers where larger lengths are allowed is more elegant,

I agree variable length would be more elegant. My worry was about fee pricing since, depending on the math operations allowed, really big nums require more computation than small numbers. You can put big numbers on the stack with single byte opcodes like HASH256. 64-bit chucks provides a very nice way to capture that increase in cost since need more opcodes to perform operations on bigger numbers.

The more I think about my fee cost argument here, the more convinced I am that it doesn’t matter and I am wrong. It seems perfectly fine to have operations on very large numbers in Bitcoin because if one wanted to maximize computational resources spent and minimize fee code that are much better opcodes than arithmetic.

The computational cost of performing arithmetic on even very large numbers, say 520 byte numbers, is probably much smaller than many single opcode instructions like HASH160 or CHECKSIG. I don’t have actual performance numbers on this though.

ajtowns · February 3, 2024, 12:02pm

I think there’s a few choices here:

what happens with “overflows” ?
- they’re not possible, everything happens modulo 2^n
- if the inputs are “too large”, the script aborts
- you get an overflow indicator in every result
what is 2^n or what constitutes “too large” ?
- BTC’s max supply is about 2^{51} satoshis, so 64bit is a fine minimum
- 2^{256} would let you manipulate scalars for secp256k1 which might be useful
- 2^{4160} would let you do numbers of to 520 bytes which matches current stack entry limits
unsigned only, or signed
- if we’re doing modular arithmetic, unsigned seems easier
- signed maths probably makes some contracts a bunch easier though
what serialization format?
- if signed, use a sign-bit or 2’s complement?
- fixed length or variable length – if fixed length, that constrains the precision we could use
- if variable length, does every integer have a unique serialization, or can you have “0000” and “0” and “-0” all as different representations of 0?

I guess 64-bit unsigned modular arithmetic would be easiest (uint64_t already works), but 4160-bit signed numbers (or some similarly large value that benchmarks fast enough) with abort-on-overflow behaviour and stored/serialized like CScriptNum format might be more appropriate?

Abort-on-overflow seems kind of appealing as far as writing contracts goes: we’ve already had an “overflow allows printing money” bug in bitcoin proper, so immediately aborting if that happens in script seems like a wise protection. If people want to do modular arithmetic, they could perhaps manually calculate ((x % k) * (y % k)) % k provided they pick k \le \sqrt{2^{n}} or similar.

dgpv · February 3, 2024, 4:04pm

I thought that this is not an option, since it introduces malleability that MINIMALDATA was added to eliminate