64 bit arithmetic soft fork

I’d like to note on ‘adding extra encoding for numbers’. While extra encoding might add some burden in one place, it might lessen the burden in another place.

In complex covenants the computed values are often used to compare/combine with introspected values taken from the transaction, and the encoding of that values are usually LE64 or LE32.

In my experience designing covenant scripts on Elements, it definitely feels that there is value in being able to do computations using the same format as the numbers that are stored within the transaction, and with clearly defined ways to detect overflows etc.

On the second thought, maybe checking for overflows after each 64-bit arithmetic opcode is not that great, if we can have computations in (practically) arbitrary width integers and then only detect overflows on conversion to LE64/LE32.

(but really, adding VERIFY after each 64-bit arith op does not add any mental load - it just increases script size a bit)

(well, forgetting VERIFY after 64-bit arith op in Elements might lead to unexpected behavior in case of actual overflow…)

Dovetailing this, if a new developer comes to the project and wants to do some simple numerical computations in Script they currently have to learn about policy (SCRIPT_VERIFY_MINIMALDATA). How great would it be to be able to remove policies (reduce complexity, imo) with this change.

Perhaps this could also include removing the SCRIPT_VERIFY_MINIMALIF as well.

‘Removing’ is probably a strong word, but no longer having to take these policy flags into account for future soft forks.

I’ve added another post to talk about altering interpretation of op codes based on leaf version. My suggestion would be to alter the interpretation of OP_0,OP_1 etc to push 8 byte values onto the stack (instead of minimal encodings) if a specific leaf version is found.

This would maintain our invariant that numeric values are always 8 bytes when they are pushed onto the stack, but not necessarily 8 bytes (the exception is our literals, OP_0,OP_1..) when in the script.

MINIMALIF is already consensus-enforced for tapscript

Could this check be modified to take into account leaf versions or would that be a hard fork? I’m still learning about what consensus rules are possible to modify with tapscript leaf versions

I don’t understand this comment – CScriptNum is little endian in the first place, isn’t it? Both my understanding of the code and definition suggests it is, and the wiki seems to agree, at least.

The differences between CScriptNum and this proposal aiui are fixed vs variable length encoding, and using twos-complement vs a sign bit to handle negative numbers.

2 Likes
  1. I think it would be a (potentially big) waste of chainspace to move from minimally encoded numbers to fixed-length 64 bit numbers. Minimal encodings may be somewhat cumbersome to deal with, but they’re there for a reason!
  2. The only thing worse than dealing with one weird encoding is having to implement another one alongside it. CScriptNum parsing will probably always be a part of wallet software, if not just to validate legacy scripts.
3 Likes

I wonder how unfeasible is it to bring back the 256 bits arithmetic. Now that we have taproot, we can have some fancy crypto inside script, but that wouldn’t go onchain most of the time due to the happy-path aspect of taproot.

Imagine if we could make some convoluted sigma protocol onchain, but it’s only there in case of disputes. We can make those cost some extra sigops to discourage abusing that.

I think it would be a (potentially big) waste of chainspace to move from minimally encoded numbers to fixed-length 64 bit numbers

I don’t think this is the case. First off - literals (OP_0,OP_1,OP_2…) can just be re-interpreted based on sig version. That means the space they consume will remain at 1 byte, however when they are pushed onto the stack they will no longer be 1 byte - rather 8 bytes. This increases memory consumption, not disk space.

EDIT: Found a bug in my results of historical scanning of the blockchain, will come back and update this section after my bug is fixed and my results are accurate.

We can speculate what future blockchain usage patterns will look like, but lets be honest that is just speculation.

but they’re there for a reason!

Yes! They are. But not for the reason you mentioned.

Lets examine the ill fated BIP62. BIP62 was intended to solve transaction malleability, not optimizing for disk space.

Wrt to script numbers:

Zero-padded number pushes Any time a script opcode consumes a stack value that is interpreted as a number, it must be encoded in its shortest possible form. ‘Negative zero’ is not allowed. See reference: Numbers.

This is meant to solve malleability that was occurring on the network that would cause bitcoin businesses to lose track of withdrawals from their business (IIUC - MtGox suffered from this, again why i’m suspicious of my results).

If we examine the commit that introduced the fRequireMinimal flag in the interpreter, it specifically cites BIP62 rule #3 and #4 as for the reasoning for introducing it. It does not cite disk space usage.

As a by product of this proposed change, I actually believe we could potentially simplify interpreter.cpp for future soft forks by removing the need for SCRIPT_VERIFY_MINIMALDATA - how great is that?! Admittedly, this proposal does not cover encoding for push operations. I’m going to research expanding this proposal to cover these to see what effects it would have.

I’m also suspicious that the only reason we have CScriptNum is because it wrapped the openSSL number implementation. I’ve tried to track this down on github history to verify, but i’m unable to. Perhaps an OG can comment on this to see if this is correct. Here is as far as I got in github archaeology

The only thing worse than dealing with one weird encoding is having to implement another one alongside it.

Yes, so why are you advocating for keeping the 2nd (arguably 3rd, if you count CompactSize ints) format? I’m advocating for consolidating to a single format. As it exists today, we have two number encodings in the bitcoin protocol. The traditional int64_t one (which satoshi values are encoded in, and is of interest for a lot of BIP proposals) and our exotic one.

It is my understanding we were given the 2nd format by satoshi (I haven’t got to the bottom of this myself to confirm this) and have had to modify it multiple times to remove malleability.

CScriptNum parsing will probably always be a part of wallet software, if not just to validate legacy scripts.

I think ‘validate’ is a bit of a strong word as I’m not aware of a lot of external implementations of interpreter.cpp (although there are a few! bitcoin-s has one), but they definitely need to be able to build Scripts for their wallet software to receive/withdraw funds.

I don’t think this proposal would affect wallets that do standard signing logic (p2pkh, p2wpkh, p2trkp). I believe if your Script does not use OP_PUSHDATAx this holds true for p2sh,p2wsh, p2trsp, although I would like others to think about this make sure my mental model is correct. OP_PUSHDATAx is a relatively infrequent opcode, so I suspect that there isn’t wide usage of these (although I haven’t looked too much into the NFT world, where it might make sense to use OP_PUSHDATAx)

My view is that we should move away from exotic things like this in the bitcoin protocol in favor of standard encodings in the rest of the software ecosystem. While there will be an adjustment period, in 10 years people would look back on these changes say ‘remember when you had a numbering system in bitcoin’s programming language that was different than other protocol number representations? What a nightmare! Glad someone decided to consolidate that.’

Lets relegate this numbering system to a fun historical fact best talked about over a :beer: to establish your OG cred :slightly_smiling_face:.

The interpreter needs to support all the historical scripts, so I doubt very much it will be possible removing such code parts. At most this flag will be consensus-enforced.

Yes, which is why I said future soft forks. A better way to state this would be:

Future Script programmers would no longer need to consider SCRIPT_VERIFY_MINIMAL_DATA when writing their Scripts. “Legacy” Script programmers will always need to consider them for policy reasons - unless for some reason policy changes (and I don’t see why it would).

Perhaps that was worded a bit clunky, apologies.

I’m still exploring what the scope of this proposal could include while trying to adhere to my rule about soft forks :-). @jamesob got me thinking about the 2 paths forward via a DM.

I’m trying to decide where to draw the line between requiring all existing op codes to take 8 byte inputs (in a backwards compatible way via a SigVersion), or just adding these arithmetic op codes and allowing people to use the conversion op codes to ‘cast’ stack tops to the appropriate input size (OP_LE64TOSCRIPTNUM, OP_SCRIPTNUMTOLE64, OP_LE32TOLE64) for op codes that pre-date this soft fork proposal.

This proposal currently does the latter, would like to hear others input on this to see if the juice is worth the squeeze with requiring all inputs to be 8 bytes to existing op codes (i.e. OP_CLTV, OP_CSV, OP_WITHIN, OP_DEPTH…)

This comment also is a bit confusing as of course legacy Scripts will not need to be rewritten (p2sh, p2wsh, p2trsp).

If you want to upgrade to use this new proposed soft fork that require 8 byte inputs for operations such as OP_CLTV, this would require Script programmers to upgrade their Scripts to use the soft fork.

If we don’t require 8 byte inputs for anything besides the new numeric op codes (OP_ADD64, OP_SUB64, …) the upgrade story is pretty easy, but we retain the 2nd encoding.

I liked the ideas in that article. Dealing only with non-negative numbers might simplify some things.

Also removing “normalization” from LSHIFT and RSHIFT is the right thing - Elements implementations remove leading zeroes, and coupled with sign bit that is not actually considered after the shift, it makes modelling these opcodes with Z3 needlessly complex, and also normalization can introduce malleability: 5 LSHIFT can take 0x01, 0x0100, 0x010000 with the same result.

Relying on __uint128_t support in GCC and clang is a bit iffy… Would that mean that other compilers will be somehow discouraged ?

I think there’s still need for a way to easily convert to fixed-with numbers. Maybe a generic opcode that would take number of bytes, and zero-pads or truncates if necessary ? Pobably truncation should succeed only if removed bytes are all zero, and script should fail otherwise

For example, 4 FIXNUM would make LE32 from the argument, 8 FIXNUM will make LE64, 32 FIXNUM will make a 256-bit number

I think it would be a (potentially big) waste of chainspace…

Ok finally have some results from this.

The proposal to shift from minimal encodings to 8 byte encodings for ScriptNums would result in roughly 1,176,417,034 (~1GB) larger blockchain. My current local blockchain size is

du -sh ~/.bitcoin/blocks
576G    /home/chris/.bitcoin/blocks

This means this proposal would increase the blockchain size by is 0.17% since the genesis block.

I can share my json file used to calculate these, its large (13GB). Here is an example of a few txs, along with a link to my source code for calculating these json values

{"txIdBE":"3ef4fe026fbea7160f6bc55288db14df2889f088f5a5654d21acaff3e1bb7197","scriptConstants":["04ec091a","18","05","1d"],"sizeIncrease":25,"comment":"ASM"},
{"txIdBE":"91d7d2c1cae734481e621f33c226f23bad257ea28f47cfb45f076f44c732887e","scriptConstants":["04ec091a","17","05"],"sizeIncrease":18,"comment":"ASM"},
{"txIdBE":"cfc7b512ec29507ea5f576ff6185f580047f72e3edfef30f783a6d3fec236d03","scriptConstants":["456c6967697573","74657374","b630"],"sizeIncrease":11,"comment":"ASM"},
{"txIdBE":"fc61cbe788267b631448d001c98bdb45ac70bf8bfd0a1df2b634e4255fcd6664","scriptConstants":["456c6967697573","74657374","d6fd00"],"sizeIncrease":10,"comment":"ASM"},
{"txIdBE":"e3853d6faa369b20c31369526af6c6c2d25686260805185d2b5afa8dbd98602a","scriptConstants":["456c6967697573","fe","bea800"],"sizeIncrease":13,"comment":"ASM"},

I’m working on testnet3 results, i’ll edit those in when I’m done with them.

I think this is only true for the 0-16 numbers. The moment you want to push > 16 onto the stack, you have to add the full 8-byte representation to your script.

This leads me towards thinking we should keep variable length encoding the default, as we could move to the new format without incurring extra cost.

That being said, having worked with RISC-V emulation in Bitcoin Script lately (see Eltrace), I see a real need for the ability to get the 32/64 bit LE representation of a number during script execution. That we could easily add independently of the underlying number format (OP_SCRIPTNUM[TO/FROM]LE64), and perhaps give us all what we need without introducing full arithmetic support for another format.

I think this is an important point. The reason we want 64-bit arithmetics in the first place is to enforce values on the next transaction, and the interesting values are indeed (AFAIK) encoded using fixed-length LE (notable exception is number of inputs/outputs).

How to handle this comes down to the introspection opcodes themselves, as they can be made to put ScriptNum on the stack if that’s what we want.

Thanks for this writeup, Rusty! I agree moving to unsigned-only values could simplify a lot of things, especially if the format is backwards compatible with the existing ScriptNum representation for positive numbers.

By bumping the leaf version you could also re-use all existing opcodes (no OP_ADDV etc)?

@Chris_Stewart_5

Continuing my question from BIP.

The BIP states: “If the operation results in an overflow, push false onto the stack”.

I would propose altering this to so that the result and the overflow amount are pushed onto the stack.

Case 1, no overflow:

1, 1, OP_ADD64 → 2, 0

where 2 is the result and 0 is the overflow amount.

Case 2, overflow:

2^64 - 1, 5, OP_ADD64 → 4, 1

where 4 is the result (mod 2^64), and 1 is the overflow amount. You can think of these two stack values as representing the 128 bit number 2^64+4 broken into two 64 bit chunks.

Push values on the stack in this fashion makes it esimple use these 64 opcodes to do math on numbers larger than 64 bits by chunking them into 64-bit stack elements. The overflow amount tells you how much to carry into the next chunk.

You’d still get the benefit of having a flag to check, if overflow amount is not 0, overflow occurred.

I think the BIP as written lets you add numbers larger than 64-bits using a similar chunking approach, but it is less straight forward and requires IF statements and substructions operations.

The overflow amount would be LE64, but then the zero in case of ‘no overflow’ would also need to be LE64 (otherwise more-than-64bit calculations would still require branching), and that would complicate the common case of checking for ‘no overflow’, because you cannot just do NOT VERIFY - NOT expects a scriptnum

If more-than-64bit arithmetic is desired, then I would say rustyrussell’s idea of using only-positive variable-length integers where larger lengths are allowed is more elegant, and coupled with FIXNUM (or maybe TOFIXNUM/FROMFIXNUM) opcode to convert between variable and fixed encodings, and BYTEREV to convert between low/big endian, that would cover a large range of practical needs

If more-than-64bit arithmetic is desired, then I would say rustyrussell’s idea of using only-positive variable-length integers where larger lengths are allowed is more elegant,

I agree variable length would be more elegant. My worry was about fee pricing since, depending on the math operations allowed, really big nums require more computation than small numbers. You can put big numbers on the stack with single byte opcodes like HASH256. 64-bit chucks provides a very nice way to capture that increase in cost since need more opcodes to perform operations on bigger numbers.

The more I think about my fee cost argument here, the more convinced I am that it doesn’t matter and I am wrong. It seems perfectly fine to have operations on very large numbers in Bitcoin because if one wanted to maximize computational resources spent and minimize fee code that are much better opcodes than arithmetic.

The computational cost of performing arithmetic on even very large numbers, say 520 byte numbers, is probably much smaller than many single opcode instructions like HASH160 or CHECKSIG. I don’t have actual performance numbers on this though.