64 bit arithmetic soft fork

If your goal is to allow varying size big nums so that you can do maths in the secp scalar field, you don’t want to enforce minimaldata – otherwise if you take the hash of a blockheader to generate a scalar, and naturally end up with leading zeroes, you have to strip those zeroes before you can do any maths on that hash. Stripping an arbitrary number of zeroes is then also awkward with minimaldata rules, though not impossible.

(I think a minimaldata variant that only applies to values pulled directly from the witness stack, but not to program pushes or the results of operations, would be interesting.)

If there were a FROMFIXNUM opcode that takes the size of a fixed-size integer as one argument and a byte blob of that size as another, and turns it into a variable-size integer, this would be easy (along with a BYTEREV opcode to deal with endianness).
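A rough sketch of what those two hypothetical opcodes could do (the names and signatures here are my assumption, not a spec):

```cpp
#include <algorithm>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical OP_BYTEREV: reverse byte order (big-endian <-> little-endian).
std::vector<uint8_t> ByteRev(std::vector<uint8_t> v)
{
    std::reverse(v.begin(), v.end());
    return v;
}

// Hypothetical OP_FROMFIXNUM: interpret a fixed-size little-endian blob of
// `size` bytes and re-encode it as a variable-size number by dropping the
// most-significant zero bytes.
std::vector<uint8_t> FromFixNum(const std::vector<uint8_t>& blob, size_t size)
{
    if (blob.size() != size) throw std::runtime_error("wrong operand size");
    std::vector<uint8_t> out(blob);
    while (!out.empty() && out.back() == 0) out.pop_back();
    return out;
}
```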


If anyone is interested in seeing how this set of proposals could be used by other op codes, please see my new post about OP_INOUT_AMOUNT

Here is a direct link to a few test scripts in the python test framework

Nice!

How would the IN_OUT opcode change if one decided to go with 64-bit ScriptNum type encoding instead of the LE variant?

I guess it's not a large change, since you would just push a ScriptNum-encoded integer onto the stack instead?

  // Encode the amount with CScriptNum's variable-length encoding and push it
  CScriptNum bn(fundingAmount);
  stack.push_back(bn.getvch());

Another metric to look at is who’s paying for those extra bytes. So for a set of typical transactions (in existence now and expected in the future), how much do those increase?

I’m going to code this up to confirm ergonomics, so mistakes are likely in this post; call them out if you see them. Here is my understanding without actually writing the code yet:

If we were to continue with CScriptNum, as my OP_INOUT_AMOUNT implementation currently does:

  1. Read int64_t representing satoshis from BaseTransactionSignatureChecker.GetTransactionData()
  2. Convert the int64_t into a minimally encoded CScriptNum. I don’t think this necessarily has to be done by an op code; it could be done in the implementation of OP_INOUT_AMOUNT itself
  3. Call CScriptNum constructor, modifying the nMaxNumSize parameter to support 8 bytes.
  4. Push CScriptNum onto stack
  5. Wherever the satoshi value on the stack top is consumed by another op code, we need to figure out how to allow for nMaxNumSize to be 8 bytes.
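Steps 2–4 can be sketched with a standalone re-implementation of CScriptNum's minimal serialization (illustrative only, not the actual Core code):

```cpp
#include <cstdint>
#include <vector>

// Encode an int64_t the way CScriptNum::serialize does: minimal
// little-endian bytes, with the sign carried in the high bit of the top
// byte (padding with an extra byte when that bit is already in use).
std::vector<uint8_t> EncodeScriptNum(int64_t value)
{
    std::vector<uint8_t> result;
    if (value == 0) return result;
    const bool neg = value < 0;
    uint64_t absvalue = neg ? ~static_cast<uint64_t>(value) + 1
                            : static_cast<uint64_t>(value);
    while (absvalue) {
        result.push_back(absvalue & 0xff);
        absvalue >>= 8;
    }
    if (result.back() & 0x80) result.push_back(neg ? 0x80 : 0x00);
    else if (neg) result.back() |= 0x80;
    return result;
}
```

Note that the full 21M BTC supply in satoshis needs 7 bytes under this encoding, which is why the consuming opcodes in step 5 must tolerate nMaxNumSize=8.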

As an example for step 5, let’s assume we are using the old (pre-64-bit) numeric op codes.

You can see we interpret the stack top as a CScriptNum; however, that CScriptNum has nMaxNumSize=4 rather than 8 bytes. This leads to an overflow exception being thrown by CScriptNum. The same problem applies to any opcode that uses CScriptNum to interpret the stack top (another example is OP_WITHIN).
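The size check that triggers that exception can be sketched like this (simplified from CScriptNum's constructor behavior, not the actual code):

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// CScriptNum's constructor rejects any operand longer than nMaxNumSize,
// which is why an 8-byte satoshi value blows up inside the old 4-byte
// numeric opcodes.
void CheckScriptNumSize(const std::vector<uint8_t>& vch, size_t nMaxNumSize)
{
    if (vch.size() > nMaxNumSize) {
        throw std::overflow_error("script number overflow");
    }
}
```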

Hi everyone, we have a Bitcoin Core PR review club tomorrow (3/20/2024) at 17:00 UTC for this implementation.

Happy to answer any questions about the PR, and to talk through the ‘Design Questions’ section, which I think still has open questions. Hope to see some of y’all tomorrow at 17:00 UTC.

Hi everyone, a new update:

This version of 64bit arithmetic

  • Removes the 64-bit-specific opcodes in favor of repurposing existing opcodes (OP_ADD64 → OP_ADD, OP_SUB64 → OP_SUB, OP_MUL64 → OP_MUL, OP_DIV64 → OP_DIV, etc.).
  • Every opcode in interpreter.cpp that used to accept a CScriptNum input now accepts an int64_t stack parameter. For instance, OP_1ADD accepts an int64_t stack-top argument and pushes an int64_t back onto the stack, along with a bool indicating whether the OP_1ADD execution was successful.
  • Removes the casting opcodes (OP_SCRIPTNUMTOLE64, OP_LE64TOSCRIPTNUM, OP_LE32TOLE64).
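A minimal sketch of the int64_t-plus-success-flag shape described above, using a compiler builtin for checked arithmetic (the PR's actual signatures may differ):

```cpp
#include <cstdint>

// Sketch of OP_ADD under the new SigVersion: operate directly on int64_t
// and report success via a flag instead of throwing.
// __builtin_add_overflow (GCC/Clang) returns true when the sum overflows.
bool CheckedAdd(int64_t a, int64_t b, int64_t& result)
{
    return !__builtin_add_overflow(a, b, &result);
}
```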

I think this PR provides a better developer experience, as Script developers:

  1. No longer have to think about which opcode to use (OP_ADD or OP_ADD64, OP_LESSTHAN or OP_LESSTHAN64, etc)
  2. No longer have to worry about casting the stack top with previous casting op codes (OP_SCRIPTNUMTOLE64, OP_LE64TOSCRIPTNUM, OP_LE32TOLE64)

This 64-bit implementation would mean that existing Scripts that typically use constant numeric arguments – such as those using OP_CHECKLOCKTIMEVERIFY/OP_CHECKSEQUENCEVERIFY – would need to be rewritten to pass in 8-byte parameters rather than 5-byte parameters.

This PR heavily relies on pattern matching on SigVersion to determine what the implementation of the opcode should do.

For instance, here is the implementation of OP_DEPTH

In the future, if we want to redefine semantics of OP_DEPTH we can now pattern match on the SigVersion and substitute the new implementation.
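An illustrative sketch of that dispatch pattern (the enum values and encoders here are simplified assumptions, not the PR's actual code):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-ins for the SigVersion-dependent number encodings.
enum class SigVersion { BASE, WITNESS_V0, TAPSCRIPT };

// Minimal little-endian encoding (positive values only, for brevity).
std::vector<uint8_t> EncodeLegacyScriptNum(int64_t v)
{
    std::vector<uint8_t> out;
    while (v) { out.push_back(v & 0xff); v >>= 8; }
    if (!out.empty() && (out.back() & 0x80)) out.push_back(0x00);
    return out;
}

// Fixed-width 8-byte little-endian encoding.
std::vector<uint8_t> EncodeInt64(int64_t v)
{
    std::vector<uint8_t> out(8);
    for (int i = 0; i < 8; ++i) out[i] = (v >> (8 * i)) & 0xff;
    return out;
}

// OP_DEPTH pushes the stack depth; which encoding it uses is chosen by
// pattern matching on SigVersion, so new versions slot in as new cases and
// the compiler's exhaustiveness check flags any unhandled one.
std::vector<uint8_t> ExecuteDepth(size_t stack_size, SigVersion sigversion)
{
    switch (sigversion) {
    case SigVersion::BASE:
    case SigVersion::WITNESS_V0:
        return EncodeLegacyScriptNum(static_cast<int64_t>(stack_size));
    case SigVersion::TAPSCRIPT:
        return EncodeInt64(static_cast<int64_t>(stack_size));
    }
    return {}; // unreachable if the switch is exhaustive
}
```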

I think this provides us a nice framework for upgrading the interpreter in the future. We will have the compiler give errors in places where we aren’t handling a new SigVersion introduced in the codebase (exhaustiveness checks), and force us to handle that case.

FWIW, the big concern I have with this is people writing scripts where they don’t think an overflow is possible, so they just do an OP_DROP for the overflow indicator, and then someone thinks a bit harder, and figures out how to steal money via an overflow, and then they do exactly that. That’s arguably easily mitigated: just use OP_VERIFY to guarantee there wasn’t an overflow, but I noticed that an example script in review club used the more obvious DROP:

<Chris_Stewart_5> Script: 0x000e876481700000 0x000e876481700000 OP_ADD64 OP_DROP OP_LE64TOSCRIPTNUM OP_SIZE OP_8 OP_EQUALVERIFY OP_SCRIPTNUMTOLE64 0x001d0ed902e00000 OP_EQUAL

Worries me a bit when the obvious way of doing something (“this won’t ever overflow, so just drop it”) is risky.

You could imagine introducing two opcodes: “OP_ADD64” and “OP_ADD64VERIFY” the latter of which does an implicit VERIFY, and hence fails the script if there was overflow; but that would effectively be the existing behaviour of OP_ADD. So I guess what I’m saying is: maybe consider an approach along the lines that sipa suggested:

where you change ADD to work with 64bit numbers (in whatever format), and add a new ADD_OF, MUL_OF (here OF implies “flag on stack” instead of “link in bio”)
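The distinction between the two flavours might look like this on a toy int64_t stack (hypothetical semantics, not actual opcodes):

```cpp
#include <cstdint>
#include <vector>

// Verify-style OP_ADD: overflow fails the script outright.
bool OpAdd(std::vector<int64_t>& stack)
{
    int64_t b = stack.back(); stack.pop_back();
    int64_t a = stack.back(); stack.pop_back();
    int64_t r;
    if (__builtin_add_overflow(a, b, &r)) return false; // script failure
    stack.push_back(r);
    return true;
}

// Hypothetical OP_ADD_OF: always succeeds, leaving an overflow flag on
// the stack for the script to inspect.
bool OpAddOf(std::vector<int64_t>& stack)
{
    int64_t b = stack.back(); stack.pop_back();
    int64_t a = stack.back(); stack.pop_back();
    int64_t r;
    bool overflow = __builtin_add_overflow(a, b, &r);
    stack.push_back(overflow ? 0 : r); // result (placeholder on overflow)
    stack.push_back(overflow ? 1 : 0); // overflow flag
    return true;
}
```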


Unfortunately as language designers we can only give people tools to build safe programs, we can’t force them to always use those tools correctly. There will always be developers that figure out innovative ways to unsafely use your PL.

For those that want to write safe programs, this PR makes the tools to do so available; such tools don’t exist in Script at the moment.

Here is the CScriptNum extension prototype (PR against master):

Highlights

  • Re-enables the OP_MUL and OP_DIV opcodes
  • Supports 8-byte computation with the arithmetic and comparison op codes.
  • Doesn’t add any new opcodes; rather, it repurposes existing opcodes based on SigVersion.
  • Changes the underlying implementation type in CScriptNum from int64_t to __int128_t
  • Preserves the behavior of the old CScriptNum (variable-length encoding, allows overflow results, but no computation on overflowed results).

For those that may not be familiar with the existing behavior of CScriptNum, I would suggest reading this comment in script.h

/**
 * Numeric opcodes (OP_1ADD, etc) are restricted to operating on 4-byte integers.
 * The semantics are subtle, though: operands must be in the range [-2^31 +1...2^31 -1],
 * but results may overflow (and are valid as long as they are not used in a subsequent
 * numeric operation). CScriptNum enforces those semantics by storing results as
 * an int64 and allowing out-of-range values to be returned as a vector of bytes but
 * throwing an exception if arithmetic is done or the result is interpreted as an integer.
 */

The intention of this branch is to retain this behavior, only extending the supported range of values to [-2^63 +1...2^63 -1].

Consensus risk

This PR changes the behavior of CScriptNum. For instance, the constructor for CScriptNum now takes a __int128_t parameter rather than an int64_t. This constructor is called for all SigVersions (not just SigVersion::TAPSCRIPT_64BIT). This seems like it could lead to some consensus risk with old nodes if someone crafts a transaction using segwit v0 or tapscript with a numeric value that exceeds std::numeric_limits<int64_t>::max() but is less than std::numeric_limits<__int128_t>::max().

More problems with __int128_t

I chose to use __int128_t as it seemed like the logical way to extend the existing behavior of supporting overflow values.

I’m not an expert on C++, but it appears that __int128_t is not supported properly on Windows. I’m seeing CI failures on Windows that look like this:

D:\a\bitcoin\bitcoin\src\script\script.h(229,25): error C4430: missing type specifier - int assumed. Note: C++ does not support default-int [D:\a\bitcoin\bitcoin\build_msvc\libbitcoin_qt\libbitcoin_qt.vcxproj]
D:\a\bitcoin\bitcoin\src\script\script.h(229,41): error C2143: syntax error: missing ',' before '&' [D:\a\bitcoin\bitcoin\build_msvc\libbitcoin_qt\libbitcoin_qt.vcxproj]
D:\a\bitcoin\bitcoin\src\script\script.h(290,33): error C4430: missing type specifier - int assumed. Note: C++ does not support default-int [D:

Perhaps there is a better way to implement this while avoiding using __int128_t. I’m open to ideas.

For comparison, crypto/muhash.h uses __int128_t conditionally.

Could maybe use uint64_t absval; bool negated; instead, with various manual checks to see if things end up out of bounds? Otherwise, just doing a bignum class might make sense (like arith_uint256).
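A minimal sketch of that sign-magnitude idea with a manual overflow check (an assumption of how it could look, not an implementation proposal):

```cpp
#include <cstdint>

// Represent values as magnitude plus sign instead of using __int128_t.
struct SignMagnitude {
    uint64_t absval = 0;
    bool negated = false;
};

// Add two sign-magnitude values; returns false if the magnitude would not
// fit in 64 bits.
bool Add(const SignMagnitude& a, const SignMagnitude& b, SignMagnitude& out)
{
    if (a.negated == b.negated) {
        uint64_t sum;
        if (__builtin_add_overflow(a.absval, b.absval, &sum)) return false;
        out = {sum, a.negated};
    } else {
        // Opposite signs: subtract the smaller magnitude from the larger.
        if (a.absval >= b.absval) out = {a.absval - b.absval, a.negated};
        else out = {b.absval - a.absval, b.negated};
        if (out.absval == 0) out.negated = false; // normalize -0 to +0
    }
    return true;
}
```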
