If your goal is to allow variable-size bignums so that you can do maths in the secp scalar field, you don't want to enforce minimaldata – otherwise, if you take the hash of a block header to generate a scalar and naturally end up with leading zeroes, you have to strip those zeroes before you can do any maths on that hash. Stripping an arbitrary number of zeroes is then also awkward under minimaldata rules, though not impossible.
(I think a minimaldata variant that only applies to values pulled directly from the witness stack, but not from program pushes or the results of operations, would be interesting.)
If there were a FROMFIXNUM opcode that takes the size of a fixed-size integer as one argument and a byte blob of that size as another and turns it into a variable-size integer, this would be easy (along with a BYTEREV opcode to deal with endianness).
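FROMFIXNUM and BYTEREV don't exist today, so purely as a sketch of what the pair would have to do, here is the conversion in C++ (assuming a big-endian input blob treated as a non-negative integer, and the usual minimal little-endian script-number encoding):

#include <vector>

// Hypothetical FROMFIXNUM behaviour: take a fixed-size big-endian blob
// (e.g. a 32-byte block-header hash) and produce the variable-length,
// minimally encoded script-number form.
std::vector<unsigned char> FromFixNum(const std::vector<unsigned char>& fixed)
{
    // BYTEREV step: script numbers are little-endian.
    std::vector<unsigned char> num(fixed.rbegin(), fixed.rend());
    // Strip the high-order zero bytes (now at the back) so the encoding is
    // minimal -- exactly the stripping that is awkward to do by hand under
    // minimaldata rules.
    while (!num.empty() && num.back() == 0x00) num.pop_back();
    // If the top bit of the most significant byte is set, append 0x00 so the
    // value is not interpreted as negative.
    if (!num.empty() && (num.back() & 0x80)) num.push_back(0x00);
    return num;
}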
Another metric to look at is who's paying for those extra bytes. So for a set of typical transactions (in existence now and expected in the future), how much does their size increase?
I'm going to code this up to confirm the ergonomics, so mistakes are likely in this post – call them out if you see them. Here is my understanding without actually writing the code yet:
Read int64_t representing satoshis from BaseTransactionSignatureChecker.GetTransactionData()
Convert the int64_t into a minimally encoded CScriptNum. I don't think this necessarily has to be done by an opcode; it could be done in the impl of OP_INOUT_AMOUNT itself.
Wherever the satoshi value on the stack top is consumed by another op code, we need to figure out how to allow for nMaxNumSize to be 8 bytes.
As an example for step 5, let's assume we are using the old (pre-64-bit) numeric opcodes.
You can see that we interpret the stack top as a CScriptNum, but that CScriptNum has an nMaxNumSize of 4 bytes rather than 8. This leads to an overflow exception being thrown by CScriptNum. The same problem applies to any opcode that uses CScriptNum to interpret the stack top (another example is OP_WITHIN).
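To make the failure concrete, here is a simplified mirror of the size check at the top of CScriptNum's vector constructor (the real class lives in script.h; only the relevant guard is shown). The old numeric opcodes construct with the default nMaxNumSize of 4, so an 8-byte satoshi amount on the stack throws:

#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

class scriptnum_error : public std::runtime_error
{
public:
    explicit scriptnum_error(const std::string& str) : std::runtime_error(str) {}
};

// Simplified mirror of the guard in CScriptNum's vector constructor.
void CheckScriptNumSize(const std::vector<unsigned char>& vch, size_t nMaxNumSize = 4)
{
    if (vch.size() > nMaxNumSize) {
        throw scriptnum_error("script number overflow");
    }
}

// CheckScriptNumSize(eight_byte_amount);     // throws scriptnum_error
// CheckScriptNumSize(eight_byte_amount, 8);  // what step 5 needs to allow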
Hi everyone, we have a Bitcoin Core PR Review Club tomorrow (3/20/2024) at 17:00 UTC for this implementation.
Happy to answer any questions about the PR, and to talk through the 'Design Questions' section, which I think still has open questions. Hope to see some of y'all tomorrow at 17:00 UTC.
Every opcode in interpreter.cpp that used to accept a CScriptNum input now accepts an int64_t stack parameter. For instance, OP_1ADD accepts an int64_t stack-top argument and pushes an int64_t back onto the stack, along with a bool indicating whether the OP_1ADD execution was successful.
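As a sketch of that shape (the name and signature here are illustrative, not the PR's actual helper):

#include <cstdint>
#include <limits>

// Illustrative int64_t-based OP_1ADD: report overflow via the return value
// instead of throwing like CScriptNum does. In the PR the success flag
// itself ends up on the stack for the script to act on.
bool Op1Add(int64_t operand, int64_t& result)
{
    if (operand == std::numeric_limits<int64_t>::max()) return false; // would overflow
    result = operand + 1;
    return true;
}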
I think this PR provides a better developer experience, as Script developers:
No longer have to think about which opcode to use (OP_ADD or OP_ADD64, OP_LESSTHAN or OP_LESSTHAN64, etc)
No longer have to worry about casting the stack top with previous casting op codes (OP_SCRIPTNUMTOLE64, OP_LE64TOSCRIPTNUM, OP_LE32TOLE64)
This 64-bit implementation would mean existing Scripts that use constant numeric arguments – such as those passed to OP_CHECKLOCKTIMEVERIFY/OP_CHECKSEQUENCEVERIFY – would need to be rewritten to pass in 8-byte parameters rather than 5-byte parameters.
This PR heavily relies on pattern matching on SigVersion to determine what the implementation of the opcode should do.
For instance, here is the implementation of OP_DEPTH:
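(Sketched below rather than copied verbatim from the PR; PushInt64 is an illustrative helper name. This sits inside EvalScript's opcode switch.)

case OP_DEPTH:
{
    // -- stacksize
    switch (sigversion) {
        case SigVersion::BASE:
        case SigVersion::WITNESS_V0:
        case SigVersion::TAPROOT:
        case SigVersion::TAPSCRIPT:
            // Existing behaviour: push the depth as a CScriptNum.
            stack.push_back(CScriptNum((int64_t)stack.size()).getvch());
            break;
        case SigVersion::TAPSCRIPT_64BIT:
            // New behaviour: push the depth as an 8-byte little-endian int64_t.
            PushInt64(stack, (int64_t)stack.size());
            break;
    }
}
break;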
In the future, if we want to redefine the semantics of OP_DEPTH, we can pattern match on the SigVersion and substitute the new implementation.
I think this provides us a nice framework for upgrading the interpreter in the future. The compiler will give errors in places where we aren't handling a new SigVersion introduced in the codebase (exhaustiveness checks), forcing us to handle that case.
FWIW, the big concern I have with this is people writing scripts where they don't think an overflow is possible, so they just do an OP_DROP on the overflow indicator; then someone thinks a bit harder, figures out how to steal money via an overflow, and does exactly that. That's arguably easily mitigated – just use OP_VERIFY to guarantee there wasn't an overflow – but I noticed that an example script in the review club used the more obvious DROP:
Worries me a bit when the obvious way of doing something (“this won’t ever overflow, so just drop it”) is risky.
You could imagine introducing two opcodes, "OP_ADD64" and "OP_ADD64VERIFY", the latter of which does an implicit VERIFY and hence fails the script if there was overflow; but that would effectively be the existing behaviour of OP_ADD. So I guess what I'm saying is: maybe consider an approach along the lines that sipa suggested:
where you change ADD to work with 64bit numbers (in whatever format), and add a new ADD_OF, MUL_OF (here OF implies “flag on stack” instead of “link in bio”)
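A rough sketch of the difference, with invented handler names (the _OF variant leaves the flag for the script to inspect; the VERIFY-style variant fails the script outright on overflow):

#include <cstdint>

// __builtin_add_overflow is GCC/Clang; 'result' holds the wrapped value on overflow.
bool AddOf(int64_t a, int64_t b, int64_t& result, bool& overflow)
{
    overflow = __builtin_add_overflow(a, b, &result);
    return true; // script keeps running; 'overflow' would be pushed onto the stack
}

bool AddVerify(int64_t a, int64_t b, int64_t& result)
{
    return !__builtin_add_overflow(a, b, &result); // false => script fails
}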
Unfortunately, as language designers we can only give people tools to build safe programs; we can't force them to always use those tools correctly. There will always be developers who figure out innovative ways to use your PL unsafely.
For those who want to write safe programs, this PR makes the tools to do so available to them; those tools don't exist at the moment.
Supports 8-byte computation with the arithmetic and comparison opcodes.
Doesn't add any new opcodes; rather, it repurposes existing opcodes based on SigVersion.
Changes the underlying impl type in CScriptNum from int64_t → __int128_t.
Preserves the behavior of the old CScriptNum (variable-length encoding, allows overflow results, but no computation on overflowed results).
For those who may not be familiar with the existing behavior of CScriptNum, I would suggest reading this comment in script.h:
/**
* Numeric opcodes (OP_1ADD, etc) are restricted to operating on 4-byte integers.
* The semantics are subtle, though: operands must be in the range [-2^31 +1...2^31 -1],
* but results may overflow (and are valid as long as they are not used in a subsequent
* numeric operation). CScriptNum enforces those semantics by storing results as
* an int64 and allowing out-of-range values to be returned as a vector of bytes but
* throwing an exception if arithmetic is done or the result is interpreted as an integer.
*/
The intention of this branch is to retain this behavior, only extending the supported range of values to [-2^63 +1...2^63 -1].
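A minimal sketch of what those semantics look like with the wider backing type (simplified; the real class also handles encoding and minimality):

#include <cstdint>
#include <limits>
#include <stdexcept>

// Operands must be in [-2^63 + 1, 2^63 - 1], but the result of an addition
// may overflow that range; it is only rejected if it is later used as an
// operand again (the old 4-byte/int64 split, moved one level up).
__int128_t AddScriptNums(__int128_t a, __int128_t b)
{
    const __int128_t kMax = std::numeric_limits<int64_t>::max(); //  2^63 - 1
    const __int128_t kMin = -kMax;                               // -2^63 + 1
    if (a < kMin || a > kMax || b < kMin || b > kMax) {
        throw std::runtime_error("operand out of range");
    }
    return a + b; // may exceed the int64_t range; held in the wider type
}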
Consensus risk
This PR changes the behavior of CScriptNum. For instance, the constructor for CScriptNum now takes a __int128_t as a parameter rather than an int64_t. This constructor is called for all SigVersions (not just SigVersion::TAPSCRIPT_64BIT). This seems like it could lead to some consensus risk with old nodes if someone crafts a specific transaction using segwit v0 or tapscript with a value that exceeds std::numeric_limits<int64_t>::max() but is less than std::numeric_limits<__int128_t>::max().
More problems with __int128_t
I chose to use __int128_t as it seemed like the logical way to extend the existing behavior of supporting overflow values.
I'm not an expert on C++, but it appears that __int128_t is not supported properly on Windows. I'm seeing CI failures on Windows that look like this:
D:\a\bitcoin\bitcoin\src\script\script.h(229,25): error C4430: missing type specifier - int assumed. Note: C++ does not support default-int [D:\a\bitcoin\bitcoin\build_msvc\libbitcoin_qt\libbitcoin_qt.vcxproj]
D:\a\bitcoin\bitcoin\src\script\script.h(229,41): error C2143: syntax error: missing ',' before '&' [D:\a\bitcoin\bitcoin\build_msvc\libbitcoin_qt\libbitcoin_qt.vcxproj]
D:\a\bitcoin\bitcoin\src\script\script.h(290,33): error C4430: missing type specifier - int assumed. Note: C++ does not support default-int [D:
Perhaps there is a better way to implement this that avoids __int128_t. I'm open to ideas.
For comparison, crypto/muhash.h uses __int128_t conditionally.
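That guard is roughly of this shape (not a copy of crypto/muhash.h, just the general pattern):

// Only rely on __int128_t when the compiler advertises it; MSVC does not.
#ifdef __SIZEOF_INT128__
using wide_int = __int128_t;
#else
// A fallback representation (two 64-bit limbs, or a small bignum class)
// would be needed here.
#endif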
Could maybe use uint64_t absval; bool negated; instead, with various manual checks to see if things end up out of bounds? Otherwise, just doing a bignum class might make sense (like arith_uint256).
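A minimal sketch of that sign-magnitude idea (names and details invented here):

#include <cstdint>

struct SignMagnitude {
    uint64_t absval{0};
    bool negated{false};
};

// Add two values with the same sign; returns false if the magnitude no
// longer fits in 64 bits (mixed signs would subtract magnitudes instead).
bool AddSameSign(const SignMagnitude& a, const SignMagnitude& b, SignMagnitude& out)
{
    uint64_t sum;
    if (__builtin_add_overflow(a.absval, b.absval, &sum)) return false;
    out.absval = sum;
    out.negated = a.negated;
    return true;
}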