Property-based testing for Bitcoin Core

I just want to share some initial experiments I’ve done regarding property-based testing for Core.

Property-based tests check properties of a system that should always hold. Instead of writing a separate test for every value you want to cover, you describe a range of inputs and a single test exercises the property across all of them.
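As a minimal sketch of the idea, here is one property checked against many randomly generated inputs. This is plain Python using only the standard library, not TSTL and not Bitcoin Core code; `check_property`, `random_bytes`, and `hex_roundtrip` are hypothetical names made up for this illustration.

```python
import random


def check_property(prop, gen, runs=200, seed=0):
    """Run prop on `runs` random inputs; return the first failing input, or None."""
    rng = random.Random(seed)  # fixed seed so a failing run can be replayed
    for _ in range(runs):
        value = gen(rng)
        if not prop(value):
            return value
    return None


def random_bytes(rng):
    # Random payload of 0..64 bytes.
    return bytes(rng.randrange(256) for _ in range(rng.randrange(65)))


def hex_roundtrip(data):
    # Property: hex-encoding and then decoding returns the original bytes.
    return bytes.fromhex(data.hex()) == data


counterexample = check_property(hex_roundtrip, random_bytes)
print("counterexample:", counterexample)
```

A real tool adds smarter input generation and shrinks a failing input down to a small reproducer, but the core loop of generating, checking, and reporting is the same.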

First, let’s look at a Bitcoin Core functional test that checks resource exhaustion:

    def test_resource_exhaustion(self):
        self.log.info("Test node stays up despite many large junk messages")
        conn = self.nodes[0].add_p2p_connection(P2PDataStore())
        conn2 = self.nodes[0].add_p2p_connection(P2PDataStore())
        msg_at_size = msg_unrecognized(str_data="b" * VALID_DATA_LIMIT)
        assert len(msg_at_size.serialize()) == MAX_PROTOCOL_MESSAGE_LENGTH

        self.log.info("(a) Send 80 messages, each of maximum valid data size (4MB)")
        for _ in range(80):
            conn.send_message(msg_at_size)

        # Check that, even though the node is being hammered by nonsense from one
        # connection, it can still service other peers in a timely way.
        self.log.info("(b) Check node still services peers in a timely way")
        for _ in range(20):
            conn2.sync_with_ping(timeout=2)

        self.log.info("(c) Wait for node to drop junk messages, while remaining connected")
        conn.sync_with_ping(timeout=400)

        # Despite being served up a bunch of nonsense, the peers should still be connected.
        assert conn.is_connected
        assert conn2.is_connected
        self.nodes[0].disconnect_p2ps()

Looking at this function, a few questions came to mind:

  • What if the node had more than two connections?
  • What if connections happen at different times?
  • What if one of the connections sent more or less than 80 messages?
  • What if other connections/disconnections happen while the test is running?

Well, there are a lot of possibilities, which makes this a perfect scenario for a property-based test. Since Bitcoin Core’s functional test framework is written in Python, I used it to draft a property-based test with TSTL.

TSTL is a domain-specific language (DSL) and set of tools to support automated generation of tests for software. This implementation targets Python. You define (in Python) a set of components used to build up a test, and any properties you want to hold for the tested system, and TSTL generates tests for your system. TSTL supports test replay, test reduction, and code coverage analysis, and includes push-button support for some sophisticated test-generation methods. In other words, TSTL is a property-based testing tool.

I wrote a TSTL file to replicate the idea of the resource exhaustion test from p2p_invalid_messages.py. See:

...
init: TestShell().setup(num_nodes=1, setup_clean_chain=True)
finally: TestShell().shutdown()
pool: <connection> 100
pool: <node> 1
pool: <int> 1

<int> := <[80..200]>
<node> := TestShell().nodes[0]
<connection> := <node>.add_p2p_connection(P2PDataStore())

for _ in range(<int>): conn = <connection>; conn.send_message(msg_at_size); conn.sync_with_ping(timeout=400)
<connection>.sync_with_ping(timeout=2)

property: <connection>.is_connected

It has a pool of 100 possible connections, one node, and one integer value (from 80 to 200). TSTL will create different scenarios with different numbers of messages being sent and different orderings of connections and synchronization. Regardless of all this, we must ensure that no disconnection happens.

TSTL is able to create scenarios with more than 1000 actions from this simple script.
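To make the mechanics concrete, here is a rough plain-Python sketch of what such a generator does conceptually: it repeatedly picks an action at random and checks the property after every step. It is not TSTL and it does not touch Bitcoin Core; `ToyNode`, `ToyConnection`, and `run_random_scenario` are hypothetical stand-ins invented for this illustration, and the toy node simply never drops a peer.

```python
import random


class ToyConnection:
    """Hypothetical stand-in for a P2P connection (not the test framework's class)."""

    def __init__(self):
        self.is_connected = True
        self.sent = 0

    def send_message(self, payload):
        self.sent += 1


class ToyNode:
    """Hypothetical node model that accepts junk without disconnecting anyone."""

    def __init__(self):
        self.connections = []

    def add_p2p_connection(self):
        conn = ToyConnection()
        self.connections.append(conn)
        return conn


def run_random_scenario(num_actions, max_connections=100, seed=0):
    """Apply random actions, asserting the property after each one."""
    rng = random.Random(seed)
    node = ToyNode()
    trace = []
    for _ in range(num_actions):
        can_connect = len(node.connections) < max_connections
        if not node.connections or (can_connect and rng.random() < 0.2):
            node.add_p2p_connection()
            trace.append("connect")
        else:
            conn = rng.choice(node.connections)
            burst = rng.randint(80, 200)  # same range as <int> in the TSTL file
            for _ in range(burst):
                conn.send_message(b"b" * 16)
            trace.append(f"send x{burst}")
        # Property: every connection opened so far is still connected.
        assert all(c.is_connected for c in node.connections), trace
    return node, trace


node, trace = run_random_scenario(num_actions=1000)
print(len(trace), "actions,", len(node.connections), "connections")
```

Keeping the trace around is what makes replay and reduction possible: when the property fails, the recorded action list is the test case.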


Thank you for being another voice evangelizing property testing. We need more of it across the entire software industry and especially on projects that need to be very well understood. I don’t work on Bitcoin Core specifically, but I wholeheartedly encourage you to keep pushing on this frontier :saluting_face:.

If you feel like reviving this old PR of mine from back in the day, it might be useful: [POC] Introducing property based testing to Core by Christewart · Pull Request #8469 · bitcoin/bitcoin · GitHub

However IIRC this ended up being viewed as redundant with our fuzzing infrastructure.

Thanks for that. I have not taken an in-depth look yet, but it doesn’t seem to be property-based testing for black-box (functional) stuff, right? I agree that a white-box approach would be redundant with fuzzing.

I have not taken an in-depth look yet, but it doesn’t seem to be property-based testing for black-box (functional) stuff, right?

This is correct.

I guess I haven’t heard of property-based testing being used via networking layers. What I’ve usually seen (and what we do in bitcoin-s) is accessing data structures directly, for a couple of reasons:

  1. Do we want to property based test the entire networking stack? That seems very inefficient and will probably lead to very flaky tests
  2. Higher maintenance burden (although since we already have test suites in C++ and Python, maybe this doesn’t apply as much to Bitcoin Core).

As a general note, I find the Python test framework lacking in completeness compared to the C++ implementation. For instance, when working on my 64-bit arithmetic PR, it seemed that we just assume the correctness of values given to the Python test framework. I suspect this occurs more often than we would like.

Of course you can take this to mean we need to do this work to find these bugs, but I think this will result in a secondary consensus implementation in Python :-). Reasonable minds can differ of course, but that is my two sats of input.

That said, shameless plug: I think the 64-bit arithmetic work would be a great place to start with property-based testing. It’s “simple” in the sense that we are just doing arithmetic. You could even test the existing opcodes (OP_ADD, OP_SUB…) if you don’t want to speculate on this soft fork getting activated, but want to test something straightforward.
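A rough sketch of what such arithmetic properties could look like, in plain Python with the standard library only. Everything here is a toy model written for illustration, not Core's interpreter or test framework: `encode_num`, `decode_num`, and `toy_op_add` are hypothetical helpers, and the 4-byte operand range and the sign-magnitude little-endian encoding are stated as assumptions about script numbers rather than taken from the code base.

```python
import random

INT32_LIMIT = 2**31 - 1  # assumed operand range: -(2**31 - 1) .. 2**31 - 1


def encode_num(n):
    """Toy minimal little-endian sign-magnitude encoding (zero is empty)."""
    if n == 0:
        return b""
    magnitude = abs(n)
    out = bytearray()
    while magnitude:
        out.append(magnitude & 0xFF)
        magnitude >>= 8
    if out[-1] & 0x80:
        # Top bit is used by the magnitude, so add a byte to carry the sign.
        out.append(0x80 if n < 0 else 0x00)
    elif n < 0:
        out[-1] |= 0x80
    return bytes(out)


def decode_num(data):
    """Inverse of encode_num for the toy encoding."""
    if not data:
        return 0
    magnitude = int.from_bytes(data, "little")
    sign_bit = 0x80 << (8 * (len(data) - 1))
    if magnitude & sign_bit:
        return -(magnitude & ~sign_bit)
    return magnitude


def toy_op_add(a_bytes, b_bytes):
    """Toy model of addition on encoded operands (not Core's interpreter)."""
    return encode_num(decode_num(a_bytes) + decode_num(b_bytes))


rng = random.Random(0)
for _ in range(1000):
    a = rng.randint(-INT32_LIMIT, INT32_LIMIT)
    b = rng.randint(-INT32_LIMIT, INT32_LIMIT)
    ea, eb = encode_num(a), encode_num(b)
    assert decode_num(ea) == a                       # round trip
    assert len(ea) <= 4                              # operands fit in 4 bytes
    assert toy_op_add(ea, eb) == toy_op_add(eb, ea)  # commutativity
    assert toy_op_add(ea, b"") == ea                 # zero is the identity
    assert len(toy_op_add(ea, eb)) <= 5              # result may need 5 bytes
print("all properties held")
```

Against the real interpreter, the same properties would be checked by running scripts rather than calling a Python model, which avoids growing a second implementation of the arithmetic in the test code.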

In my view, property testing is more of a strategy/philosophy than anything else. It can be conducted at all layers. The point is that the values are randomized (ideally intelligently, to try to detect boundary conditions), and the invariants are expressed at the API level without punching through the veil of the API’s privacy boundary. In the case of Core, I’d say the total API surface area is the peer network layer and the RPC layer. We may want to synthesize random messages of these types and sequence them in random orders. Ideally we should have a long list of conditions we expect to hold without exception. The less concise those conditions are, the worse the design is overall. Due to the consensus nature of Bitcoin, many of the uglier conditions we will have to live with til The End Of Time, but going forward it’d be nice to have a set of concise properties we expect to hold.

The strategy is extremely powerful no matter where you apply it, provided you can land on these true invariants. I think the observation you make about applying it to inner layers is really an observation of the fact that the smaller the component/system, the better we understand it, which shouldn’t be surprising. However, regardless of the scope we apply it to, any tests we can create using a property construction give us far better assurances than a corresponding single-point unit test.

So I think applying this strategy to whatever layer someone is interested in ought to be enthusiastically supported :grin:

Yes, it sounds like a good place for property-based testing.