Gossip Observer: New project to monitor the Lightning P2P network

I agree that we’ll need some logic to handle outpoints that we can’t encode. Given that the top bit of a 24-bit blocknum won’t flip for ~142 years, IMO it’s fine to XOR it with the bottom bit of txindex.
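To make the timescale concrete, here’s a back-of-envelope check of the “~142 years” figure, plus one possible reading of the XOR trick. The field widths follow the BOLT 7 short_channel_id layout (24-bit block height, 24-bit txindex, 16-bit output index); the current height and the pack/unpack scheme are assumptions for illustration, not a spec:

```python
BLOCKS_PER_YEAR = 6 * 24 * 365          # ~52,560 at one block per 10 minutes
CURRENT_HEIGHT = 915_000                # assumed rough mainnet height
TOP_BIT_HEIGHT = 1 << 23                # first height with bit 23 set

years_until_flip = (TOP_BIT_HEIGHT - CURRENT_HEIGHT) / BLOCKS_PER_YEAR
print(f"top bit of a 24-bit blocknum flips in ~{years_until_flip:.0f} years")

def pack(blocknum: int, txindex: int, outidx: int) -> int:
    # Hypothetical: XOR txindex's bottom bit into blocknum's (unused)
    # top bit. Reversible, since txindex is recoverable from its own field.
    hi = blocknum ^ ((txindex & 1) << 23)
    return (hi << 40) | (txindex << 16) | outidx

def unpack(scid: int) -> tuple[int, int, int]:
    txindex = (scid >> 16) & 0xFFFFFF
    blocknum = (scid >> 40) ^ ((txindex & 1) << 23)
    return blocknum, txindex, scid & 0xFFFF
```

The round trip holds for both parities of txindex, which is the property that makes borrowing the top bit safe until the height actually reaches 2^23.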

I got to talk to someone from the LDK team last week, and they made some convincing arguments for why a query protocol doesn’t really add value. But the consequences of the ordering you’re describing seem a bit ‘flipped’ to me, specifically point 3.2.

Since sketch decode is most of the work: in order for Alice to improve her odds of successfully decoding the next sketch she receives, she sends her peer Bob a larger sketch, Bob performs (much) more work, and he possibly ends up sending Alice more messages she’s missing? The extra work is done by Bob, for Alice’s benefit.

A simplified version of my diagram above is that Alice never explicitly requests initial sketches (dropping your step 1). But she MAY request an extension, and attempt decode again. The decode work is still done by Alice, not Bob. However, I think that would require new P2P messages so Alice could query Bob for missing set elements.

Another point that came out of last week’s discussions: instead of tracking the ‘freshness’ of sketches per peer (I currently make a new sketch for each peer right before sending it to them), a node can keep one ‘internal’ timer for when it will recompute its sketch, and separate per-peer timers for when it will send whichever sketch it currently has. There would then be no tracking of which elements were included in the last sketch sent to a particular peer, and the cost of serving a sketch extension is just bandwidth rather than more compute.
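One way that single-recompute-timer scheme could look (class and field names are hypothetical, and a frozenset stands in for a real minisketch). The node refreshes one shared sketch on its own schedule; the per-peer timers only decide when to send whatever sketch currently exists, so no per-peer “which elements did I already sketch” state is kept:

```python
class GossipSketcher:
    def __init__(self, recompute_interval: float, send_interval: float):
        self.recompute_interval = recompute_interval
        self.send_interval = send_interval
        self.elements: set[int] = set()             # current gossip set
        self.sketch: frozenset[int] | None = None   # stand-in for a minisketch
        self.next_recompute = 0.0                   # one internal timer
        self.next_send: dict[str, float] = {}       # per-peer send timers

    def tick(self, now: float, peers: list[str]):
        """Return (peer, sketch) pairs due to be sent at `now`."""
        if now >= self.next_recompute:
            # Recompute once, for everyone -- not per peer.
            self.sketch = frozenset(self.elements)
            self.next_recompute = now + self.recompute_interval
        out = []
        for p in peers:
            if self.sketch is not None and now >= self.next_send.get(p, 0.0):
                out.append((p, self.sketch))
                self.next_send[p] = now + self.send_interval
        return out
```

Every peer whose timer fires between recomputes receives the identical sketch object, which is what makes a later extension request pure serialisation work.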

That would definitely work. FWIW, I think the odds of getting that feature added to libminisketch are high, if the use case can be clearly demonstrated.

I’m not sure how often the ‘stream all gossip’ fallback is used now, so…maybe? If a node is offline for long enough, I definitely agree that reconciliation won’t work for catching up on gossip. But that feels separate from handling a decode failure during normal operation. It’s interesting that decode failure/success gives some signal on how much information a node is missing, compared to the situation now / with flooding.