TCP hole punching for Bitcoin nodes behind home NATs?

While @willcl-ark and I were discussing “Did the number of reachable nodes in residential ISPs increase since Bitcoin Core v30.0?” (Observations, Bitcoin Network Operations Collective), which indicates that enabling -natpmp=1 by default in Bitcoin Core might not yet have had the hoped-for effect of making more nodes behind a home router NAT reachable, it occurred to us that we might be able to use TCP hole punching to connect two otherwise unreachable home nodes.

This should not replace inbound-outbound connections, and isn’t a replacement for natpmp either, but offers a potential best-effort extension for extra connectivity of home nodes. We discussed this at a recent in-person Bitcoin Core developer meeting and want to keep the conversation going here. We are far from having either a proposal for standardization or an implementation. At the moment, there are likely more open questions than answers.

TCP hole punching

TCP hole punching lets two computers behind certain NATs connect directly, without relaying traffic through a server. My understanding of one way we could use this:

  • Alice & Bob: unreachable home nodes behind a NAT
  • Charlie: reachable node
  1. Both nodes Alice and Bob connect via TCP to a reachable coordinator node Charlie. Charlie could advertise its coordinator/matchmaking capabilities in a service flag.
  2. Charlie learns the public endpoints of Alice and Bob. When Alice’s packets pass through her NAT, the NAT rewrites them with a public IP and port. Charlie sees this “outside” address and does the same for Bob.
  3. Charlie swaps endpoints. It tells Alice Bob’s public IP:port, and tells Bob Alice’s.
  4. Both sides simultaneously initiate a TCP connection to each other (See “Figure 7: Simultaneous Connection Synchronization” in RFC 9293) using the same local port they used to talk to Charlie (this requires SO_REUSEADDR/SO_REUSEPORT on the socket). Each side sends a SYN to the other’s public endpoint.
  5. The outbound SYNs punch holes. When Alice’s SYN leaves her NAT heading for Bob, her NAT creates a mapping: “traffic coming back from Bob’s address is allowed in.” Bob’s NAT does the same in reverse.
  6. The SYNs cross in flight. Now each NAT has a hole that permits the other side’s SYN through. The two stacks see this as a “simultaneous open” and complete the handshake directly - no server in the middle.
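Steps 4 to 6 can be sketched in a few lines of Python. This is a minimal illustration, not code from any of the projects mentioned in this thread: the function names `make_punch_socket` and `punch`, the timeouts, and the retry interval are all assumptions. Each side would call `punch` with the local port it used towards Charlie and the public endpoint Charlie reported for the other side.

```python
import socket
import time


def make_punch_socket(local_port: int) -> socket.socket:
    """Create a TCP socket bound to local_port with address/port reuse enabled."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    if hasattr(socket, "SO_REUSEPORT"):  # not available on every platform
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind(("", local_port))
    return s


def punch(local_port, peer, deadline_s=10.0, attempt_timeout_s=1.0):
    """Keep sending SYNs to peer from local_port until connected or out of time.

    Returns a connected socket, or None if no attempt succeeded.
    """
    deadline = time.monotonic() + deadline_s
    while time.monotonic() < deadline:
        s = make_punch_socket(local_port)
        s.settimeout(attempt_timeout_s)
        try:
            # If the peer does the same towards us, the SYNs can cross in
            # flight and the TCP stacks complete a simultaneous open.
            s.connect(peer)
            s.settimeout(None)
            return s
        except OSError:
            # Refused, unreachable, or timed out: retry with a fresh socket.
            s.close()
            time.sleep(0.1)
    return None
```

Retrying with a fresh socket on every attempt is a deliberate choice here: an early SYN may be dropped or answered with a reset before the other side's NAT has created its mapping, so a single `connect` call is unlikely to be enough.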

The tricky part is that TCP is stateful and timing-sensitive, so it’s flakier than UDP hole punching.

Additionally, it only works for certain types of NAT: RFC 4787 defines three NAT address and port mapping behaviors that are relevant here:

  • Endpoint-Independent Mapping (EIM): picks one external (IP, port) for a given internal socket and reuses it regardless of where the packet is going. Once Alice has sent any outbound packet, her external endpoint is fixed and predictable. Hole punching works. Possibly common on consumer residential routers (cable, fibre, DSL).
  • Address-Dependent Mapping (ADM): the external port stays the same while Alice is talking to the same destination IP, but a new external port is assigned the first time she contacts a different destination IP. Hole punching is hard. One option is to try to predict ports, but this likely does not work reliably.
  • Address+Port-Dependent Mapping (APDM; or symmetric NAT): Every distinct destination (IP, port) gets a fresh external port. We cannot predict what external port Alice’s NAT will assign for any specific peer. Hole punching does not work. Typical of CGNAT deployments, mobile carriers, and restrictive enterprise networks.
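The three mapping behaviors can be told apart by connecting from one local socket to several destinations and comparing the external ports they report. The sketch below is one possible way to do that classification; the function name `classify_mapping` and the tuple layout are assumptions, and it only looks at the external port, not at a changing external IP.

```python
from collections import defaultdict


def classify_mapping(observations):
    """Classify NAT mapping behavior from (dest_ip, dest_port, ext_port) tuples.

    All observations must come from connections made from the same local
    (IP, port) and should cover at least two destination IPs with two
    destination ports each.
    """
    ext_ports = {ext for _, _, ext in observations}
    if len(ext_ports) == 1:
        return "EIM"  # one external port regardless of destination

    ports_by_dest_ip = defaultdict(set)
    for dest_ip, _, ext in observations:
        ports_by_dest_ip[dest_ip].add(ext)
    if all(len(ports) == 1 for ports in ports_by_dest_ip.values()):
        return "ADM"  # port changes per destination IP, but not per port
    return "APDM"  # port changes per destination (IP, port)
```

For example, four observations that all report the same external port classify as EIM, while four different external ports for two destination IPs with two ports each classify as APDM.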

Usage in the Bitcoin protocol

Two approaches were discussed initially, but I feel like the design space is not fully explored yet.

  1. Rendezvous node: The home nodes Alice and Bob both want to offer connection slots, but aren’t reachable from the internet. Both happen to pick node Charlie, which offers a rendezvous/matchmaking service flag, and connect to it. Alice might connect first and needs to wait until Bob connects too. Once both are connected, Charlie tells each of them the other’s IP:port (step 3) and that they should try to hole punch now. The connection to Charlie then stops. Alice and Bob try for a while to connect to each other.

  2. Inbound handoff instead of eviction: An alternative approach is that both Alice and Bob connect to Charlie, but since Charlie’s inbound slots are full, it tells them it’s going to hand them off to each other and the connection stops. Currently, when inbound slots are full and peers are evicted, we close the connection. Here, we offer them a way to retain a connection, but with a different node.

In both of these approaches, Charlie learns that Alice and Bob might now be connected, but not for how long. Alice learns that Bob and Charlie were connected. Bob learns that Alice and Charlie were connected. Both don’t know how long they were connected for.
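Charlie's side of either approach boils down to holding one peer until a second arrives and then swapping their public endpoints. A minimal in-memory sketch of that pairing logic follows; the class name `Rendezvous`, the peer identifiers, and the return format are hypothetical and not part of any proposal.

```python
from dataclasses import dataclass, field


@dataclass
class Rendezvous:
    """Charlie's view: pair up waiting peers and swap their public endpoints."""

    waiting: list = field(default_factory=list)  # [(peer_id, (ip, port)), ...]

    def register(self, peer_id, endpoint):
        """Register a peer with the endpoint Charlie sees from outside its NAT.

        Returns {peer_id: endpoint_to_dial} once two peers are available, or
        None while the peer still has to wait for a partner.
        """
        # A peer that reconnects replaces its own stale entry instead of
        # being matched with itself.
        self.waiting = [w for w in self.waiting if w[0] != peer_id]
        if not self.waiting:
            self.waiting.append((peer_id, endpoint))
            return None
        other_id, other_endpoint = self.waiting.pop(0)
        # Each side is told the *other* side's public IP:port (step 3).
        return {peer_id: other_endpoint, other_id: endpoint}
```

The sketch also makes the information leak visible: the returned mapping is exactly what Charlie knows about who may now be connected to whom.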

There was also brief discussion on how to make this more private by not requiring a coordinator such as Charlie. Could Alice and Bob, once they’ve figured out their public IP:port behind an EIM NAT, use e.g. a Tor connection to exchange their IP:port and then establish a clearnet connection? A protocol without a coordinator makes it a lot easier to reason about whether the connection is inbound or outbound, and is likely a lot easier to implement.

Implementation in Bitcoin Core?

Bitcoin Core currently uses the notion of “inbound” and “outbound” connections. What would hole-punched connections be? Outbounds are generally more trusted since we choose them, relay transactions faster, etc. Additionally, there’s the concept of “outbound-full-relay” and “outbound-block-only” connections.

The connections initially were two “outbound” connections to Charlie. But since Alice didn’t choose the connection to Bob from her addrman (Charlie did while matchmaking), this is not a “we-picked-this-address-from-our-addrman” outbound. So would they be inbound-inbound? We probably have to rethink (and refactor) how we think about connections a bit.

Additionally, BIP-324, for example, has the concept of an initiator and a responder for encrypted transport. Charlie can assign these roles to the two sides. There might be other places where this is useful.

Open Questions

  • How well does TCP hole punching work in practice? Are there any stats on this by people using it? Does it work with most home routers? How common are EIM NATs and A(P)DM NATs?
  • Is there theoretical/academic research, or are there P2P projects using this in the wild, that we can learn from? Can we get in touch with some of them?
  • When using a coordinator, what happens to Alice if either Bob or Charlie is malicious? What attacks can be done? This likely opens opportunities for sybil or eclipse attacks. How dangerous are these?
  • Does having “holepunch connections” really address problems we have today? Does it increase the number of connection slots on the network and allow for more peer diversity for unreachable nodes? Does it offer better node inter-connectivity and increase resistance against partition and eclipse attacks? Does it allow for shorter relay paths between unreachable nodes (and is this something we want to optimize for)?
  • How much new and changed code is needed to implement this? It might change a lot of our current reasoning and understanding of P2P connections. How do we refactor the existing code for this?

Some resources


I’ve set up a server on two different IPs that returns the NAT IP:port to a connecting client. One can use nat-check.py to connect to these and get a classification of their NAT. The code for this, mostly written by an LLM, can be found on GitHub in 0xB10C/tcp-nat-check, a tool that checks the NAT type of your router and tells you whether TCP hole punching would work on it. Making requests to my hosts leaks your IP address to me, so you might choose to run your own servers. IP addresses are masked in the output by default.

For me, both at home and via a phone hotspot, it indicates APDM for IPv4 and “no NAT” for IPv6:

$ python3 nat-check.py http://b10c.me:7770 http://b10c.me:7771 http://bnoc.xyz:7770 http://bnoc.xyz:7771

nat-check  TCP NAT mapping classifier

── IPv4 ──

local source port: 53321

  destination                            external addr             
  ────────────────────────────────────────────────────────────────
  b10c.me:7770 (x.x.x.1)                 x.x.x.2:61314             
  b10c.me:7771 (x.x.x.1)                 x.x.x.2:59431             
  bnoc.xyz:7770 (x.x.x.3)                x.x.x.2:63172             
  bnoc.xyz:7771 (x.x.x.3)                x.x.x.2:64569             

classification: ADDRESS+PORT-DEPENDENT MAPPING (APDM, 'symmetric')

  External port varies per destination (IP, port). TCP hole punching is very
  unlikely to work for you: the coordinator cannot predict the external port
  your NAT will assign for any given peer. This pattern is typical of CGNAT,
  mobile carriers, and restrictive enterprise networks.

── IPv6 ──

local source port: 53322

  destination                            external addr             
  ────────────────────────────────────────────────────────────────
  b10c.me:7770 (x::x:4)                  x::x:5:53322              
  b10c.me:7771 (x::x:4)                  x::x:5:53322              
  bnoc.xyz:7770 (x::x:6)                 x::x:5:53322              
  bnoc.xyz:7771 (x::x:6)                 x::x:5:53322              

classification: NO NAT

  External address (x::x:5:53322) equals the local address. There is no NAT;
  hole punching is unnecessary. However, your home router may still have a
  stateful firewall that blocks unsolicited inbound connections. You may need to
  open the port on your router to be reachable.

Via Obscura VPN it’s EIM for both IPv4 and IPv6:

$ python3 nat-check.py http://b10c.me:7770 http://b10c.me:7771 http://bnoc.xyz:7770 http://bnoc.xyz:7771
── IPv4 ──

local source port: 53486

  destination                            external addr             
  ────────────────────────────────────────────────────────────────
  b10c.me:7770 (x.x.x.1)                 x.x.x.2:53486             
  b10c.me:7771 (x.x.x.1)                 x.x.x.2:53486             
  bnoc.xyz:7770 (x.x.x.3)                x.x.x.2:53486             
  bnoc.xyz:7771 (x.x.x.3)                x.x.x.2:53486             

classification: ENDPOINT-INDEPENDENT MAPPING (EIM)

  All four destinations saw the same external port. Your NAT maps (internal IP,
  internal port) to a single external port regardless of destination. TCP hole
  punching has a strong chance of working: a coordinator can reliably tell a
  peer which external port to send to.

── IPv6 ──

local source port: 53487

  destination                            external addr             
  ────────────────────────────────────────────────────────────────
  b10c.me:7770 (x::x:4)                  x::x:5:53487              
  b10c.me:7771 (x::x:4)                  x::x:5:53487              
  bnoc.xyz:7770 (x::x:6)                 x::x:5:53487              
  bnoc.xyz:7771 (x::x:6)                 x::x:5:53487              

classification: ENDPOINT-INDEPENDENT MAPPING (EIM)

  All four destinations saw the same external port. Your NAT maps (internal IP,
  internal port) to a single external port regardless of destination. TCP hole
  punching has a strong chance of working: a coordinator can reliably tell a
  peer which external port to send to.

So IPv4 TCP hole punching would work neither at home nor via the phone hotspot, since both are APDM, and IPv6 there would likely require opening the firewall. Via Obscura VPN it would work on both IPv4 and IPv6, as both are EIM.

I would be interested in seeing results from others.

| Who and what | IPv4 NAT | IPv6 NAT |
|---|---|---|
| b10c at home & mobile hotspot | APDM | no NAT |
| Obscura VPN | EIM | EIM |
| @sipa at home | EIM | no NAT |
| @sipa using conference wifi | EIM | no IPv6 |
| @sipa using hotel wifi | EIM | no IPv6 |
| @sipa using airport wifi | APDM | no IPv6 |
| @sipa using plane ✈ wifi (Viasat) | EIM | no IPv6 |
| @willcl-ark via starlink (business local priority) | EIM | no NAT |
| @dunxen at home | EIM | no NAT |
| @cedarctic at university campus | APDM | - |
| anon using ProtonVPN (default NAT) | APDM | - |
| @m3dwards using office internet on Mac | EIM | no IPv6 |
| @m3dwards using office internet on Linux | EIM | no IPv6 |
| @m3dwards using Docker Desktop on Mac | APDM | no IPv6 |
| @m3dwards using Docker on Linux | EIM | no IPv6 |
| @m3dwards using T-Mobile US hotspot | EIM | no IPv6 |
| @m3dwards using Home router (OPNSense) | APDM | no IPv6 |
| @Crypt-iQ using home internet | EIM | no NAT |

As mentioned in this comment by sipa, it seems we could hole punch (through the firewall; not through NAT) for no NAT IPv6 too.

I also vibe-coded a “fun” holepunch program in Python to test out the process a little bit, in case it’s of interest to anyone else:

It has an optional stun command which queries a stuntman server I am running to get your external IP address and port as seen by a remote party. (In Bitcoin Core we can get this info from our peers, if we don’t already know it.)

Swapping this info with a friend (or second machine) and using the peer command at ~ the same time on both instances will attempt a simultaneous connection.

stdin is sent to the other side, so you can send characters, files etc.

We had pretty decent results in testing, although we occasionally had to wait a few seconds for router firewalls to forget about their mappings (when we changed IP addresses but not ports), and were thwarted by one hotel NAT configuration.

As noted in the readme we also tried from within various types of containers, over starlink, through a VPN, and all of those chained together, and still saw success.


I also vibecoded a demo application: sipa/holeroulette on GitHub, a vibecoded TCP hole punching experiment.

A server is running; you can test with ./client.py 144.217.240.89 or ./client.py 2607:5300:201:3100::3b74 to be connected to a random other client.


Thanks for this. It’s a topic I’ve also been interested in regarding node reachability, although coming from having messed around with other hole-punching tech like Iroh. I’ll take some more time to read up, but just wanted to share the results of running your script:

On my home internet connection: IPv4: EIM; IPv6: no NAT


I’ve been thinking about this a bit and think something like this might work:

A Bitcoin node run at home might not be reachable via clearnet due to NAT. However, many home nodes now support Tor and/or I2P, which allows inbound connections via those networks. It seems possible to coordinate TCP hole punching through Tor or I2P.

A node might want to offer inbound slots via clearnet, but is not reachable. One way to work around this could be the following protocol, with node A and node B both being behind an EIM NAT while also being connected to the Tor and/or I2P network.

  • Node A thinks it’s unreachable (TBD how to figure this out reliably) but wants to offer clearnet inbound connections.
  • It first needs to figure out if it’s behind an EIM NAT. It can do so by opening and establishing two (possibly feeler?) connections to other nodes using SO_REUSEADDR/SO_REUSEPORT on the socket and checking if the peers return the same port in the version message. This needs to be done only once per network, assuming the NAT configuration does not change.
  • Node A starts listening on a dedicated Tor or I2P endpoint that is used only for coordinating hole punching. Address, transaction, or block relay is not supported on this endpoint. We don’t link any other Tor or I2P endpoints that this node might have to this dedicated endpoint.
  • This coordination Tor or I2P endpoint is advertised via an addrv2 message on clearnet only (to not link the clearnet and Tor/I2P addresses) with a (new) service flag, e.g. NODE_HOLEPUNCH(?). These addresses are relayed as a best effort, but not stored in addrman. The goal is that they stop being relayed after around 10 minutes and don’t use up space in addrman. The frequency at which we relay these is TBD. We don’t need to self-announce them if we don’t want any more holepunch-inbound connections. This message could also be a new BIP-155 address type for hole punching, which could include both the Tor or I2P coordination endpoint and the clearnet address the node wants inbound connections on (coordinate on abcdef.onion to connect with 203.0.113.24). This allows other nodes to filter by e.g. netgroup / AS without needing to make a connection to the coordination endpoint.
  • Node B receives such an announcement via an addrv2 message. It decides it wants to try to open an outbound connection to Node A. It has previously figured out that it’s behind an EIM NAT.
  • Node B connects to node A’s hole-punch coordination endpoint, and does a version handshake. As part of the connection to this hole-punch coordination endpoint, a request for a hole-punch connection is implicit.
  • Node A now opens a new outbound connection (e.g. a feeler?) to a known good address with SO_REUSEADDR/SO_REUSEPORT on the socket. The goal is to do a version handshake and learn the NAT IP:port of the socket. This likely requires some rate-limiting so that an external party cannot cause Node A to make too many outbound connections.
  • Node B does the same.
  • Node A & B now have an EIM NAT mapped socket they can use to do the simultaneous TCP open and punch through their NATs.
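The "relayed as a best effort, but not stored in addrman" part of the list above could look roughly like the following short-lived store. This is only a sketch under assumptions: the class name `HolepunchAnnouncements`, the exact 10-minute lifetime, and the idea of keying on the coordination endpoint are illustrative choices, not part of any specification.

```python
import time

ANNOUNCEMENT_TTL_S = 10 * 60  # "around 10 minutes" in the text; exact value is TBD


class HolepunchAnnouncements:
    """Short-lived store for hole-punch announcements, kept apart from addrman."""

    def __init__(self, ttl_s=ANNOUNCEMENT_TTL_S):
        self.ttl_s = ttl_s
        self._seen = {}  # coordination endpoint -> (clearnet address, first-seen time)

    def add(self, coordination_endpoint, clearnet_addr, now=None):
        """Record an announcement; return True if it is new and should be relayed."""
        now = time.monotonic() if now is None else now
        self._expire(now)
        if coordination_endpoint in self._seen:
            return False  # already known: do not relay again or extend its lifetime
        self._seen[coordination_endpoint] = (clearnet_addr, now)
        return True

    def candidates(self, now=None):
        """Return the announcements that are still fresh enough to connect to."""
        now = time.monotonic() if now is None else now
        self._expire(now)
        return {ep: addr for ep, (addr, _) in self._seen.items()}

    def _expire(self, now):
        self._seen = {
            ep: entry for ep, entry in self._seen.items() if now - entry[1] < self.ttl_s
        }
```

Because a repeated announcement does not refresh its timestamp, an entry disappears one lifetime after it was first seen, which matches the goal of announcements dying out unless the node keeps self-announcing under a new endpoint or after expiry.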

A few notes on how to make this easier, but not without introducing tradeoffs:

  • With EIM NATs, we might be able to predict our NAT port (it’s often the same one we bind on locally, at least from my limited observations). This might allow us to skip the outbound connections Node A and B need to make while already coordinating, making this a lot easier, faster, and causing less churn on the Bitcoin network. This will fail when our NAT already has a mapping for the same port. The tradeoff here is possibly higher failure rates.
  • An alternative might be to have well-known, static, (and centralized!) Bitcoin-protocol-speaking-servers-but-not-nodes run on the P2P network that just do a version handshake returning the IP:port they see your connection coming from and then close the connection. These might be run by community members, similar to the DNS seeds. They’d learn about who is trying to open a hole-punch connection, and a passive observer (ISP) would see you making connections to them.
  • Yet another alternative is that all nodes on the network would allow a special IPv4/IPv6 NAT check connection to them where it’s implicit that the connection will be shut down right after the version handshake. No need for them to be centralized, but would basically require implementing a STUN service for the P2P network in node software (e.g. Bitcoin Core).
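The "tell me what my connection looks like from outside, then hang up" service described in the last two bullets is small. The sketch below shows the idea with a plain text reply instead of a Bitcoin version handshake, which is a simplification; the names `start_nat_check_server` and `query_external_endpoint` are made up for this example.

```python
import socket
import socketserver
import threading


class EchoEndpointHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # Report the source IP:port as seen by the server (post-NAT).
        ip, port = self.client_address[:2]
        self.request.sendall(f"{ip}:{port}\n".encode())
        # Returning from handle() lets the server close the connection.


class NatCheckServer(socketserver.ThreadingTCPServer):
    daemon_threads = True


def start_nat_check_server(host="127.0.0.1", port=0):
    """Start the check server in a background thread and return it."""
    server = NatCheckServer((host, port), EchoEndpointHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def query_external_endpoint(server_addr, local_port=0):
    """Return (local endpoint, endpoint seen by the server) for one connection."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind(("", local_port))
        s.connect(server_addr)
        local = s.getsockname()
        with s.makefile() as reply_file:
            reply = reply_file.readline().strip()
    ip, _, port = reply.rpartition(":")
    return local, (ip, int(port))
```

Comparing the two returned endpoints tells the client whether its port was preserved; querying two such servers from the same local port is what distinguishes EIM from the other mapping types.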

Compared to the approach with a coordinator, this has the benefit of having clear inbound-outbound mechanics. Node B chooses Node A as an outbound. Node B is an inbound to A. The downside is that we need connectivity over Tor/I2P/(CJDNS). However, many node-in-a-box home nodes seem to ship with Tor / I2P on by default.


A description of a reasonable design (that others came up with) for the Bitcoin P2P part of the rendezvous protocol that is over clearnet and requires a third coordinating node, but that might get a real outbound rather than something mutual-inbound:

  1. Alice wants to receive inbound connections to her Bitcoin node but is behind NAT.
  2. Alice learns through her ADDRMAN that Charlie runs a node that offers RENDEZVOUS services and opens a RENDEZVOUS-only connection to Charlie.
  3. Alice uses a new ADDRv2 network type to advertise that she can be reached through Charlie.
  4. Bob is looking for nodes to make an outbound connection to and decides to connect to Alice via RENDEZVOUS with Charlie.
  5. All three use the procedure described above to attempt a TCP-hole-punched connection from Bob to Alice.
  6. If Alice still has inbound slots available she reconnects to Charlie and awaits new connections.

I am having a hard time reasoning about whether or not there is any trust assumption in Charlie, or if Charlie has any power he wouldn’t have had otherwise. Since Charlie does get to decide whether or not Bob’s connection to some chosen Alice succeeds iff Charlie is the rendezvous, Charlie seems to have some sway over Bob’s outbounds, but how is this any different than if Charlie had advertised addresses that don’t work?

If there is some trust in Charlie, this can be avoided by treating Charlie and all addresses routable via Charlie as one address: Bob first makes a random choice of, e.g., Charlie, and then makes a second random choice from the pool of Charlie+Routable_via_Charlie. In this case Bob can treat the connection as a real outbound. This achieves the effect of increasing the number of inbound slots on the network, but I think that the number of independent parties n on the network for which you have a 1-of-n assumption when bootstrapping remains the same.
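The two-stage choice can be sketched as follows. The function name `pick_outbound` and the data layout are assumptions made for illustration; real address selection in a node is considerably more involved.

```python
import random


def pick_outbound(direct_addrs, via_rendezvous, rng=random):
    """Pick an outbound target in two stages.

    direct_addrs: directly reachable addresses.
    via_rendezvous: {rendezvous_addr: [addresses reachable through it]}.
    Returns (target, rendezvous), where rendezvous is None if the target is
    dialed directly.
    """
    # Stage 1: a rendezvous node and everything behind it count as ONE entry,
    # so advertising many addresses does not increase Charlie's weight.
    entries = [("direct", a) for a in direct_addrs]
    entries += [("rendezvous", r) for r in via_rendezvous]
    kind, choice = rng.choice(entries)
    if kind == "direct":
        return choice, None
    # Stage 2: pick uniformly from Charlie plus the nodes routable via Charlie.
    pool = [choice] + list(via_rendezvous[choice])
    target = rng.choice(pool)
    return target, (None if target == choice else choice)
```

With one direct address and one rendezvous node advertising a thousand home nodes, the direct address is still picked about half of the time, which is the property the paragraph above is after.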

And I think that the Tor/I2P approach described above is an idealized case of this where each node can serve as its own RENDEZVOUS, but this would still have the problem that you have to be picking/trusting the Tor/I2P rendezvous address, not the destination address, since an attacker could be spamming the network with fake/bad relays for real addresses.


Reading up on the discussion here - hole punching seems like an interesting idea to increase network connectivity and robustness to network adversaries.

A campus network that I used for testing had APDM and varying external IP addresses. IPv6 was disabled.

This recent paper claims a ~70% success rate with both TCP and QUIC (UDP-based transport) using libp2p’s DCUtR.

Thinking ahead, if UDP hole punching proves to be much more effective, has there been any discussion on implementing QUIC or a custom UDP-based transport in Core?

I wonder if this “hole punching” also works for VPN connections, making the node semi-clearnet and somewhat private. (Proton has some special NAT options.)

I like this approach without a central coordinator via Tor / I2P. I don’t think it’s the easiest option to get working, though.

A custom UDP-based transport sounds like a very ambitious task. The P2P protocol is inherently stateful in many little ways; there is no real reason why it would need to be, but right now it is, and changing the application layer to drop statefulness would be a huge undertaking. So, directly running the existing application protocol over UDP instead of TCP wouldn’t work; duplicates, dropped messages, and out of order delivery would break things.

That means the options here are effectively either using QUIC, or a custom wrapper that reinvents it. I’m not familiar with how complicated the implementation of this is, but I suspect it wouldn’t really be worth the trade-off.
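To make the "custom wrapper" point concrete: even the smallest piece of such a wrapper, putting numbered datagrams back in order and dropping duplicates on the receiving side, already needs state like the sketch below. The class name `ReorderBuffer` and the use of simple integer sequence numbers are assumptions for illustration; retransmission of lost datagrams, flow control, and congestion control are not covered here and are among the things QUIC provides.

```python
class ReorderBuffer:
    """Receiver-side reordering and de-duplication of numbered datagrams."""

    def __init__(self):
        self.next_seq = 0  # next sequence number the application expects
        self.pending = {}  # seq -> payload, received ahead of order

    def receive(self, seq, payload):
        """Accept one datagram; return payloads that are now deliverable in order."""
        if seq < self.next_seq or seq in self.pending:
            return []  # duplicate of something already delivered or buffered
        self.pending[seq] = payload
        deliverable = []
        while self.next_seq in self.pending:
            deliverable.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return deliverable
```

A dropped datagram simply stalls delivery here, since everything after it stays in `pending` until the gap is filled, which is why acknowledgements and retransmission would be needed as well.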

I’m somewhat hopeful that just TCP hole punching can give us a significant part of the benefits already though, knowing that it’s more fragile and less supported than UDP hole punching.


On @0xB10C’s approach of pairing clearnet addresses with an externally reachable Tor service endpoint run by the same node for establishing clearnet connections: it seems like it would be fine for nodes that do not also run a P2P Tor service. It makes me wonder, though, whether it could increase an adversary’s chances of correlating addresses when a node that also runs a P2P Tor service shuts down. Maybe it’s already bad. Having nodes create new Tor services at the same interval as the new ephemeral addr messages might help.


Regarding building a TCP-like protocol over UDP, as touched upon by @cedarctic & @sipa: I agree it is a major undertaking, but probably a smaller lift than, for example, the multiprocess support being worked on in Core. It still needs champions with a lot of grit though. Maybe the adversarial aspects are what make people pessimistic? I’ve only seen such protocols used in less adversarial environments, in closed source.

Having the ability to bypass strict ordering / stream semantics on a per packet basis would be at least somewhat beneficial for things like compact block propagation.

Agree on attempting TCP hole punching first though. 🙂

Not sure how relevant, but one thing I’ll add is that NATs with EIM may still have “Address and Port-Dependent Filtering” (APF), meaning that EIM:APF ↔ APDM:APF won’t work.

Here’s a great piece on how the various combinations of mapping+filtering affect NAT traversability: How NATs Work, Part II: NAT types and STUN

| Test | IPv4 | IPv6 |
|---|---|---|
| Office internet on Mac | EIM | EIM |
| Docker Desktop on Mac using bridged networking on office internet | APDM | Failed to connect |
| Linux laptop on office internet | EIM | Failed to connect |
| Docker using bridged networking on Linux laptop on office internet | EIM | Failed to connect |
| T-Mobile US hotspot | EIM | EIM |
| Home router (OPNSense) | APDM | No IPv6 is configured |

*Update: there is no IPv6 at the office or on the T-Mobile hotspot, so the IPv6 results are wrong. I think the Mac is being helpful and connecting over IPv4 as a fallback.


At home: IPv4 – EIM

IPv6 – No NAT


👋 all, first-time poster. I work on iroh, a UDP-based hole-punching library built on QUIC (n0-computer/iroh on GitHub).

Iroh has a 90%+ hole-punching rate and a 99%+ connection rate, but uses federated relays, similar to WebRTC STUN & TURN servers, to achieve these numbers, which I’m assuming is not entirely appropriate for Bitcoin. That said, we’ve spent a lot of time on the topic, including integrating with Tor, encrypting the exchange of candidate addresses, and employing novel techniques for hole punching itself.

I’ll keep an eye on this thread and can try to pull in experience and figures where useful!


@0xB10C Perhaps it’s overkill, but when applying the hole-punching technique to “No NAT” situations (incl. IPv6 where it seems somewhat common), connections look like they succeed as well. That’s expected of course, but means hole punching could be used for those even when a firewall or other configuration issue prevents publicly opening a port. So I think you could show those as bold in the results table as well.


I wanted to see how a “sidecar” approach would feel and work based on @sipa’s holeroulette server, requiring no changes to Bitcoin Core, and vibe-coded a simple-enough demo: contrib/holepunch on the tcp-sidecar branch of willcl-ark/bitcoin on GitHub.

The sidecar script connects to the matchmaking server, which matches it with a peer. One side is assigned initiator and the other receiver. They then hole punch each other, and the initiator runs a bitcoin-cli addnode command (automatically or manually) to initiate the connection. The connection is proxied via the sidecar on both sides, so both nodes see a local (127.0.0.1) address.
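The proxying step, where the sidecar sits between the local node and the hole-punched socket, is essentially a bidirectional byte relay. The following is a generic sketch of such a relay and not the code from the linked demo; the function names `relay` and `_copy` are made up for this example.

```python
import socket
import threading


def _copy(src, dst):
    """Copy bytes from src to dst until src reaches EOF, then half-close dst."""
    try:
        while True:
            data = src.recv(65536)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass  # either side went away; stop copying in this direction
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)  # propagate EOF to the other side
        except OSError:
            pass


def relay(a, b):
    """Forward traffic in both directions between two connected sockets."""
    threads = [
        threading.Thread(target=_copy, args=(a, b), daemon=True),
        threading.Thread(target=_copy, args=(b, a), daemon=True),
    ]
    for t in threads:
        t.start()
    return threads
```

In a sidecar, `a` would be the connection to or from the local node on 127.0.0.1 and `b` the hole-punched socket to the remote sidecar, which is why each node only ever sees a local peer address.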

This seems to work pretty nicely, but is reliant on a 3rd party server for “matchmaking”.

The default server is running with a signet client connected, in case anyone wants to test it out (using something like ./contrib/holepunch/sidecar.py --cli-command "bitcoin-cli -signet" --bitcoind-host "127.0.0.1:38333" --network signet).
