While @willcl-ark and I were discussing “Did the number of reachable nodes in residential ISPs increase since Bitcoin Core v30.0?” (Observations, Bitcoin Network Operations Collective), which indicates that making -natpmp=1 the default in Bitcoin Core might not yet have had the hoped-for effect of making more nodes behind a home router NAT reachable, it occurred to us that we might be able to use TCP hole punching to connect two otherwise unreachable home nodes.
This should not replace inbound-outbound connections, and isn’t a replacement for natpmp either, but offers a potential best-effort extension for extra connectivity of home nodes. We discussed this at a recent in-person Bitcoin Core developer meeting and want to keep the conversation going here. We are far from having a proposal for standardization, let alone an implementation. At the moment, there are likely more open questions than answers.
TCP hole punching
TCP hole punching lets two computers behind certain NATs connect directly, without relaying traffic through a server. My understanding of one way we could use this:
- Alice & Bob: unreachable home nodes behind a NAT
- Charlie: reachable node
1. Alice and Bob both connect via TCP to a reachable coordinator node, Charlie. Charlie could advertise its coordinator/matchmaking capabilities in a service flag.
2. Charlie learns the public endpoints of Alice and Bob. When Alice’s packets pass through her NAT, the NAT rewrites them with a public IP and port. Charlie sees this “outside” address, and likewise sees Bob’s.
3. Charlie swaps endpoints. It tells Alice Bob’s public IP:port, and tells Bob Alice’s.
4. Both sides simultaneously initiate a TCP connection to each other (see “Figure 7: Simultaneous Connection Synchronization” in RFC 9293) using the same local port they used to talk to Charlie (this requires SO_REUSEADDR/SO_REUSEPORT on the socket). Each side sends a SYN to the other’s public endpoint.
5. The outbound SYNs punch holes. When Alice’s SYN leaves her NAT heading for Bob, her NAT creates a mapping: “traffic coming back from Bob’s address is allowed in.” Bob’s NAT does the same in reverse.
6. The SYNs cross in flight. Now each NAT has a hole that permits the other side’s SYN through. The two stacks see this as a “simultaneous open” and complete the handshake directly - no server in the middle.
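The simultaneous-open part of the steps above (binding to the same local port that was used towards Charlie and repeatedly sending SYNs to the peer’s public endpoint) could be sketched roughly like this in Python. This is only an illustration under assumptions: the helper names `make_reusable_socket` and `punch`, the timeouts, and the retry interval are all made up here, and nothing in it is Bitcoin Core code. Whether the attempts succeed depends entirely on the NATs in between.

```python
import socket
import time
from typing import Optional, Tuple


def make_reusable_socket(local_port: int) -> socket.socket:
    """Create a TCP socket bound to local_port with address/port reuse enabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # SO_REUSEPORT does not exist on every platform, so only set it if present.
    if hasattr(socket, "SO_REUSEPORT"):
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind(("0.0.0.0", local_port))
    return sock


def punch(local_port: int, peer: Tuple[str, int],
          deadline_s: float = 10.0,
          attempt_timeout_s: float = 1.0) -> Optional[socket.socket]:
    """Repeatedly connect() from local_port to the peer's public endpoint.

    Each attempt sends SYNs towards the peer. If both sides do this at
    roughly the same time and both NATs cooperate, one attempt may complete
    as a TCP simultaneous open. Returns the connected socket, or None if
    the (hypothetical) deadline passes first.
    """
    deadline = time.monotonic() + deadline_s
    while time.monotonic() < deadline:
        sock = make_reusable_socket(local_port)
        sock.settimeout(attempt_timeout_s)
        try:
            sock.connect(peer)
            sock.settimeout(None)  # back to blocking mode for normal use
            return sock
        except OSError:
            # Timeout, refusal, or reset: close and retry until the deadline.
            sock.close()
            time.sleep(0.1)
    return None
```

In this sketch, Alice would call `punch(port_used_for_charlie, bobs_public_endpoint)` while Bob does the mirror image. The socket that was used to talk to Charlie would presumably need the same reuse options set before it was bound.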
The tricky part is that TCP is stateful and timing-sensitive, so it’s flakier than UDP hole punching.
Additionally, it only works for certain types of NAT: RFC 4787 defines three NAT address and port mapping behaviors that are relevant here:
- Endpoint-Independent Mapping (EIM): picks one external (IP, port) for a given internal socket and reuses it regardless of where the packet is going. Once Alice has sent any outbound packet, her external endpoint is fixed and predictable. Hole punching works. Possibly common on consumer residential routers (cable, fibre, DSL).
- Address-Dependent Mapping (ADM): the external port stays the same while Alice is talking to the same destination IP, but a new external port is assigned the first time she contacts a different destination IP. Hole punching is hard. One option is to try to predict ports, but this likely does not work reliably.
- Address+Port-Dependent Mapping (APDM; or symmetric NAT): Every distinct destination (IP, port) gets a fresh external port. We cannot predict what external port Alice’s NAT will assign for any specific peer. Hole punching does not work. Typical of CGNAT deployments, mobile carriers, and restrictive enterprise networks.
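To make the three mapping behaviors more concrete, here is a small sketch of how a node might guess its NAT’s behavior from the external endpoints that different remote peers report back for the same local socket (similar in spirit to the mapping tests in RFC 5780). The `Probe` type, the `classify_mapping` function, and its return strings are hypothetical; how the observations would be collected is left open.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Probe(NamedTuple):
    """One observation: we contacted (dest_ip, dest_port) from a fixed local
    socket, and that destination saw us as (ext_ip, ext_port)."""
    dest_ip: str
    dest_port: int
    ext_ip: str
    ext_port: int


def classify_mapping(probes: Iterable[Probe]) -> str:
    """Guess the RFC 4787 mapping behavior from a set of probes."""
    externals_by_ip = defaultdict(set)   # dest IP -> external endpoints seen
    ports_by_ip = defaultdict(set)       # dest IP -> dest ports probed
    for p in probes:
        externals_by_ip[p.dest_ip].add((p.ext_ip, p.ext_port))
        ports_by_ip[p.dest_ip].add(p.dest_port)

    # Same destination IP, but the mapping changed with the destination port.
    if any(len(ext) > 1 for ext in externals_by_ip.values()):
        return "APDM"
    # With a single destination IP we cannot tell EIM and ADM apart.
    if len(externals_by_ip) < 2:
        return "unknown"
    all_externals = set().union(*externals_by_ip.values())
    # Identical mapping towards different destination IPs.
    if len(all_externals) == 1:
        return "EIM"
    # Mapping differs per destination IP. It is ADM only if we also saw it
    # stay stable across two ports on one IP; otherwise it is ambiguous.
    if any(len(ports) > 1 for ports in ports_by_ip.values()):
        return "ADM"
    return "ADM-or-APDM"
```

For example, two probes to different destination IPs that both report the same external endpoint would be classified as EIM, which is the case where hole punching is expected to work.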
Usage in the Bitcoin protocol
Two approaches were discussed initially, but I feel like the design space is not fully explored yet.
- Rendezvous node: The home nodes Alice and Bob both want to offer connection slots, but aren’t reachable from the internet. Both happen to pick node Charlie, which advertises a rendezvous/matchmaking service flag, and connect to it. Alice might connect first and then needs to wait until Bob connects too. Once both are connected, Charlie tells each of them the other’s public IP:port (step 3) and that they should try to hole punch now. The connection to Charlie then closes. Alice and Bob try for a while to connect to each other.
- Inbound handoff instead of eviction: An alternative approach is that both Alice and Bob connect to Charlie, but since Charlie’s inbound slots are full, it tells them it’s going to hand them off to each other, and the connection closes. Currently, when inbound slots are full and peers are evicted, we close the connection. Here, we offer them a way to retain a connection, but with a different node.
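Charlie’s matchmaking role in both approaches boils down to holding one peer until a second one shows up and then swapping their observed endpoints. A minimal sketch of that bookkeeping, assuming a single waiting slot and made-up names (`Rendezvous`, `Match`, `register`), might look like this. It deliberately ignores timeouts, authentication, and any abuse protection.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Endpoint = Tuple[str, int]  # public (IP, port) as observed by the coordinator


@dataclass
class Match:
    """What the coordinator tells each side once two peers are paired."""
    first_id: str
    first_learns: Endpoint    # the second peer's endpoint, sent to the first
    second_id: str
    second_learns: Endpoint   # the first peer's endpoint, sent to the second


class Rendezvous:
    """Hypothetical coordinator state: at most one peer waits at a time."""

    def __init__(self) -> None:
        self._waiting: Optional[Tuple[str, Endpoint]] = None

    def register(self, peer_id: str, observed: Endpoint) -> Optional[Match]:
        """Record a peer; return a Match once a second, different peer arrives."""
        if self._waiting is None or self._waiting[0] == peer_id:
            # Nobody to pair with yet (or the same peer reconnected): wait.
            self._waiting = (peer_id, observed)
            return None
        waiting_id, waiting_endpoint = self._waiting
        self._waiting = None
        return Match(first_id=waiting_id, first_learns=observed,
                     second_id=peer_id, second_learns=waiting_endpoint)
```

After a `Match` is returned, the coordinator would send each side the other’s endpoint and close both connections, at which point the two peers start their hole-punching attempts.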
In both of these approaches, Charlie learns that Alice and Bob might now be connected, but not for how long. Alice learns that Bob and Charlie were connected. Bob learns that Alice and Charlie were connected. Neither knows for how long.
There was also a brief discussion on how to make this more private by not requiring a coordinator like Charlie. Could Alice and Bob, once they’ve figured out their public IP:port behind an EIM NAT, use, for example, a Tor connection to exchange their IP:port and then establish a clearnet connection? A protocol without a coordinator would make it a lot easier to reason about whether the connection is inbound or outbound, and is likely a lot easier to implement.
Implementation in Bitcoin Core?
Bitcoin Core currently uses the notion of “inbound” and “outbound” connections. What would hole-punched connections be? Outbound connections are generally more trusted since we choose them, we relay transactions to them faster, etc. Additionally, there’s the concept of “outbound-full-relay” and “outbound-block-only” connections.
The connections initially were two outbounds to Charlie. But since Alice didn’t choose the connection to Bob from her addrman (Charlie did while matchmaking), this is not a “we-picked-this-address-from-our-addrman” outbound. So would they be inbound-inbound? We probably have to rethink (and refactor) how we think about connections a bit.
Additionally, BIP-324, for example, has the concept of an initiator and a responder for the encrypted transport. Charlie can assign these roles to the two sides. There might be other places where this is useful.
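Since a simultaneous open has no natural “dialing” side, the coordinator handing out the BIP-324 roles could be as simple as the following sketch. The `Role` enum and `assign_roles` function are hypothetical, and picking the initiator at random is just one possible policy, not something we have settled on.

```python
import secrets
from enum import Enum
from typing import Dict


class Role(Enum):
    INITIATOR = "initiator"
    RESPONDER = "responder"


def assign_roles(peer_a: str, peer_b: str) -> Dict[str, Role]:
    """Give exactly one of the two matched peers the initiator role."""
    # Unbiased coin flip so that neither arrival order nor identity decides.
    if secrets.randbelow(2) == 0:
        return {peer_a: Role.INITIATOR, peer_b: Role.RESPONDER}
    return {peer_a: Role.RESPONDER, peer_b: Role.INITIATOR}
```

The coordinator would include the assigned role in the message that carries the other side’s endpoint, so both peers agree on who starts the handshake once the TCP connection is up.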
Open Questions
- How well does TCP hole punching work in practice? Are there any stats on this by people using it? Does it work with most home routers? How common are EIM NATs and A(P)DM NATs?
- Is there theoretical/academic research, or are there P2P projects using this in the wild, that we can learn from? Can we get in touch with some of them?
- When using a coordinator, what happens to Alice if either Bob or Charlie is malicious? What attacks are possible? This likely opens opportunities for sybil or eclipse attacks. How dangerous are these?
- Do “holepunch connections” really address problems we have today? Do they increase the number of connection slots in the network and allow for more peer diversity for unreachable nodes? Do they offer better node inter-connectivity and increase resistance against partition and eclipse attacks? Do they allow for shorter relay paths between unreachable nodes (and is this something we want to optimize for)?
- How much new and changed code is needed to implement this? It might change a lot of our current reasoning and understanding of P2P connections. How do we refactor the existing code for this?
Some resources
- Paper: Communication Across Network Address Translators: https://pdos.csail.mit.edu/papers/p2pnat.pdf
- TCP hole punching - Wikipedia
- libp2p hole punching (“Hole Punching | libp2p”) and DCUtR (“DCUtR | libp2p”; matchmaking via a relay, not a coordinator)
- Paper: NAT Hole Punching Revisited: https://kops.uni-konstanz.de/server/api/core/bitstreams/29a35a1d-40f1-4290-9d03-dae21f2b9c36/content

