I’m syncing a bitcoind node on my MacBook and I can see in Wireshark that it is both fragmenting and reassembling packets greater than 1500 bytes (specifically `headers` messages). You can run ifconfig or similar on your machine and it will tell you the MTU of each interface. Packets larger than 1500 bytes can be transmitted, but I believe this requires every link along the path to support the larger size. I believe the 1500-byte limit is a legacy of Ethernet framing and may vary with OS, but it seems to be pretty consistent from what I’ve seen. Hope I’m not link-spamming too much, but this post gives some history on the 1500-byte limit (How 1500 bytes became the MTU of the internet).
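For example, here’s a quick way to pull the MTU out of ifconfig programmatically (a sketch; `en0` is typically the Wi-Fi interface on a MacBook, and the exact output format varies a bit between platforms):

```python
import re
import subprocess

def get_mtu(interface: str) -> int:
    """Parse the MTU out of `ifconfig` output (macOS/BSD and most Linuxes)."""
    out = subprocess.run(
        ["ifconfig", interface], capture_output=True, text=True, check=True
    ).stdout
    # Matches both "mtu 1500" (macOS, newer Linux) and "MTU:1500" (older Linux)
    match = re.search(r"mtu[: ](\d+)", out, re.IGNORECASE)
    if match is None:
        raise ValueError(f"no MTU found for {interface}")
    return int(match.group(1))

print(get_mtu("en0"))  # typically prints 1500
```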
RFC8900:
- This was written recently (in 2020) and describes all of the different issues with fragmentation and reassembly of IP packets.
- Some senders use Path MTU Discovery (PMTUD), where ICMP messages are sent back to the sender so it can update its MTU estimate for a path (see the sketch after this list). Relying on ICMP is not great because it is unauthenticated and can be rate-limited, black-holed, etc., and I believe this means that in adversarial cases, or even during regular operation, the sender may have to retry the send.
- IPv6 has different fragmentation rules than IPv4 (only the sending host may fragment; routers never do), which seems to have some upsides but may also introduce complications. It is less vulnerable to third-party IP reassembly attacks.
- It notes that RFC 4443 recommends strict rate limiting of ICMPv6 traffic, which may come into play during congestion.
- Ultimately recommends that higher-layer protocols not rely on IP fragmentation as it’s fragile.
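To make the PMTUD point concrete, here’s a sketch of sender-side probing, assuming Linux (the socket-option constants are from `<linux/in.h>`, since Python’s socket module doesn’t always expose them by name, and 192.0.2.1 is just a placeholder address):

```python
import socket

IP_MTU_DISCOVER = 10  # from <linux/in.h>
IP_PMTUDISC_DO = 2    # always set the Don't Fragment bit; never fragment locally
IP_MTU = 14           # read back the kernel's cached path-MTU estimate

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)
s.connect(("192.0.2.1", 9))  # placeholder destination (TEST-NET-1), discard port

try:
    # With DF set, a datagram bigger than the path MTU either fails locally
    # with EMSGSIZE, or is dropped en route by a router that sends back an
    # ICMP "fragmentation needed" message, updating the kernel's estimate.
    s.send(b"\x00" * 2000)
except OSError as e:
    print("send failed:", e)

print("path MTU estimate:", s.getsockopt(socket.IPPROTO_IP, IP_MTU))
```

This is exactly where the fragility comes in: if that ICMP reply is rate-limited or black-holed, the kernel’s estimate never updates and the send just keeps failing.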
RFC4963:
- This was written in 2007 and describes how IPv4 reassembly works and how it breaks down at high data rates.
- IPv4 uses a 16-bit Identification field. The implementation “assembling fragments judges fragments to belong to the same datagram if they have the same source, destination, protocol, and Identifier”. The RFC gives 30 seconds as an example of how long a packet can stay alive. I’m not sure whether this is a TCP Maximum Segment Lifetime (MSL) value (depends on OS; defaults to 30 seconds in Linux) or an IP-related timeout. This has implications for a sender’s data rate, as technically only 65,536 1500-byte packets can be unambiguously identified within the 30-second window, or whatever the time limit is (see the back-of-the-envelope math after this list).
- IPv4 receivers store fragments in a reassembly buffer until all fragments arrive or a reassembly timeout fires. Lowering the timeout hurts slow senders (their fragments get discarded before the datagram completes) but helps fast senders by recycling buffer space and IDs sooner; raising it has the opposite trade-off.
- The RFC describes a situation that can occur either maliciously or under high data rates called “mis-association”: overlapping, unrelated fragments are spliced together and then passed to the higher layer. Typically this gets caught by the TCP or UDP checksum, however it’s only a 16-bit checksum and can occasionally be bypassed (see the checksum sketch after this list). Because of this, the RFC ultimately recommends that the application layer implement cryptographic integrity checks (which, thankfully, we do in both Bitcoin and Lightning).
- In one experiment over UDP, with 10 TB of “random” data being sent, there were 8,847,668 UDP checksum errors, plus 121 corruptions due to mis-associated fragments that bypassed the UDP checksum and were passed to the higher layer (see the arithmetic after this list).
- From what I can tell (I have yet to test this), having integrity checks in both Bitcoin and Lightning doesn’t preclude an attacker from messing with our reassembly and causing delays, even if they are not an AS and are just guessing that two nodes are connected. The fact that the LN graph is public is also a bit concerning here.
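To put the Identification-space limit in numbers, here’s my own back-of-the-envelope math. For what it’s worth, Linux’s IPv4 reassembly timeout (net.ipv4.ipfrag_time) is an IP-level setting that defaults to 30 seconds, so I read it from procfs where available:

```python
# Back-of-the-envelope: maximum unambiguous send rate before the 16-bit
# IPv4 Identification field must be reused within one reassembly lifetime.
ID_SPACE = 2**16      # 65,536 distinct Identification values
PACKET_SIZE = 1500    # bytes, a typical Ethernet MTU

# On Linux the IPv4 reassembly timeout lives in procfs (net.ipv4.ipfrag_time)
# and defaults to 30 seconds, matching the RFC's example lifetime.
try:
    with open("/proc/sys/net/ipv4/ipfrag_time") as f:
        lifetime = int(f.read())
except OSError:
    lifetime = 30  # fall back to the RFC's example value

rate = ID_SPACE * PACKET_SIZE / lifetime                         # bytes/sec
print(f"max rate before ID reuse: {rate * 8 / 1e6:.1f} Mbit/s")  # ~26.2 Mbit/s
```

So past roughly 26 Mbit/s to a single (source, destination, protocol) tuple, ID reuse within one lifetime becomes unavoidable, which is exactly the regime where RFC 4963 says mis-association starts to happen.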
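On the 16-bit checksum point, here’s a sketch of the one’s-complement Internet checksum from RFC 1071 that UDP and TCP use (ignoring the pseudo-header for brevity). Because one’s-complement addition is commutative, simply reordering 16-bit words leaves the checksum unchanged, which gives a feel for how weak it is against spliced data:

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement Internet checksum (RFC 1071).
    Sketch only: real UDP/TCP checksums also cover a pseudo-header."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF

# Swapping 16-bit words doesn't change the one's-complement sum, so the
# checksum can't even detect this reordering, let alone every splice.
assert internet_checksum(b"ABCD") == internet_checksum(b"CDAB")
```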
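And a quick sanity check on the measurement above (my own arithmetic; it assumes corrupted datagrams collide with a 16-bit checksum roughly uniformly at random):

```python
# RFC 4963's experiment: 8,847,668 corruptions were caught by the UDP
# checksum, and 121 mis-associated datagrams slipped past it.
caught = 8_847_668
passed = 121

observed = passed / (caught + passed)
expected = 1 / 2**16  # a random 16-bit check passes ~1 in 65,536 corruptions

print(f"observed bypass rate: {observed:.2e}")  # ~1.37e-05
print(f"expected bypass rate: {expected:.2e}")  # ~1.53e-05
```

The observed rate lines up with the naive 1-in-65,536 estimate, which is why the RFC pushes integrity checking up to the application layer.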
Data is pretty hard to come by; I think testing on mainnet and observing traffic is probably your best bet. I think fragmentation can be pretty costly in the presence of errors, since losing a single fragment means the whole datagram has to be retransmitted and reassembled again. But again, I don’t have hard data for this. It would be very interesting to see what other applications like Tor do when trying to send or receive large amounts of data at once.