Latency and Privacy in Lightning

I was ultimately nerd sniped by last LN spec meeting’s discussion [0] of the privacy impact of surfacing granular HTLC hold times via attributable failures [1]. This post contains a recap of the discussion (as I understand it) and a summary of my sniping.

Recap of meeting discussion:

  • The current version of the spec allows forwarding nodes to specify the time they held the HTLC in ms.
  • It’s likely that sending nodes will use this value in the future to pick low latency routes.
  • Adding a random forwarding delay ([2], [3]) improves payment privacy.
  • Surfacing hold times may dis-incentivize this privacy-preserving delay as nodes race to the bottom to be the fastest.

The solution suggested in the meeting was to change the encoding to represent blocks of time instead, so that the smallest encodable value still leaves time for processing and a random delay. This can’t be done by keeping ms encoding and enforcing some minimum, because nodes can always report smaller values; by changing the encoding, communicating a value under the smallest block of time becomes impractical [4].
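To make the idea concrete, here’s a minimal sketch of what a coarser encoding could look like, assuming a hypothetical 100ms unit (the actual unit size is exactly the open question below):

```python
# Hypothetical sketch only: the unit size and helper names are illustrative, not spec.
HOLD_TIME_UNIT_MS = 100  # smallest encodable block of time (assumed value)

def encode_hold_time(held_ms: int) -> int:
    """Round up to whole units; even a zero-ms hold reports one full unit."""
    return max(1, (held_ms + HOLD_TIME_UNIT_MS - 1) // HOLD_TIME_UNIT_MS)

def decode_hold_time(units: int) -> int:
    """Sender-side interpretation: every report maps back to a multiple of the unit."""
    return units * HOLD_TIME_UNIT_MS

# A node that held an HTLC for 12ms and one that held it for 97ms both report 1 unit,
# so there is nothing to gain (in reported hold time) by racing below the unit size.
assert decode_hold_time(encode_hold_time(12)) == decode_hold_time(encode_hold_time(97)) == 100
```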

Some questions that came up in the meeting:

  • What value should we set this minimum to?
  • How should we manage the UX/privacy tradeoff of fast payments vs forwarding delays?
  • What happens if we need to increase forwarding delays in future?

Understanding Forwarding Delays + Privacy

To understand how these forwarding delays impact payment privacy, I took a look at a few research papers on the subject - summarized below. Of course, any inaccuracies are my own; I’d really recommend reading the papers to form your own opinion.

We are concerned about two different types of attackers:

  1. On path: attacker creates channels, routes payments and attempts to deanonymize them.
  2. Off path: attacker controls an AS, and is able to monitor messages at a network level.

On Path Adversary

As outlined in [5]:

  • Attacker probes the network to get latency estimates for nodes.
  • Attacker opens up low-fee and low-expiry channels to attract routed payments.
  • Recipient identification:
    • Record the time between update_add_htlc and update_fulfill_htlc
    • Compare to latency estimates to calculate number of hops the HTLC took.
  • Sender identification:
    • Only works if the sender retries along the same path.
    • Fail the first HTLC seen, and record time between update_fail_htlc and replacement update_add_htlc.
  • Use amount and CLTV of HTLC to reduce set of possible senders/receivers.
  • Use latency estimates to identify possible paths based on recorded time.

A random forwarding delay is helpful here because it interferes with the ability of the attacker to compare the time they’ve recorded with their latency estimates. In lay-carla’s terms (give or take some noise), the delay is successful if it equals at least the processing time of a single hop, because this means that the attacker will be off by one hop and fail to identify the sender/receiver.
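To illustrate (with completely made-up latency numbers), here’s a toy simulation of the attacker dividing the observed add-to-fulfill time by its per-hop latency estimate, with and without a random per-hop delay of roughly one hop’s processing time:

```python
import random

PER_HOP_PROCESSING_MS = 50  # attacker's per-hop latency estimate (made-up number)

def observed_time_ms(hops_downstream: int, max_random_delay_ms: float) -> float:
    """Time between update_add_htlc and update_fulfill_htlc as seen by the attacker."""
    return sum(PER_HOP_PROCESSING_MS + random.uniform(0, max_random_delay_ms)
               for _ in range(hops_downstream))

def attacker_hop_estimate(observed_ms: float) -> int:
    return round(observed_ms / PER_HOP_PROCESSING_MS)

random.seed(1)
# Without random delays the estimate is exact; with per-hop delays comparable to one
# hop's processing time, the estimate is off by one or more hops most of the time.
for max_delay in (0, PER_HOP_PROCESSING_MS):
    wrong = sum(attacker_hop_estimate(observed_time_ms(3, max_delay)) != 3
                for _ in range(1000))
    print(f"max delay {max_delay}ms -> wrong hop count in {wrong}/1000 trials")
```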

Off Path Adversary

As outlined in [6]:

  • Attacker ICMP pings nodes in the network to get latency estimates.
  • Attacker controls an AS and passively monitors network traffic.
  • The commitment dance for a channel can be identified by message size and direction.
  • With knowledge of the LN graph, an adversary can construct “partial paths” by tracing flow of update_add_htlc messages through the channels they observe.
    • This is timestamp based: if an incoming and outgoing update_add_htlc are processed within the estimated latency, they are assumed to be part of a partial path.
  • Set limits for the possible payment amounts:
    • Minimum: largest htlc_minimum_msat along the partial path (can’t be smaller than the biggest minimum).
    • Maximum: smallest htlc_maximum_msat or capacity along the partial path (can’t be bigger than the smallest channel).
  • Perform a binary search to get a payment amount range:
    • Find the path from first to last node in the partial path for an amount.
    • If the computed path differs from the partial path, the amount is discarded.
  • Remove channels that can’t support the estimated payment amount.
  • Identify sender and receiver:
    • Nodes that remain connected to the first/last hop in the partial path are candidate sender/receivers
    • Check payment path between each possible pair for the payment amount.
    • If the path uses the partial path, then the pair is a possible sender/receiver.

A forwarding delay is helpful here because it interferes with the ability of the attacker to construct partial paths. Notably, once these paths are constructed the attacker still has a large anonymity set to deal with, and the attack relies heavily on deterministic pathfinding at several stages to reduce this set.
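For intuition, here’s a toy version of the amount-bounding and pruning steps described above. The Channel fields mirror gossip values, but the data and helper names are purely illustrative:

```python
from dataclasses import dataclass

@dataclass
class Channel:
    htlc_minimum_msat: int
    htlc_maximum_msat: int
    capacity_msat: int

def amount_bounds(partial_path):
    """The payment can't be below the biggest minimum, nor above the smallest
    maximum/capacity, seen along the observed partial path."""
    lower = max(c.htlc_minimum_msat for c in partial_path)
    upper = min(min(c.htlc_maximum_msat, c.capacity_msat) for c in partial_path)
    return lower, upper

def prune(graph, amount_msat):
    """Drop channels that couldn't have carried the estimated amount at all."""
    return [c for c in graph
            if c.htlc_maximum_msat >= amount_msat and c.capacity_msat >= amount_msat]

lo, hi = amount_bounds([Channel(1_000, 10_000_000, 50_000_000),
                        Channel(10_000, 5_000_000, 20_000_000)])
print(lo, hi)  # 10000 5000000
```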

[7] also examines how a malicious AS can identify nodes’ roles in a route with the goal of selective censorship:

  • Senders: update_add_htlc messages sent “out of the blue” indicate that the node is the original sender.
  • Intermediaries: timing analysis is used to connect an incoming revoke_and_ack with an outgoing update_add_htlc to identify forwarding nodes.
    • Recipient: sending an update_fulfill_htlc message after receiving a revoke_and_ack message identifies the recipient, independent of timing.

Note that senders and receivers are identified based on the size of messages, without needing to rely on any timing information. Here, a forwarding delay isn’t helping sender/receiver privacy at all - per the suggestions in the paper, it seems like message padding and possibly cover traffic are the most promising defenses.
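As a rough illustration of the padding idea (the envelope format here is invented for the example; the real protocol has its own framing and message types), padding every message to one fixed length removes size as a classifier:

```python
# Hypothetical sketch: pad all messages to one target size so they are
# indistinguishable by length on the wire. The target value is made up.
PADDED_LEN = 1300  # assumed target size, chosen only for illustration

def pad(payload: bytes) -> bytes:
    """Length-prefix the payload and zero-pad so every message is the same size."""
    if len(payload) + 2 > PADDED_LEN:
        raise ValueError("payload too large for padded envelope")
    return len(payload).to_bytes(2, "big") + payload + bytes(PADDED_LEN - 2 - len(payload))

def unpad(wire: bytes) -> bytes:
    n = int.from_bytes(wire[:2], "big")
    return wire[2:2 + n]

assert unpad(pad(b"update_fulfill_htlc...")) == b"update_fulfill_htlc..."
```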

Incentives

While reading through all of this, it stood out to me that we’re relying on forwarding nodes to preserve the privacy of senders and receivers. This doesn’t seem particularly incentive aligned. Attributable failures and hold times aside, a profit driven node is incentivized to clear out payments as fast as it can to make efficient use of its capital. This seems sub-optimal on both ends:

  • Senders and receivers who care about privacy can’t hold forwarding nodes accountable for adding a delay, because these values must be random to be effective. If you see that nobody delayed your payment, it may have just happened to get a very low delay on each hop.
  • Forwarding nodes don’t know how long a HTLC’s payment route is, so they can’t easily pick a good delay time that they’re certain will help with privacy (unless they over-estimate, adding an additional hop’s latency) [8].

Is there something better that we can do?

On Path Adversary

In this attack, the attacker depends on the time between update_add_htlc and update_fulfill_htlc to make inferences about the number of hops between itself and the recipient to deanonymize the recipient. It doesn’t matter where the delay happens, just that there is enough delay for it to be ambiguous to the attacker how many hops there are to the recipient. It seems reasonable that we could implement delays on the recipient, instead of with the forwarding nodes. This puts the decision in the hands of the party whose privacy is actually impacted. It also works reasonably well with other hold-time aware systems like jamming mitigations and latency-aware routing, because we have to accommodate the MPP case where the recipient can hold HTLCs anyway.
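A minimal sketch of what a receiver-side hold could look like, assuming invented tuning values and a hypothetical settle_htlc callback:

```python
import random, time

MEAN_DELAY_MS = 200   # assumed tuning knob, not from any implementation
MAX_DELAY_MS = 1_000  # hard cap so the hold never endangers the HTLC's CLTV budget

def hold_then_settle(settle_htlc):
    """Sample a random delay, sleep, then send update_fulfill_htlc via the callback."""
    delay_ms = min(random.expovariate(1 / MEAN_DELAY_MS), MAX_DELAY_MS)
    time.sleep(delay_ms / 1000)
    settle_htlc()

hold_then_settle(lambda: print("update_fulfill_htlc sent"))
```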

For sender de-anonymization, the attacker needs to fail a payment and be on-path for the retry. This is more trivially addressable by adding a cool down between attempts and using more diverse retry paths. This is within the control of the sender, so it is nicely incentive aligned.
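Sketched out (with a hypothetical find_route helper and made-up cool-down bounds), the sender-side mitigation is just a random wait plus path diversity on retry:

```python
import random, time

def retry_payment(find_route, failed_hops, min_cooldown_s=1.0, max_cooldown_s=5.0):
    """Wait a random cool-down, then retry on a route that avoids the hops that just failed."""
    time.sleep(random.uniform(min_cooldown_s, max_cooldown_s))
    return find_route(avoid=failed_hops)
```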

Off Path Adversary

While timing information is used in this attack, my impression from [6] was that predictable routing algorithms are what makes reducing the anonymity set feasible for the attacking node. This is again a lever that we could provide to the sender to toggle as they see fit, rather than relying on forwarding nodes. Without the ability to prune the network, the anonymity set for this attack remains infeasibly large.

This attack also gets significantly easier for larger payments, as the attacker can prune more channels (that wouldn’t be able to facilitate the amount). So more aggressive payment splitting is another option for privacy conscious sending nodes that does not rely on forwarding nodes for protection.

What to do for attributable failures?

Practically in today’s network, we don’t have any privacy preserving forwarding delays deployed:

  • LND (80-90% of public network): has a 50ms commitment ticker to batch updates, but it is not randomized so can trivially be accounted for in the attacks listed above [9].
  • Eclair (major router): does not implement forwarding delays.

So we do not currently have any defenses against the above listed attacks implemented. And we should fix that!
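For contrast with the fixed 50ms ticker, here’s a toy illustration of a randomized batching interval; the interval bounds are invented for the example:

```python
import random

def fixed_ticker(base_ms: float = 50):
    """Fixed interval: an observer can simply subtract base_ms per hop."""
    while True:
        yield base_ms

def randomized_ticker(min_ms: float = 20, max_ms: float = 150):
    """Randomized interval: the per-hop contribution becomes unknown noise."""
    while True:
        yield random.uniform(min_ms, max_ms)

ticker = randomized_ticker()
print([round(next(ticker)) for _ in range(5)])
```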

My opinion is:

If we truly believe that forwarding delays are the best mitigation:

  • We should all implement and deploy them.
  • We should change the encoding of attributable failure hold times to enforce a minimum value.

If that’s not the case (which I don’t necessarily think it is):

  • We should investigate and implement some of the suggestions listed above.
  • It’s fine to leave the attributable failures hold times encoded with millisecond granularity.

Footnotes

[0] Lightning Specification Meeting 2025/05/19 · Issue #1258 · lightning/bolts · GitHub

[1] Attributable failures (feature 36/37) by joostjager · Pull Request #1044 · lightning/bolts · GitHub

[2] bolts/04-onion-routing.md at 011bf84d74d130c2972becca97c87f297b9d4a92 · lightning/bolts · GitHub

[3] bolts/02-peer-protocol.md at 011bf84d74d130c2972becca97c87f297b9d4a92 · lightning/bolts · GitHub

[4] Forwarding nodes could flip a high bit to indicate that they’re using ms, but this would require sender cooperation and lead to devastating penalization if senders aren’t modified (because it would lead to their hold time being interpreted as massive).

[5] https://arxiv.org/pdf/2006.12143

[6] Revelio: A Network-Level Privacy Attack in the Lightning Network | IEEE Conference Publication | IEEE Xplore

[7] https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.AFT.2024.12

[8] Yes, we may very well have an incredibly privacy conscious and altruistic routing layer. Even if that’s the case (quite probably, since there isn’t much money to be made with it), we shouldn’t be relying on it to make privacy promises.

[9] Heavily emphasized across all papers is that this delay needs to be random to be impactful.


Thank you for taking the time to summarize this discussion.

As you mention, there is a fundamental trade-off between performance (latency) and privacy. While there may be a slight privacy improvement, I’m more concerned that this amplifies performance issues that already exist in Lightning today. Adding a forwarding delay affects every hop of every payment attempt. Not only does this slow the delivery of successful payments by delay * hop_count, but perhaps more concerning, it also delays failing payment attempts, both legitimate and malicious. As routing relies on trial-and-error, failed payments are expected. Worse yet, probing is a common trick to improve future reliability, which means we can expect the number of failed payments to grow exponentially with the number of nodes on the network. Adding delays to these failed attempts compounds the problem of locked liquidity and HTLC slots for routing nodes.

So I agree, routing nodes have no incentive to follow these rules, other than as an optional feature to attract privacy-focused nodes.

I like your idea of applying delays at the source and destinations:

  1. It’s opt-in and can be tuned to the user’s preference
  2. It doesn’t require protocol changes
  3. For the receiver, it’s safe to consider the payment successful, even while delaying HTLC fulfillment

Thanks for the detailed post and the insights! It does make a lot of sense: I was personally mostly worried about the AS case, where it’s currently somewhat simple to match incoming update_add_htlc with the corresponding outgoing update_add_htlc based on timing and message identification. But as you mention, having cover traffic and padding messages to be indistinguishable by just looking at their size is probably a better (and more general) solution than delays for this kind of adversary.

We’ve known that padding messages was something we needed to do for a long time, and it became particularly useful since we introduced the path_key TLV to update_add_htlc messages, making them distinguishable from update_add_htlcs outside of blinded paths. The downside is of course that it uses more bandwidth, but we can’t have our cake and eat it too. The 65kB limit ensures that we’re still within a single TCP packet, which hopefully shouldn’t degrade performance too much. It would be a loss though if padding all messages to 65kB would actually degrade performance more than delaying HTLC messages! It could be interesting to do some simulations on real nodes (by turning on and off the message padding feature for various time periods) to figure this out.

Let’s see what others think after reading your analysis, but to me it’s a good enough argument to keep reporting the exact hold time in attributable failures.

Last month I had a discussion about this with a few people. Somebody pointed out that we deployed “HTTPS Everywhere” to improve the privacy of everyone on the web.

My counterpoint at the time was that “HTTPS Everywhere” could be imposed by user-agents and their operators, but there is nothing that would force forwarding nodes to create randomized forwarding times; senders and receivers cannot force forwarding nodes to perform the randomization. This is equivalent to the observation by carla that it is the senders and receivers who have an incentive to randomize, not forwarding nodes.

My counterproposal was:

  • Make batching of HTLCs the primitive, not individual update_add_htlcs.
  • Create a new forwarding “receiver-enforced forwarding randomization” protocol:
    • New message you_have_incoming_htlcs. This is sent if a node wants to eventually update_add_htlc one or more HTLCs. The message has no body, and is replayed on reconnection.
    • New response gimme_the_incoming_htlcs. This is sent after receiving you_have_incoming_htlcs.
    • New rules for update_add_htlc:
      • it is an error for a node to send update_add_htlc unless it has received gimme_the_incoming_htlcs. (because it is an error, you should error if you receive an update_add_htlc without having sent gimme_the_incoming_htlcs first and drop all channels with that peer onchain)
      • A “batch” of update_add_htlcs MUST be sent in response to gimme_the_incoming_htlcs. The batch is ended by a commitment_signed. After sending commitment_signed, it is once again an error for the node to send update_add_htlc until it has received a new gimme_the_incoming_htlcs.

The above adds increased latency to the forwarding protocol, due to the additional you_have_incoming_htlcs/gimme_the_incoming_htlcs exchange. A counter to this is that the protocol can be restricted to endpoint receivers only (i.e. receivers can use an even feature bit to enforce that this protocol is used, in an “HTTPS Everywhere”-style campaign, while forwarders can advertise an odd feature bit to indicate to new peers that they support it; if both peers use the odd feature bit, they don’t follow this protocol after all), and pure forwarders can use the original low-latency forwarding protocol with each other.

A receiver can, on receiving a you_have_incoming_htlcs message, then randomize the delay before sending gimme_the_incoming_htlcs. This also allows the LSP of the receiver to batch multiple HTLCs to the receiver (e.g. probably helpful to improve throughput for multipath payments, which carla also noted would probably also help privacy in practice).
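In rough pseudo-code (everything here is illustrative; only the message names come from the proposal above), the flow could look something like this:

```python
import random, time

class Receiver:
    def on_you_have_incoming_htlcs(self, send):
        # The receiver owns the randomization: wait before asking for the batch.
        time.sleep(random.expovariate(1 / 0.2))  # assumed ~200ms mean delay
        send("gimme_the_incoming_htlcs")

class Forwarder:
    def __init__(self):
        self.pending = []  # HTLCs waiting for permission to be sent

    def queue_htlc(self, htlc, send):
        if not self.pending:
            send("you_have_incoming_htlcs")  # empty body; replayed on reconnection
        self.pending.append(htlc)

    def on_gimme_the_incoming_htlcs(self, send):
        # Only now may update_add_htlc be sent; the batch ends with commitment_signed,
        # after which sending more HTLCs without a fresh gimme_* is a protocol error.
        for htlc in self.pending:
            send(("update_add_htlc", htlc))
        send("commitment_signed")
        self.pending.clear()
```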

Thank you Carla for this great write-up of the discussion! I agree with your analysis in general, here are just a few points I want to add:

  • Yes, adding forwarding delays could be perceived to be unaligned with forwarding nodes’ incentives, however, they already do this and also gain some efficiency and performance from batching HTLCs. Of course, there is a latency trade-off here, but generally it holds that the longer you wait, the higher are the chances that you can benefit from the reduced IO and network-latency overhead of batching HTLCs. IIUC, this would become even more relevant if we were to implement option_simplified_update in the future.
  • As always, “privacy loves company”. So requiring individual nodes who think they need additional privacy protections to add random delays might help with the particular on-path adversary model in mind, but it could actually have them stand out more in the general case. I.e., depending on the adversary it could even put a crosshair on their back, at least if it doesn’t become a best practice to add reasonable random delays before claiming (receiver-side) / retrying (sender-side) payments. So, if we agree sender/receiver side delays are the way to go, it would make sense to actually document best-practices that any implementation should stick to, just as we already do for the CLTV expiry delta in the BOLT #7 “Recommendations for Routing” section.
  • Note that in our research paper ([5]), we still assumed a purely BOLT11 world in which the sender’s receiver anonymity was non-existent, i.e., the sender would always know the full path and hence the identity of the receiver anyway. However, in a post-BOLT12/blinded path world, the receiver’s identity can actually be hidden from the sender, and now the sender could be considered an on-path adversary. If we now report exact hold times of each intermediate hop to the sender, it might allow them to re-identify the receiver, breaking the BOLT12 privacy gains we just finally introduced. But of course, if we consider this case, we’d also need to think about mitigations for the onion message parts of the protocol.

TLDR: Yes, I agree that receiver/sender-side delays could be an option, if they were documented as (~binding) best practices for implementations. That’s mod the concerns regarding breaking blinded path privacy.


Thanks for the write-up!

Attributable Failures

I think changing the attr failure encoding to enforce hold_time related attributes isn’t that absolute: routing nodes could still manipulate the “protected” encoding to signal lower delays, e.g. if we used uint8 hold times then 10001000 could be some slang for 16ms, and this could break a theoretical floor of 100ms. The sender of the payment also has to opt in to this custom value interpretation.

We need to keep in mind that the reporting of the hold times as part of the attributable failures upgrade was just a placeholder that can prove useful in the future. It’s not precise and certainly not reliable. Routing nodes can choose to lie or trim their hold times to make themselves look more attractive, and this inaccuracy would definitely be factored into the sender’s pathfinding/scoring algorithm.

Seems as if we rushed ahead to assume that hold times are going to be the primary attribute to score a node by? This is definitely not covered by the attr failure spec, and I’m not sure if any discussion has started around how the values would be incorporated into pathfinding feedback.

We could have senders interpret all values below a threshold as if they were the same, so 87ms / 42ms / 99ms would all be considered as 100ms / 100ms / 100ms. Routing nodes are free to race to the bottom, but for the majority of the network which defaults to the above behavior it wouldn’t make a difference.
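Sketched as a sender-side rule (the threshold and helper are hypothetical):

```python
HOLD_TIME_FLOOR_MS = 100  # hypothetical threshold

def effective_hold_time(reported_ms: int) -> int:
    """Treat every reported hold time below the floor as the floor itself,
    so shaving milliseconds gains a routing node nothing in scoring."""
    return max(reported_ms, HOLD_TIME_FLOOR_MS)

assert [effective_hold_time(t) for t in (87, 42, 99, 250)] == [100, 100, 100, 250]
```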

Off path adversary

Doing the LND-style commitment batching (maybe with greater & randomized intervals?) is attractive, but would definitely contribute towards slower payment attempts.

Since the cost of having timing related defenses is equally paid by payment senders, it’s wiser to focus on the data/traffic obfuscation vector. Cover traffic sounds very promising, and can definitely help with muddying the waters for the adversary. This could also be an opt-in feature, controllable by the sender.

A sender-controlled approach could be having a mock path which doesn’t change the commitment tx and follows an onion route which only triggers a mock_add_htlc message. This way for every real payment there would be X “mock payments” travelling over a somewhat related route, solely for the purpose of misleading the network-level adversary. A node receiving a mock_add_htlc knows that the only purpose of the message is to forward it to another peer.

Another simple approach could be having empty mock_add_htlc messages (no onion layers of any kind) with a TTL field, that nodes along the real payment path optimistically transmit (random receiver and delay included). A node receiving the mock message forwards to another random peer if TTL>0, decreasing the TTL by 1. The longest possible route here could also be a mock_add_htlc chain triggered by the last hop before the receiver.
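A rough sketch of the TTL-based variant (mock_add_htlc is of course a hypothetical message, and the forwarding rule here is purely illustrative):

```python
import random

def on_mock_add_htlc(ttl: int, peers: list, send, rng=random):
    """Forward the mock message to a random peer until its TTL runs out."""
    if ttl > 0 and peers:
        send(rng.choice(peers), {"type": "mock_add_htlc", "ttl": ttl - 1})
```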

All mock/obfuscation related messages should of course have their own processing budget and not interfere with channel related messages that are of higher priority.

On path adversary

I don’t have much to add here, the sender/receiver controlled delays seem to be a very nice angle to tackle the issue.


If we now report exact hold times of each intermediate hop to the sender, it might allow them to re-identify the receiver, breaking the BOLT12 privacy gains we just finally introduced. But of course, if we consider this case, we’d also need to think about mitigations for the onion message parts of the protocol.

For the blinded part of the path we don’t have to report hold times (the forwarding node knows if it’s part of one). Also the sender does not know which nodes make up the blinded path, so cannot assign blame / penalties anyway.

Yes, indeed I just discussed this offline with Joost. Here are a few conclusions as a follow-up:

  • My concerns regarding BOLT12 privacy are indeed invalid as the introduction point would strip the attribution data. This in turn means that the next node upstream would report a huge latency measurement (as it would cover the entire blinded path’s latency), which is of course bogus and would need to be disregarded during scoring.
  • Similarly, any trampoline or legacy node in the path would also lead to stripped attribution data, meaning we’d only receive attribution data for hops before we encounter a blinded path, trampoline, or legacy node in the path.
  • As the attribution happens on a per-edge basis there is no way to discern the second-to-last hop and the final recipient. This means that if we want to exempt the recipient from the blame to incentivize receiver-side delays, it would always need to be the last two nodes on the path. This essentially also results in a similar rule for sender-side scoring: disregard/throw away whatever last attribution data entry it receives for any given path.

Especially given that BOLT12/blinded paths (maybe even 2-hop?) might eventually become the default payment protocol, it seems hold time reporting will be limited to a prefix of any given path either way. This limits its usefulness, but also the impact it might have on privacy.

So my personal conclusion is that we might be fine with hold times reporting, as long as we establish best practices around receiver-side delays and their exemption from sender-side scoring as mentioned above.

Indeed. I think being able to toggle these delays to decide on your IO/latency tradeoff makes a lot of sense :+1: afaik LND already allows this.

Seems like a reasonable idea to me - and something that could be turned on by default to make sure that privacy gets some company!

Agree. Timing based scoring will already need to take into account the MPP timeout, so would already need a code path to allow this (just needs to be turned on for single-HTLC payments as well).

I think I’m missing the concern for BOLT 12 a bit here - could you spell it out for me why the introduction point needs to strip attribution data?

I would have thought that the receiving node would just add its own delay (possibly breaking this delay up between any fake hops it added to the blinded route) and then report them back?

I was looking into this recently to jog my memory – TCP packets are fragmented based on the path’s minimum MTU (PMTU) and then reassembly of TCP packets occurs. This means in practice TCP packets are limited to ~1500 bytes. See RFC 8900 - IP Fragmentation Considered Fragile for more information if you have some time. That RFC links to another RFC (RFC 4963 - IPv4 Reassembly Errors at High Data Rates) that describes an IPv4 fragmentation attack where a 3rd-party can spoof the 16-bit ID counter in the IP header and cause IP reassembly to fail (when validating the checksum) or pass with corrupted data (randomly passes the checksum which is also 16-bit).

If this is just a typo and you meant Lightning packet then please disregard. I am not sure of the overhead of reassembly of TCP packets, but from what I’ve read fragmentation seems to have some issues. Mistakes my own.


Now opened Add recommendations for receiver-side random delays by tnull · Pull Request #1263 · lightning/bolts · GitHub to propose adding such recommendations to the BOLTs.

It is my understanding that this is the currently proposed way attributable failures would work in conjunction with blinded paths. Essentially, the sender can’t learn anything about the blinded path, in particular not its length, let alone timing measurements per hop. For the sender, the blinded path that it takes from the offer is opaque and acts, in terms of scoring, like a single monolithic hop. Maybe this should be made more explicit in https://github.com/lightning/bolts/pull/1044 (cc @joostjager)?

@Crypt-iQ you’re completely right, thanks for highlighting this: I did mean TCP packet and assumed the best-case where it isn’t fragmented by intermediate routers. But it was part of my open-ended question about possible performance degradation: I have no idea whether in practice 65kB TCP packets often get fragmented or not, how much overhead it adds, and couldn’t find public data about this…if a lot of fragmentation happens, then padding could indeed lead to degraded performance (on top of the additional bandwidth usage increase).

Is this a default OS behavior? Or is it just a recommendation? I’ll read the links you provided when I have some time, but would love a TL;DR for now :slight_smile:

I have no idea where to find actual data of what happens on the internet nowadays. I was thinking that we could do A/B testing on mainnet nodes: pad all packets to 65kB for a few days, then remove padding for a few days, and repeat, while measuring latency. This could give some indication of the overhead, even though it wouldn’t let us accurately predict higher percentiles since there probably isn’t enough payment volume today to provide meaningful statistics, but it’s a start.

I’m syncing a bitcoind node on my macbook and I can see in Wireshark that it is both fragmenting and reassembling packets greater than 1500 bytes (specifically headers messages). You can run ifconfig or similar on your machine and it will tell you MTU. Packets larger than 1500 bytes can be transmitted, but I believe this requires every router to handle this. I believe the 1500 byte limitation is a legacy thing and may vary with OS but seems to be pretty consistent from what I’ve seen. Hope I’m not link spamming too much but this post gives some history into the 1500 byte limitation (How 1500 bytes became the MTU of the internet).

RFC8900:

  • This was written recently (in 2020) and describes all of the different issues with fragmentation and reassembly of IP packets.
    • Some senders use something called Path MTU Discovery where ICMP packets are sent back to the sender so they can update their MTU estimate for a path. Usage of ICMP is not great because there is no authentication, and it can be rate-limited, black-holed, etc; I believe this means that in adversarial cases or even during regular operation, the sender may have to retry the send.
    • IPv6 has different fragmentation rules than IPv4 which seems to have some upsides but also may introduce some complications. It is less vulnerable to 3rd party IP reassembly attacks.
    • It notes that RFC 4443 recommends strict rate limiting of ICMPv6 traffic which may come into play during congestion.
    • Ultimately recommends that higher-layer protocols not rely on IP fragmentation as it’s fragile.

RFC4963:

  • This was written in 2007 and describes how IP reassembly works.
    • IPv4 uses a 16-bit ID field. The implementation “assembling fragments judges fragments to belong to the same datagram if they have the same source, destination, protocol, and Identifier”. In the RFC, it gives an example time that the packet can be alive as 30 seconds. I’m not sure whether this is a TCP Maximum Segment Lifetime (MSL) value (depends on OS, defaults to 30 seconds in Linux) or an IP-related timeout. This has implications on a sender’s data rate as technically only 65,535 1500-byte packets are valid in a 30-second window or whatever the time limit is.
    • IPv4 receivers store fragments in a reassembly buffer until all fragments are received or a reassembly timeout is reached. Configuring the reassembly timeout to be less has issues for slow senders but is better for fast senders. The opposite is also true when increasing the reassembly timeout.
    • The RFC describes a situation that can occur either maliciously or under high data-rates called “mis-association”. This is where overlapping, unrelated IP packets are spliced together and then passed to the higher layer. Typically this will get caught by the TCP or UDP checksum, however it’s only a 16-bit checksum and can occasionally be bypassed. Because of this, the RFC ultimately recommends the application layer to implement cryptographic integrity checks (which we do thankfully in both Bitcoin and Lightning).
    • Over UDP with 10TB of “random” data being sent, there were 8,847,668 UDP checksum errors and 121 corruptions due to mis-associated fragments (i.e. the UDP checksum was bypassed) and passed to the higher-layer.
    • From what I can tell (I have yet to test this), just because we have integrity checks in both Bitcoin and Lightning doesn’t preclude an attacker from messing with our reassembly and causing delays even if they are not an AS and are just guessing two people are connected. The LN graph is public also which is a bit concerning.

Data is pretty hard to come by. I think testing on mainnet and observing traffic is probably your best bet. I think fragmentation can be pretty costly in the presence of errors since retransmission and reassembly has to occur again. But again I don’t have hard data for this. It would be very interesting to see what other applications like Tor or something do when trying to send or receive large amounts of data at once.


Might be worth noting that the actual payload size that can be transmitted without fragmentation is more like 1400 bytes to 1460 bytes (1500 B - 20B (IPv4 header) or 40B (IPv6 header) - 20 to 60 B (TCP header)).


Thanks for the additional details, that’s really helpful! I remember now that this is why onion packets have been chosen to be 1300 bytes, so that an update_add_htlc would fit inside the 1500 bytes MTU.

So the conclusion is that we should assume that TCP packets will be fragmented into 1500-byte chunks, so I’m curious to see the impact of sending 65kB lightning packets, which will require quite a bit of reassembly… It would be interesting to run this in simulations (SimLN can probably help here?) or try it out with A/B testing on mainnet.

Sorry, I think I’ve conflated TCP reassembly with IP reassembly. When the bitcoind node was syncing, I was seeing reassembled TCP segments. I don’t think the IP fragmentation stuff applies here as I think most routers don’t fragment IP packets. However, the limit @tnull pointed out still applies as TCP will break up packets into MTU size chunks to transmit as IP packets.

Added sentence to the bolt spec PR

These two statements appear to contradict each other - the point of the forwarding delay is to create a batch such that a passive network monitor can’t trivially trace payments. Indeed, in some specific attacks in the literature they use message sizes and other issues in LN that we should address separately, but unless we want to switch to full CBR streams (at least for HTLCs themselves, in theory we could segment out our traffic in our TCP streams such that we send exactly one 1400-byte packet every 100ms or whatever even though we let gossip use whatever rate it wants), moving to no forwarding delay would leave payment tracing from someone doing network monitoring absolutely trivial.

Even if we switch to CBR (is there even appetite for doing this?), having no forwarding delays would mean that someone trying to get privacy by adding extra hops would be absolutely wrecked by an adversary running multiple nodes (post-PTLCs).

ISTM delays when forwarding (to the point of getting batching, at least insofar as we have enough traffic that we can get there without killing UX) is very important for privacy, especially as we fix low-hanging fruit like message padding.

Indeed - statements are meant to apply in the context of each described attack, as they’re the two types of attacks I could find in the literature. Just meant to indicate that in [6] timing matters, in [7] it doesn’t.

My understanding of [6] is that just being able to construct these partial paths is insufficient to deanonymize senders and receivers (if we add some privacy-awareness to pathfinding).

Two questions here:

  • Is there an off-path attack which does not require the attacker to run a ton of pathfinding to complete the path?
  • Are you talking about the case where the attacker sees network messages for the full path (they’re all in the same malicious AS group)?

Could you explain this further? Based on the PTLC reference, I assume we’re talking about the on-path attack? It’s unclear to me why a receiver-side delay doesn’t help with a multi-node on-path attacker.

Given many payments are only a few hops, I assume most of them, honestly.

I assume that in many payments an attacker can see network messages for much of the path, or at a minimum the first hop and last hop of a path, which suffices to figure out a payment based on timing (assuming we don’t have any delays and only a moderate flow of payments along the path, which is probably common-ish, at least today, but if you only add one or two intermediate hops I assume it’s still pretty doable).

My point was that being an on-path attacker is pretty similar in principle to a network attacker who can see only a subset of the hops, but with additional information that lets you remove some false positives in the classifier. Obviously pre-PTLCs you just know from the payment hash, but post-PTLCs the amount is pretty valuable, just not perfect - at that point we really want some delays so that the amount stays a bad classifier rather than a perfect one when combined with time.