Latency and Privacy in Lightning

I was ultimately nerd sniped by last LN spec meeting’s discussion [0] of the privacy impact of surfacing granular HTLC hold times via attributable failures [1]. This post contains a recap of the discussion (as I understand it) and a summary of my sniping.

Recap of meeting discussion:

  • The current version of the spec allows forwarding nodes to specify the time they held the HTLC in ms.
  • It’s likely that sending nodes will use this value in the future to pick low latency routes.
  • Adding a random forwarding delay ([2], [3]) improves payment privacy.
  • Surfacing hold times may disincentivize this privacy-preserving delay as nodes race to the bottom to be the fastest.

The solution suggested in the meeting was to change the encoding to represent blocks of time instead, so that the smallest encodable value still leaves time for processing and a random delay. This can’t be done by keeping the ms encoding and enforcing some minimum, because nodes can always report smaller values; by changing the encoding, communicating a value under the smallest block of time becomes impractical [4].
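To illustrate why changing the encoding is stronger than enforcing a floor on a ms value, here is a minimal sketch assuming a hypothetical 100ms unit (the actual unit size was an open question in the meeting). Because the wire format carries unit counts rather than milliseconds, the smallest non-zero value a forwarding node can report is one full unit:

```python
import math

# Hypothetical unit size; the real value is an open spec question.
UNIT_MS = 100

def encode_hold_time(hold_ms: int) -> int:
    """Encode a hold time as a count of fixed units, rounding up.

    A node that held for 1ms and a node that held for 99ms both
    report 1 unit: there is no way to signal "faster than one unit".
    """
    return math.ceil(hold_ms / UNIT_MS)

def decode_hold_time(units: int) -> int:
    """Decode back to milliseconds (upper bound of the reported bucket)."""
    return units * UNIT_MS
```

With a ms encoding plus a "MUST report at least 100" rule, a node can simply ignore the rule and report 16; with a unit encoding, 16ms is not representable at all.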

Some questions that came up in the meeting:

  • What value should we set this minimum to?
  • How should we manage the UX/privacy tradeoff of fast payments vs forwarding delays?
  • What happens if we need to increase forwarding delays in future?

Understanding Forwarding Delays + Privacy

To understand how these forwarding delays impact payment privacy, I took a look at a few research papers on the subject - summarized below. Of course, any inaccuracies are my own, I’d really recommend reading the papers to form your own opinion.

We are concerned about two different types of attackers:

  1. On path: the attacker opens channels, routes payments, and attempts to deanonymize them.
  2. Off path: the attacker controls an AS and is able to monitor messages at a network level.

On Path Adversary

As outlined in [5]:

  • Attacker probes the network to get latency estimates for nodes.
  • Attacker opens up low-fee, low-expiry channels to attract traffic.
  • Recipient identification:
    • Record the time between update_add_htlc and update_fulfill_htlc
    • Compare to latency estimates to calculate number of hops the HTLC took.
  • Sender identification:
    • Only works if the sender retries along the same path.
    • Fail the first HTLC seen, and record time between update_fail_htlc and replacement update_add_htlc.
  • Use amount and CLTV of HTLC to reduce set of possible senders/receivers.
  • Use latency estimates to identify possible paths based on recorded time.

A random forwarding delay is helpful here because it interferes with the ability of the attacker to compare the time they’ve recorded with their latency estimates. In lay-carla’s terms (give or take some noise), the delay is successful if it equals at least the processing time of a single hop, because this means that the attacker will be off by one hop and fail to identify the sender/receiver.
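To make the "off by one hop" intuition concrete, here is a toy model (the per-hop latency figure and the inference rule are illustrative assumptions, not taken from the paper):

```python
import random

# Assumed per-hop processing latency from the attacker's probing.
HOP_PROCESSING_MS = 50

def estimated_hops(observed_rtt_ms: float) -> int:
    """Attacker's inference: observed time divided by per-hop cost."""
    return round(observed_rtt_ms / HOP_PROCESSING_MS)

# Without a delay: 3 real hops at ~50ms each are identified exactly.
assert estimated_hops(3 * HOP_PROCESSING_MS) == 3

# With a random delay of at least one hop's processing time somewhere
# on the path, the same 3-hop payment looks like 4+ hops.
delay = random.uniform(HOP_PROCESSING_MS, 2 * HOP_PROCESSING_MS)
assert estimated_hops(3 * HOP_PROCESSING_MS + delay) >= 4
```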

Off Path Adversary

As outlined in [6]:

  • Attacker ICMP pings nodes in the network to get latency estimate.
  • Attacker controls an AS and passively monitors network traffic.
  • The commitment dance for a channel can be identified by message size and direction.
  • With knowledge of the LN graph, an adversary can construct “partial paths” by tracing flow of update_add_htlc messages through the channels they observe.
    • This is timestamp based: if an incoming and outgoing update_add_htlc are processed within the estimated latency, they are assumed to be part of a partial path.
  • Set limits for the possible payment amounts:
    • Minimum: largest htlc_minimum_msat along the partial path (can’t be smaller than the biggest minimum).
    • Maximum: smallest htlc_maximum_msat or capacity along the partial path (can’t be bigger than the smallest channel).
  • Perform a binary search to get a payment amount range:
    • Find the path from first to last node in the partial path for an amount.
    • If the computed path differs from the partial path, the amount is discarded.
  • Remove channels that can’t support the estimated payment amount.
  • Identify sender and receiver:
    • Nodes that remain connected to the first/last hop in the partial path are candidate sender/receivers
    • Check payment path between each possible pair for the payment amount.
    • If the path uses the partial path, then the pair is a possible sender/receiver.

A forwarding delay is helpful here because it interferes with the ability of the attacker to construct partial paths. Notably, once these paths are constructed the attacker still has a large anonymity set to deal with, and the attack relies heavily on deterministic pathfinding at several stages to reduce this set.
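The partial-path linking step can be sketched as follows. This is my own simplified model of the attack in [6] (the names and the tolerance parameter are mine, not the paper's):

```python
from dataclasses import dataclass

@dataclass
class Observation:
    channel: tuple  # (from_node, to_node)
    timestamp_ms: float  # when the update_add_htlc was seen by the AS

def link_partial_path(incoming: Observation, outgoing: Observation,
                      est_latency_ms: float, tolerance_ms: float) -> bool:
    """Two observed update_add_htlc messages are linked into a partial
    path if the outgoing one follows the incoming one within the
    forwarding node's estimated processing latency (plus tolerance)."""
    if incoming.channel[1] != outgoing.channel[0]:
        return False  # messages don't share a forwarding node
    gap = outgoing.timestamp_ms - incoming.timestamp_ms
    return 0 <= gap <= est_latency_ms + tolerance_ms

a = Observation(("A", "B"), 1000.0)
b = Observation(("B", "C"), 1030.0)
assert link_partial_path(a, b, est_latency_ms=40, tolerance_ms=10)

# A random forwarding delay pushes the gap outside the expected window:
b_delayed = Observation(("B", "C"), 1250.0)
assert not link_partial_path(a, b_delayed, est_latency_ms=40, tolerance_ms=10)
```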

[7] also examines how a malicious AS can identify nodes’ roles in a route with the goal of selective censorship:

  • Senders: update_add_htlc messages sent “out of the blue” indicate that the node is the original sender.
  • Intermediaries: timing analysis is used to connect an incoming revoke_and_ack with an outgoing update_add_htlc to identify forwarding nodes.
  • Recipient: sending an update_fulfill_htlc message after receiving a revoke_and_ack message identifies the recipient, independent of timing.

Note that senders and receivers are identified based on the size of messages, without needing to rely on any timing information. Here, a forwarding delay isn’t helping sender/receiver privacy at all - per the suggestions in the paper, it seems like message padding and possibly cover traffic are the most promising defenses.
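A minimal sketch of fixed-size padding, to show why it removes the size signal. The envelope size and the length-prefix framing are assumptions for illustration; a real design would presumably use the existing TLV machinery:

```python
# Illustrative fixed envelope size (chosen in the spirit of the
# fixed-size onion packet, not an actual spec value).
PADDED_SIZE = 1300

def pad(payload: bytes) -> bytes:
    """Pad every message to PADDED_SIZE so an AS-level observer can't
    distinguish message types by length."""
    if len(payload) > PADDED_SIZE - 2:
        raise ValueError("payload too large for padded envelope")
    # 2-byte big-endian length prefix, payload, then zero padding.
    header = len(payload).to_bytes(2, "big")
    return header + payload + bytes(PADDED_SIZE - 2 - len(payload))

def unpad(padded: bytes) -> bytes:
    length = int.from_bytes(padded[:2], "big")
    return padded[2 : 2 + length]

msg = b"update_fulfill_htlc..."
assert len(pad(msg)) == PADDED_SIZE
assert unpad(pad(msg)) == msg
```

The cost, as noted in the paper, is bandwidth: every message is inflated to the envelope size whether or not it carries that much data.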

Incentives

While reading through all of this, it stood out to me that we’re relying on forwarding nodes to preserve the privacy of senders and receivers. This doesn’t seem particularly incentive aligned. Attributable failures and hold times aside, a profit driven node is incentivized to clear out payments as fast as it can to make efficient use of its capital. This seems sub-optimal on both ends:

  • Senders and receivers who care about privacy can’t hold forwarding nodes accountable for adding a delay, because these values must be random to be effective. If you see that nobody delayed your payment, it may have just happened to get a very low delay on each hop.
  • Forwarding nodes don’t know how long an HTLC’s payment route is, so they can’t easily pick a delay that they’re certain will help with privacy (unless they over-estimate, adding an additional hop’s latency) [8].

Is there something better that we can do?

On Path Adversary

In this attack, the attacker depends on the time between update_add_htlc and update_fulfill_htlc to make inferences about the number of hops between itself and the recipient to deanonymize the recipient. It doesn’t matter where the delay happens, just that there is enough delay for it to be ambiguous to the attacker how many hops there are to the recipient. It seems reasonable that we could implement delays on the recipient, instead of with the forwarding nodes. This puts the decision in the hands of the party whose privacy is actually impacted. It also works reasonably well with other hold-time aware systems like jamming mitigations and latency-aware routing, because we have to accommodate the MPP case where the recipient can hold HTLCs anyway.
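A receiver-side hold could look something like the sketch below. The settle() callback is hypothetical; the point is that the receiver can treat the payment as successful the moment the HTLC arrives, and only the update_fulfill_htlc is deferred:

```python
import random
import time

def settle_with_delay(settle, min_delay_s: float = 0.1,
                      max_delay_s: float = 0.5) -> float:
    """Delay sending update_fulfill_htlc by a random amount, chosen by
    the party whose privacy the delay actually protects (the recipient).

    The bounds are illustrative and would be a user-tunable tradeoff.
    """
    delay = random.uniform(min_delay_s, max_delay_s)
    time.sleep(delay)
    settle()  # hypothetical hook that releases the fulfill message
    return delay

settled = []
d = settle_with_delay(lambda: settled.append(True), 0.01, 0.02)
assert settled == [True]
assert 0.01 <= d <= 0.02
```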

For sender de-anonymization, the attacker needs to fail a payment and be on-path for the retry. This is more directly addressable by adding a cooldown between attempts and using more diverse retry paths. Since this is within the sender’s control, it is nicely incentive aligned.
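The retry hygiene above can be sketched as follows; find_route and send are hypothetical stand-ins for a pathfinder that supports channel exclusion and a payment dispatcher:

```python
import random
import time

def retry_payment(find_route, send, max_attempts: int = 3,
                  cooldown_s: float = 0.05) -> bool:
    """Retry with a random cooldown and without reusing channels.

    The random cooldown frustrates an attacker timing the gap between
    a failed HTLC and its replacement; excluding used channels keeps
    the attacker from being on-path for the retry.
    """
    used_channels = set()
    for attempt in range(max_attempts):
        if attempt > 0:
            time.sleep(random.uniform(cooldown_s, 2 * cooldown_s))
        route = find_route(exclude=used_channels)
        if route is None:
            return False  # no sufficiently diverse route left
        used_channels.update(route)
        if send(route):
            return True
    return False
```

For example, with a pathfinder that offers two disjoint routes, a failure on the first route forces the retry onto the second rather than replaying the same path.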

Off Path Adversary

While timing information is used in this attack, my impression from [6] was that predictable routing algorithms are what make reducing the anonymity set feasible for the attacking node. This is again a lever that we could give the sender to toggle as they see fit, rather than relying on forwarding nodes. Without the ability to prune the network, the anonymity set for this attack remains infeasibly large.

This attack also gets significantly easier for larger payments, as the attacker can prune more channels (that wouldn’t be able to facilitate the amount). So more aggressive payment splitting is another option for privacy conscious sending nodes that does not rely on forwarding nodes for protection.
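A naive illustration of the splitting idea (this is not a real MPP splitter, just a random partition): smaller shards mean fewer channels can be ruled out per shard, enlarging the anonymity set the adversary has to work through.

```python
import random

def split_amount(total_msat: int, parts: int) -> list:
    """Randomly partition a payment into `parts` positive shards.

    Each shard is strictly smaller than the total, so the off-path
    adversary's capacity-based pruning removes fewer channels.
    """
    cuts = sorted(random.sample(range(1, total_msat), parts - 1))
    return [b - a for a, b in zip([0] + cuts, cuts + [total_msat])]

shards = split_amount(1_000_000, 4)
assert sum(shards) == 1_000_000
assert len(shards) == 4
assert all(0 < s < 1_000_000 for s in shards)
```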

What to do for attributable failures?

Practically in today’s network, we don’t have any privacy preserving forwarding delays deployed:

  • LND (80-90% of public network): has a 50ms commitment ticker to batch updates, but it is not randomized so can trivially be accounted for in the attacks listed above [9].
  • Eclair (major router): does not implement forward delays.
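The difference between a fixed and a randomized batch ticker can be sketched as below (the interval bounds are illustrative; LND's actual ticker is the fixed 50ms case):

```python
import random

FIXED_TICK_MS = 50  # LND's current commitment ticker interval

def next_tick_ms(randomize: bool) -> float:
    """Pick the next batch interval.

    A fixed interval is fully predictable: an attacker can subtract a
    constant 50ms from their latency model and proceed as before. A
    random interval per tick leaves no constant to subtract.
    """
    if not randomize:
        return FIXED_TICK_MS
    return random.uniform(FIXED_TICK_MS, 3 * FIXED_TICK_MS)

assert next_tick_ms(False) == FIXED_TICK_MS
t = next_tick_ms(True)
assert FIXED_TICK_MS <= t <= 3 * FIXED_TICK_MS
```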

So we do not currently have any defenses against the above listed attacks implemented. And we should fix that!

My opinion is:

If we truly believe that forwarding delays are the best mitigation:

  • We should all implement and deploy them.
  • We should change the encoding of attributable failure hold times to enforce a minimum value.

If that’s not the case (which I don’t necessarily think it is):

  • We should investigate and implement some of the suggestions listed above.
  • It’s fine to leave the attributable failures hold times encoded with millisecond granularity.

Footnotes

[0] Lightning Specification Meeting 2025/05/19 · Issue #1258 · lightning/bolts · GitHub

[1] Attributable failures (feature 36/37) by joostjager · Pull Request #1044 · lightning/bolts · GitHub

[2] bolts/04-onion-routing.md at 011bf84d74d130c2972becca97c87f297b9d4a92 · lightning/bolts · GitHub

[3] bolts/02-peer-protocol.md at 011bf84d74d130c2972becca97c87f297b9d4a92 · lightning/bolts · GitHub

[4] Forwarding nodes could flip a high bit to indicate that they’re using ms, but this would require sender cooperation and lead to devastating penalization if senders aren’t modified (because it would lead to their hold time being interpreted as massive).

[5] https://arxiv.org/pdf/2006.12143

[6] Revelio: A Network-Level Privacy Attack in the Lightning Network | IEEE Conference Publication | IEEE Xplore

[7] https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.AFT.2024.12

[8] Yes, we may very well have an incredibly privacy conscious and altruistic routing layer. Even if that’s the case (quite probably, since there isn’t much money to be made with it), we shouldn’t be relying on it to make privacy promises.

[9] Heavily emphasized across all papers is that this delay needs to be random to be impactful.


Thank you for taking the time to summarize this discussion.

As you mention, there is a fundamental trade-off between performance (latency) and privacy. While there may be a slight privacy improvement, I’m more concerned that this amplifies performance issues that already exist in Lightning today. Adding a forwarding delay affects every hop of every payment attempt. Not only does this slow the delivery of successful payments by delay * hop_count, but, perhaps more concerning, it also delays failing payment attempts, both legitimate and malicious. As routing relies on trial-and-error, failed payments are expected. Worse yet, probing is a common trick to improve future reliability, which means we can expect the number of failed payments to grow exponentially with the number of nodes on the network. Adding delays to these failed attempts compounds the problem of locked liquidity and HTLC slots for routing nodes.

So I agree, routing nodes have no incentive to follow these rules, other than as an optional feature to attract privacy-focused nodes.

I like your idea of applying delays at the source and destinations:

  1. It’s opt-in and can be tuned to the user’s preference
  2. It doesn’t require protocol changes
  3. For the receiver, it’s safe to consider the payment successful, even while delaying HTLC fulfillment

Thanks for the detailed post and the insights! It does make a lot of sense: I was personally mostly worried about the AS case, where it’s currently somewhat simple to match incoming update_add_htlc with the corresponding outgoing update_add_htlc based on timing and message identification. But as you mention, having cover traffic and padding messages to be indistinguishable by just looking at their size is probably a better (and more general) solution than delays for this kind of adversary.

We’ve known that padding messages was something we needed to do for a long time, and it became particularly useful since we introduced the path_key TLV to update_add_htlc messages, making them distinguishable from update_add_htlcs outside of blinded paths. The downside is of course that it uses more bandwidth, but we can’t have our cake and eat it too. The 65kB limit ensures that we’re still within a single TCP packet, which hopefully shouldn’t degrade performance too much. It would be a loss though if padding all messages to 65kB would actually degrade performance more than delaying HTLC messages! It could be interesting to do some simulations on real nodes (by turning the message padding feature on and off for various time periods) to figure this out.

Let’s see what others think after reading your analysis, but to me it’s a good enough argument to keep reporting the exact hold time in attributable failures.

Last month I had a discussion about this with a few people. Somebody pointed out that we deployed “HTTPS Everywhere” to improve the privacy of everyone on the web.

My counterpoint at the time was that “HTTPS Everywhere” could be imposed by user-agents and their operators, but there is nothing that would force forwarding nodes to create randomized forwarding times; senders and receivers cannot force forwarding nodes to perform the randomization. This is equivalent to the observation by carla that it is the senders and receivers who have an incentive to randomize, not forwarding nodes.

My counterproposal was:

  • Make batching of HTLCs the primitive, not individual update_add_htlcs.
  • Create a new forwarding “receiver-enforced forwarding randomization” protocol:
    • New message you_have_incoming_htlcs. This is sent if a node wants to eventually update_add_htlc one or more HTLCs. The message has no body, and is replayed on reconnection.
    • New response gimme_the_incoming_htlcs. This is sent after receiving you_have_incoming_htlcs.
    • New rules for update_add_htlc:
      • it is an error for a node to send update_add_htlc unless it has received gimme_the_incoming_htlcs. (because it is an error, you should error if you receive an update_add_htlc without having sent gimme_the_incoming_htlcs first and drop all channels with that peer onchain)
      • A “batch” of update_add_htlcs MUST be sent in response to gimme_the_incoming_htlcs. The batch is ended by a commitment_signed. After sending commitment_signed, it is once again an error for the node to send update_add_htlc until it has received a new gimme_the_incoming_htlcs.

The above adds increased latency to the forwarding protocol, due to the additional you_have_incoming_htlcs/gimme_the_incoming_htlcs exchange. A counter to this is that this protocol can be restricted to use only on endpoint receivers (i.e. receivers can use an even feature bit to enforce that this protocol is used in an “HTTPS Everywhere”-style campaign, while forwarders can provide an odd feature bit to indicate to new peers that they support this protocol, and if both of you use the odd feature bit you don’t follow this protocol after all), and pure forwarders can use the original low-latency forwarding protocol with each other.

A receiver can, on receiving a you_have_incoming_htlcs message, then randomize the delay before sending gimme_the_incoming_htlcs. This also allows the LSP of the receiver to batch multiple HTLCs to the receiver (e.g. probably helpful to improve throughput for multipath payments, which carla also noted would probably also help privacy in practice).
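The rules above amount to a small state machine on the sending side of a channel. A sketch (state names are mine; the message names mirror the proposal):

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()           # may announce, may not add
    ANNOUNCED = auto()      # sent you_have_incoming_htlcs, awaiting reply
    SENDING_BATCH = auto()  # received gimme_the_incoming_htlcs

class Forwarder:
    def __init__(self):
        self.state = State.IDLE

    def announce(self):
        """Send you_have_incoming_htlcs (empty body, replayed on reconnect)."""
        self.state = State.ANNOUNCED

    def on_gimme(self):
        """Peer sent gimme_the_incoming_htlcs; a batch may now begin."""
        if self.state != State.ANNOUNCED:
            raise RuntimeError("unexpected gimme_the_incoming_htlcs")
        self.state = State.SENDING_BATCH

    def send_add_htlc(self):
        if self.state != State.SENDING_BATCH:
            # Per the proposal this is a protocol error: the peer should
            # drop all channels with us onchain.
            raise RuntimeError("update_add_htlc without permission")

    def send_commitment_signed(self):
        """commitment_signed ends the batch; adds are an error again."""
        self.state = State.IDLE

f = Forwarder()
f.announce()
f.on_gimme()
f.send_add_htlc()           # allowed inside the batch
f.send_commitment_signed()
try:
    f.send_add_htlc()       # error: batch has ended
    raise AssertionError("should have raised")
except RuntimeError:
    pass
```

The receiver's randomized delay then lives entirely in how long it waits between receiving you_have_incoming_htlcs and replying with gimme_the_incoming_htlcs.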

Thank you Carla for this great write-up of the discussion! I agree with your analysis in general, here are just a few points I want to add:

  • Yes, adding forwarding delays could be perceived to be unaligned with forwarding nodes’ incentives, however, they already do this and also gain some efficiency and performance from batching HTLCs. Of course, there is a latency trade-off here, but generally it holds that the longer you wait, the higher are the chances that you can benefit from the reduced IO and network-latency overhead of batching HTLCs. IIUC, this would become even more relevant if we were to implement option_simplified_update in the future.
  • As always, “privacy loves company”. So requiring individual nodes who think they need additional privacy protections to add random delays might help with the particular on-path adversary model in mind, but it could actually have them stand out more in the general case. I.e., depending on the adversary it could even put a crosshair on their back, at least if it doesn’t become a best practice to add reasonable random delays before claiming (receiver-side) / retrying (sender-side) payments. So, if we agree sender/receiver side delays are the way to go, it would make sense to actually document best-practices that any implementation should stick to, just as we already do for the CLTV expiry delta in the BOLT #7 “Recommendations for Routing” section.
  • Note that in our research paper ([5]), we still assumed a purely BOLT11 world in which sender’s receiver anonymity was non-existing, i.e., the sender would always know the full path and hence the identity of the receiver anyways. However, in a post-BOLT12/blinded path world, the receiver’s identity can be actually hidden from the sender, and now the sender could be considered an on-path adversary. If we now report exact hold times of each intermediate hop to the sender, it might allow them to re-identify the receiver, breaking the BOLT12 privacy gains we just finally introduced. But of course, if we consider this case, we’d also need to think about mitigations for the onion message parts of the protocol.

TLDR: Yes, I agree that receiver/sender-side delays could be an option, if they were documented as (~binding) best practices for implementations. That’s mod the concerns regarding breaking blinded path privacy.


Thanks for the write-up!

Attributable Failures

I think changing the attributable failure encoding to enforce hold_time floors isn’t watertight: routing nodes could still manipulate the “protected” encoding to signal lower delays, e.g. if we used uint8 hold times then 10001000 could be slang for 16ms, breaking a theoretical floor of 100ms. The sender of the payment also has to opt in to this custom value interpretation.

We need to keep in mind that the reporting of the hold times as part of the attributable failures upgrade was just a placeholder that can prove useful in the future. It’s not precise and certainly not reliable. Routing nodes can choose to lie or trim their hold times to make themselves look more attractive, and this inaccuracy would definitely be factored into the sender’s pathfinding/scoring algorithm.

Seems as if we rushed ahead to assume that hold times are going to be the primary attribute to score a node by? This is definitely not covered by attr failure spec and I’m not sure if any discussion has started around how the values would be incorporated into pathfinding feedback.

We could have senders interpret all values below a threshold as if they were the same, so 87ms / 42ms / 99ms would all be considered as 100ms / 100ms / 100ms. Routing nodes are free to race to the bottom, but for the majority of the network which defaults to the above behavior it wouldn’t make a difference.
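The thresholding above can be sketched in a couple of lines; the 100ms floor is the illustrative value from the example:

```python
FLOOR_MS = 100  # illustrative default floor

def effective_hold_time(reported_ms: int) -> int:
    """Clamp all reported hold times below the floor to the floor, so
    racing to the bottom buys a routing node nothing in scoring."""
    return max(FLOOR_MS, reported_ms)

assert [effective_hold_time(t) for t in (87, 42, 99)] == [100, 100, 100]
assert effective_hold_time(250) == 250  # genuinely slow nodes still stand out
```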

Off path adversary

Doing the LND-style commitment batching (maybe with greater & randomized intervals?) is attractive, but would definitely contribute towards slower payment attempts.

Since the cost of having timing related defenses is equally paid by payment senders, it’s wiser to focus on the data/traffic obfuscation vector. Cover traffic sounds very promising, and can definitely help with muddying the waters for the adversary. This could also be an opt-in feature, controllable by the sender.

A sender-controlled approach could be having a mock path which doesn’t change the commitment tx and follows an onion route which only triggers a mock_add_htlc message. This way for every real payment there would be X “mock payments” travelling over a somewhat related route, solely for the purpose of misleading the network-level adversary. A node receiving a mock_add_htlc knows that the only purpose of the message is to forward it to another peer.

Another simple approach could be having empty mock_add_htlc messages (no onion layers of any kind) with a TTL field, that nodes along the real payment path optimistically transmit (random receiver and delay included). A node receiving the mock message forwards to another random peer if TTL>0, decreasing the TTL by 1. The longest possible route here could also be a mock_add_htlc chain triggered by the last hop before the receiver.
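The TTL forwarding rule can be sketched as below; mock_add_htlc is the hypothetical cover-traffic message described above, not an existing protocol message:

```python
import random
from dataclasses import dataclass

@dataclass
class MockAddHtlc:
    ttl: int  # remaining hops for this cover-traffic message

def on_mock_add_htlc(msg: MockAddHtlc, peers: list):
    """Forward the mock message to a random peer while TTL > 0,
    decrementing the TTL; drop it (return None) once it expires."""
    if msg.ttl <= 0:
        return None
    msg.ttl -= 1
    return random.choice(peers)

msg = MockAddHtlc(ttl=2)
assert on_mock_add_htlc(msg, ["A", "B"]) in ("A", "B")  # hop 1
assert on_mock_add_htlc(msg, ["A", "B"]) in ("A", "B")  # hop 2
assert on_mock_add_htlc(msg, ["A", "B"]) is None        # expired
```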

All mock/obfuscation related messages should of course have their own processing budget and not interfere with channel related messages that are of higher priority.

On path adversary

I don’t have much to add here, the sender/receiver controlled delays seem to be a very nice angle to tackle the issue.

If we now report exact hold times of each intermediate hop to the sender, it might allow them to re-identify the receiver, breaking the BOLT12 privacy gains we just finally introduced. But of course, if we consider this case, we’d also need to think about mitigations for the onion message parts of the protocol.

For the blinded part of the path we don’t have to report hold times (the forwarding node knows if it’s part of one). Also the sender does not know which nodes make up the blinded path, so cannot assign blame / penalties anyway.

Yes, indeed I just discussed this offline with Joost. Here are a few conclusions as a follow-up:

  • My concerns regarding BOLT12 privacy are indeed invalid as the introduction point would strip the attribution data. This in turn means that the next node upstream would report a huge latency measurement (as it would cover the entire blinded path’s latency), which is of course bogus and would need to be disregarded during scoring.
  • Similarly, any trampoline or legacy node in the path would also lead to stripped attribution data, meaning we’d only receive attribution data for hops before we encounter a blinded path, trampoline, legacy node in the path.
  • As the attribution happens on a per-edge basis, there is no way to discern the second-to-last hop from the final recipient. This means that if we want to exempt the recipient from blame to incentivize receiver-side delays, the exemption would always need to cover the last two nodes on the path. This essentially results in a similar rule for sender-side scoring: disregard/throw away the last attribution data entry received for any given path.
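The scoring rule in the last point reduces to dropping the final entry before feeding hold times into pathfinding. A trivial sketch:

```python
def scoreable_hold_times(hold_times: list) -> list:
    """Discard the last attribution entry: it jointly covers the final
    two nodes, so penalizing it would punish receiver-side delays."""
    return hold_times[:-1]

# Hold times reported per edge; the large final entry includes the
# recipient's (intentional, privacy-preserving) delay.
assert scoreable_hold_times([10, 20, 480]) == [10, 20]
```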

Especially given that BOLT12/blinded paths (maybe even 2-hop?) might eventually become the default payment protocol, it seems hold time reporting will be limited to a prefix of any given path either way. This limits its usefulness, but also the impact it might have on privacy.

So my personal conclusion is that we might be fine with hold times reporting, as long as we establish best practices around receiver-side delays and their exemption from sender-side scoring as mentioned above.

Indeed. I think being able to toggle these delays to decide on your IO/latency tradeoff makes a lot of sense :+1: afaik LND already allows this.

Seems like a reasonable idea to me - and something that could be turned on by default to make sure that privacy gets some company!

Agree. Timing based scoring will already need to take into account the MPP timeout, so would already need a code path to allow this (just needs to be turned on for single-HTLC payments as well).

I think I’m missing the concern for BOLT 12 a bit here - could you spell it out for me why the introduction point needs to strip attribution data?

I would have thought that the receiving node would just add its own delay (possibly breaking this delay up between any fake hops it added to the blinded route) and then report them back?