I was ultimately nerd sniped by the last LN spec meeting’s discussion [0] of the privacy impact of surfacing granular HTLC hold times via attributable failures [1]. This post contains a recap of the discussion (as I understand it) and a summary of my sniping.
Recap of meeting discussion:
- The current version of the spec allows forwarding nodes to specify the time they held the HTLC in ms.
- It’s likely that sending nodes will use this value in the future to pick low latency routes.
- Adding a random forwarding delay ([2], [3]) improves payment privacy.
- Surfacing hold times may disincentivize this privacy-preserving delay as nodes race to the bottom to be the fastest.
The solution suggested in the meeting was to change the encoding to represent blocks of time instead, so that the smallest encodable value still leaves time for processing and a random delay. This can’t be done by keeping ms encoding and enforcing some minimum, because nodes can always report smaller values; by changing the encoding, communicating a value under the smallest block of time becomes impractical [4].
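To make the encoding idea concrete, here is a rough sketch of reporting hold times in coarse blocks rather than milliseconds. The 100ms block size and the function names are my own placeholders, not values from the spec or the meeting; the point is only that two nodes holding for 5ms and 95ms end up reporting the same thing.

```python
# Hypothetical block size, chosen only for illustration (not from the spec).
BLOCK_MS = 100

def encode_hold_time(hold_ms: int, block_ms: int = BLOCK_MS) -> int:
    """Report the hold time as a count of whole blocks (rounded down)."""
    if hold_ms < 0:
        raise ValueError("hold time cannot be negative")
    return hold_ms // block_ms

def decode_hold_time(blocks: int, block_ms: int = BLOCK_MS) -> int:
    """Lower bound, in ms, of the hold time that a block count represents."""
    return blocks * block_ms

# A fast node and a node adding a random delay are indistinguishable
# as long as both stay under one block.
fast = encode_hold_time(5)
delayed = encode_hold_time(95)
```

With this shape, a sender comparing `fast` and `delayed` has nothing to rank on, so there is no payoff for shaving off the delay.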
Some questions that came up in the meeting:
- What value should we set this minimum to?
- How should we manage the UX/privacy tradeoff of fast payments vs forwarding delays?
- What happens if we need to increase forwarding delays in future?
Understanding Forwarding Delays + Privacy
To understand how these forwarding delays impact payment privacy, I took a look at a few research papers on the subject, summarized below. Of course, any inaccuracies are my own; I’d really recommend reading the papers to form your own opinion.
We are concerned about two different types of attackers:
- On path: attacker creates channels, routes payments and attempts to deanonymize them.
- Off path: attacker controls an AS, and is able to monitor messages at a network level.
On Path Adversary
As outlined in [5]:
- Attacker probes the network to get latency estimates for nodes.
- Attacker opens up low-fee and low-expiry channels to attract payments.
- Recipient identification:
  - Record the time between update_add_htlc and update_fulfill_htlc.
  - Compare to latency estimates to calculate number of hops the HTLC took.
- Sender identification:
  - Only works if the sender retries along the same path.
  - Fail the first HTLC seen, and record time between update_fail_htlc and replacement update_add_htlc.
- Use amount and CLTV of HTLC to reduce set of possible senders/receivers.
- Use latency estimates to identify possible paths based on recorded time.
A random forwarding delay is helpful here because it interferes with the ability of the attacker to compare the time they’ve recorded with their latency estimates. In lay-carla’s terms (give or take some noise), the delay is successful if it equals at least the processing time of a single hop, because this means that the attacker will be off by one hop and fail to identify the sender/receiver.
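As a toy illustration of the "off by one hop" intuition, the sketch below models the attacker’s guess as elapsed time divided by an estimated per-hop latency, and then measures how often that guess is wrong once a uniformly random delay is added. The uniform distribution, the fixed per-hop latency and all names here are simplifying assumptions of mine, not the model used in [5].

```python
import random

def estimate_hops(elapsed_ms: float, per_hop_ms: float) -> int:
    """Attacker's guess: elapsed time divided by estimated per-hop latency."""
    return max(1, round(elapsed_ms / per_hop_ms))

def miss_rate(true_hops: int, per_hop_ms: float, max_delay_ms: float,
              trials: int = 10_000, seed: int = 1) -> float:
    """Fraction of trials where the attacker's hop estimate is wrong."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        # Total random delay added somewhere along the route (assumed uniform).
        delay = rng.uniform(0, max_delay_ms)
        elapsed = true_hops * per_hop_ms + delay
        if estimate_hops(elapsed, per_hop_ms) != true_hops:
            misses += 1
    return misses / trials

no_delay = miss_rate(true_hops=3, per_hop_ms=100, max_delay_ms=0)
with_delay = miss_rate(true_hops=3, per_hop_ms=100, max_delay_ms=200)
```

In this toy model the estimate is always right with no delay, and is wrong most of the time once the delay can exceed a single hop’s latency.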
Off Path Adversary
As outlined in [6]:
- Attacker ICMP pings nodes in the network to get latency estimate.
- Attacker controls an AS and passively monitors network traffic.
- The commitment dance for a channel can be identified by message size and direction.
- With knowledge of the LN graph, an adversary can construct “partial paths” by tracing flow of update_add_htlc messages through the channels they observe.
  - This is timestamp based: if an incoming and outgoing update_add_htlc are processed within the estimated latency, they are assumed to be part of a partial path.
- Set limits for the possible payment amounts:
  - Minimum: largest htlc_minimum_msat along the partial path (can’t be smaller than the biggest minimum).
  - Maximum: smallest htlc_maximum_msat or capacity along the partial path (can’t be bigger than the smallest channel).
- Perform a binary search to get a payment amount range:
- Find the path from first to last node in the partial path for an amount.
- If the computed path differs from the partial path, the amount is discarded.
- Remove channels that can’t support the estimated payment amount.
- Identify sender and receiver:
- Nodes that remain connected to the first/last hop in the partial path are candidate sender/receivers
- Check payment path between each possible pair for the payment amount.
- If the path uses the partial path, then the pair is a possible sender/receiver.
A forwarding delay is helpful here because it interferes with the ability of the attacker to construct partial paths. Notably, once these paths are constructed the attacker still has a large anonymity set to deal with, and the attack relies heavily on deterministic pathfinding at several stages to reduce this set.
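Two of the steps above are simple enough to sketch: the timestamp check used to stitch observations into a partial path, and the amount limits derived from that path. The Channel type, field layout and function names are hypothetical and only mirror the description here; they are not taken from [6].

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Channel:
    htlc_minimum_msat: int
    htlc_maximum_msat: int
    capacity_msat: int

def linked(incoming_ts_ms: float, outgoing_ts_ms: float,
           est_latency_ms: float) -> bool:
    """Treat two update_add_htlc observations as one payment if the outgoing
    message follows the incoming one within the estimated latency."""
    gap = outgoing_ts_ms - incoming_ts_ms
    return 0 <= gap <= est_latency_ms

def amount_bounds(partial_path: List[Channel]) -> Tuple[int, int]:
    """Lower bound: largest minimum. Upper bound: smallest maximum or capacity."""
    lower = max(c.htlc_minimum_msat for c in partial_path)
    upper = min(min(c.htlc_maximum_msat, c.capacity_msat) for c in partial_path)
    return lower, upper

path = [Channel(1_000, 5_000_000, 10_000_000),
        Channel(5_000, 8_000_000, 4_000_000)]
bounds = amount_bounds(path)
```

A random forwarding delay attacks the first function: if the gap regularly exceeds the attacker’s latency estimate, `linked` returns False and the partial path never forms.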
[7] also examines how a malicious AS can identify nodes’ roles in a route with the goal of selective censorship:
- Senders: update_add_htlc messages sent “out of the blue” indicate that the node is the original sender.
- Intermediaries: timing analysis is used to connect an incoming revoke_and_ack with an outgoing update_add_htlc to identify forwarding nodes.
- Recipient: sending an update_fulfill_htlc message after receiving a revoke_and_ack message identifies the recipient, independent of timing.
Note that senders and receivers are identified based on the size of messages, without needing to rely on any timing information. Here, a forwarding delay isn’t helping sender/receiver privacy at all - per the suggestions in the paper, it seems like message padding and possibly cover traffic are the most promising defenses.
Incentives
While reading through all of this, it stood out to me that we’re relying on forwarding nodes to preserve the privacy of senders and receivers. This doesn’t seem particularly incentive aligned. Attributable failures and hold times aside, a profit driven node is incentivized to clear out payments as fast as it can to make efficient use of its capital. This seems sub-optimal on both ends:
- Senders and receivers who care about privacy can’t hold forwarding nodes accountable for adding a delay, because these values must be random to be effective. If you see that nobody delayed your payment, it may have just happened to get a very low delay on each hop.
- Forwarding nodes don’t know how long an HTLC’s payment route is, so they can’t easily pick a good delay time that they’re certain will help with privacy (unless they over-estimate, adding an additional hop’s latency) [8].
Is there something better that we can do?
On Path Adversary
In this attack, the attacker depends on the time between update_add_htlc and update_fulfill_htlc to make inferences about the number of hops between itself and the recipient to deanonymize the recipient. It doesn’t matter where the delay happens, just that there is enough delay for it to be ambiguous to the attacker how many hops there are to the recipient. It seems reasonable that we could implement delays on the recipient, instead of with the forwarding nodes. This puts the decision in the hands of the party whose privacy is actually impacted. It also works reasonably well with other hold-time aware systems like jamming mitigations and latency-aware routing, because we have to accommodate the MPP case where the recipient can hold HTLCs anyway.
For sender de-anonymization, the attacker needs to fail a payment and be on-path for the retry. This is more trivially addressable by adding a cool down between attempts and using more diverse retry paths. This is within the control of the sender, so it is nicely incentive aligned.
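A minimal sketch of what that sender-side policy could look like: wait a random cool down, and prefer a retry path that shares no intermediate hops with the failed one. The path representation (lists of node names), the cool down range and the fallback order are all assumptions of mine rather than anything an implementation does today.

```python
import random
from typing import List, Tuple

def pick_retry(candidate_paths: List[List[str]], failed_path: List[str],
               min_cooldown_ms: float, max_cooldown_ms: float,
               rng: random.Random) -> Tuple[List[str], float]:
    """Choose a retry path and a random cool down before sending it."""
    # Intermediate hops only: sender and recipient are on every path.
    failed_hops = set(failed_path[1:-1])
    disjoint = [p for p in candidate_paths if not failed_hops & set(p[1:-1])]
    # Fall back to any different path, then to anything at all.
    different = [p for p in candidate_paths if p != failed_path]
    pool = disjoint or different or candidate_paths
    cooldown_ms = rng.uniform(min_cooldown_ms, max_cooldown_ms)
    return rng.choice(pool), cooldown_ms

candidates = [["S", "A", "B", "R"], ["S", "C", "D", "R"], ["S", "A", "E", "R"]]
retry_path, wait_ms = pick_retry(candidates, ["S", "A", "B", "R"],
                                 500, 2_000, random.Random(7))
```

Here the attacker sitting at A never sees the replacement update_add_htlc, and even if it did, the random cool down decouples the retry timing from the sender’s distance.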
Off Path Adversary
While timing information is used in this attack, my impression from [6] was that predictable routing algorithms are what makes reducing the anonymity set feasible for the attacking node. This is again a lever that we could provide the sender to toggle as they see fit rather than relying on forwarding nodes. Without the ability to prune the network, the anonymity set for this attack remains infeasibly large.
This attack also gets significantly easier for larger payments, as the attacker can prune more channels (that wouldn’t be able to facilitate the amount). So more aggressive payment splitting is another option for privacy conscious sending nodes that does not rely on forwarding nodes for protection.
What to do for attributable failures?
Practically in today’s network, we don’t have any privacy preserving forwarding delays deployed:
- LND (80-90% of public network): has a 50ms commitment ticker to batch updates, but it is not randomized so can trivially be accounted for in the attacks listed above [9].
- Eclair (major router): does not implement forward delays.
So we do not currently have any defenses against the above listed attacks implemented. And we should fix that!
My opinion is:
If we truly believe that forwarding delays are the best mitigation:
- We should all implement and deploy them.
- We should change the encoding of attributable failures hold times to enforce a minimum value.
If that’s not the case (which I don’t necessarily think it is):
- We should investigate and implement some of the suggestions listed above.
- It’s fine to leave the attributable failures hold times encoded with millisecond granularity.
Footnotes
[0] Lightning Specification Meeting 2025/05/19 · Issue #1258 · lightning/bolts · GitHub
[1] Attributable failures (feature 36/37) by joostjager · Pull Request #1044 · lightning/bolts · GitHub
[2] bolts/04-onion-routing.md at 011bf84d74d130c2972becca97c87f297b9d4a92 · lightning/bolts · GitHub
[3] bolts/02-peer-protocol.md at 011bf84d74d130c2972becca97c87f297b9d4a92 · lightning/bolts · GitHub
[4] Forwarding nodes could flip a high bit to indicate that they’re using ms, but this would require sender cooperation and lead to devastating penalization if senders aren’t modified (because it would lead to their hold time being interpreted as massive).
[5] https://arxiv.org/pdf/2006.12143
[7] https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.AFT.2024.12
[8] Yes, we may very well have an incredibly privacy conscious and altruistic routing layer. Even if that’s the case (quite probably, since there isn’t much money to be made with it), we shouldn’t be relying on it to make privacy promises.
[9] Heavily emphasized across all papers is that this delay needs to be random to be impactful.