Here’s my attempt at summarizing the options. If you just want the options, skip the first two sections.
Background
Attributable errors provide a way for a sender to receive a tamper-evident error. This means that the sender can pinpoint which node in the route inadvertently or purposefully garbled the error. Today, any node can flip a bit in an onion error, which renders the entire error undecryptable, with no way to ascribe blame.
Most implementations’ path finding today has some component that attempts to penalize a given node, or set of nodes, for an unsuccessful route. The opposite is also useful: the path finder can reward nodes for enabling the forwarding attempt to succeed up until a certain point.
Without a way to attribute onion error garbling to a particular node, path finders either need to penalize the entire route, or do nothing. Neither is a great option.
As a way to incentivize the uptake of attributable errors by implementations, Joost proposed that the “hold time” be encoded in the errors for failed payments. In theory, this would allow path finders to pinpoint which nodes are persistently slow (bad network, faulty hardware, slow disk, etc.) and penalize them in their path finding implementation. This rests on the assumption that users want fast payments, as slow payments are very bad UX (depending on the app, a payment can appear to be stuck if no visual feedback is given).
FWIW, I don’t think any path finding implementation has yet been updated to take this hold time information into account. Even today, path finders can bias towards geographically colocated nodes to reduce e2e latency (e.g., no need to bounce to Tokyo, then LA, if the payment is going to Mexico).
Problem Statement
If we want to encode these hold times in the onion error, then a question that naturally arises is: what encoding granularity should be used? By this I mean, do we encode the latency numbers outright, or some less precise value that may still be useful?
A related question: if we encode this hold time, to what degree does it degrade privacy? This question is what motivated this post to begin with.
Before proceeding to the encoding options, I think it’s important to emphasize that the sender always knows how much time the attempt took. They can also further bisect the reported values, either by iteratively probing nodes in the path, or by connecting out to them to measure ping latency. This brings forth a related third question: what do we gain by encoding less precise values?
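To make the “sender already knows” point concrete, here’s a minimal Go sketch of how a sender could turn reported hold times into per-hop delays, assuming each hop reports the total time it held the HTLC (so downstream time is included, and reports are non-increasing along the route). The function name and types are just for illustration, not from any implementation.

```go
package main

import (
	"fmt"
	"time"
)

// perHopDelays derives each hop's added delay from the hold times reported
// back to the sender, assuming each hop reports the total time it held the
// HTLC (i.e. reports are cumulative and non-increasing along the route).
func perHopDelays(reported []time.Duration) []time.Duration {
	delays := make([]time.Duration, len(reported))
	for i, held := range reported {
		if i == len(reported)-1 {
			// The last reporting hop's delay is its full hold time.
			delays[i] = held
			continue
		}
		delta := held - reported[i+1]
		if delta < 0 {
			// A negative delta means someone is misreporting; clamp to
			// zero and let the path finder treat the pair as suspect.
			delta = 0
		}
		delays[i] = delta
	}
	return delays
}

func main() {
	// Example: three hops reporting 450 ms, 300 ms and 120 ms respectively.
	reported := []time.Duration{
		450 * time.Millisecond,
		300 * time.Millisecond,
		120 * time.Millisecond,
	}
	fmt.Println(perHopDelays(reported)) // [150ms 180ms 120ms]
}
```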
One other aspect, as mentioned above, is that a forwarding node can itself become a sender. Even ignoring the latency encoding, it can log the resolution times of the HTLCs it forwards. For each of those HTLCs (similar amount, CLTV budget, etc.), it can launch probes to attempt to correlate the destination. As mentioned above, variable receiver settlement delays mitigate this somewhat.
Latency Encoding Options
I missed some of the initial discussion in the last spec meeting, but IIUC we have the following encoding options:
- Precise Values:
  - Rationale: The sender already knows how long the route takes, and can measure how long it takes each node to forward, as mentioned above.
  - Encoding: Encode the actual value in milliseconds.
- Bucketed Values:
  - Rationale: We don’t want to make it trivial to keep track of what the true per-hop latency is, so we should reduce the precision.
  - Encoding: Given a bucket size (another parameter), report the bucket that a value falls in. So if we have buckets of 100 ms, and the actual latency is 120 ms, then 100 ms is reported.
- Threshold Values:
  - Rationale: Payments already take 1.5 RTTs per hop to extend, then half a round trip per hop (assuming pipelining) to settle. Therefore we can just extrapolate based on common geographical latencies, and pick a min/threshold value. IMO, this is only viable if we’re space constrained, and want to encode the latency value in a single byte.
  - Encoding: The value encoded isn’t the actual latency, but the latency with some threshold subtracted (floored at zero), or divided by that threshold. In the examples below (and in the sketch after this list), we assume the threshold is 200 ms, and the actual payment latency was 225 ms.
    - Subtracting: A value of 25 is encoded. If the value is below the threshold, then zero is reported.
    - Dividing: A value of 1 is encoded. Again, if the value is below the threshold, zero is reported.
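To make the three options concrete, here’s a minimal Go sketch of each encoder, using the 100 ms bucket and 200 ms threshold parameters from the examples above (the function names are just placeholders for the idea).

```go
package main

import (
	"fmt"
	"time"
)

// encodePrecise reports the actual hold time in milliseconds.
func encodePrecise(held time.Duration) uint64 {
	return uint64(held / time.Millisecond)
}

// encodeBucketed rounds the hold time down to the nearest bucket,
// e.g. with 100 ms buckets a 120 ms hold time is reported as 100 ms.
func encodeBucketed(held, bucket time.Duration) uint64 {
	return uint64(held/bucket) * uint64(bucket/time.Millisecond)
}

// encodeThresholdSub reports the hold time minus the threshold, floored at zero.
func encodeThresholdSub(held, threshold time.Duration) uint64 {
	if held <= threshold {
		return 0
	}
	return uint64((held - threshold) / time.Millisecond)
}

// encodeThresholdDiv reports how many whole thresholds the hold time spans;
// anything below the threshold is reported as zero.
func encodeThresholdDiv(held, threshold time.Duration) uint64 {
	return uint64(held / threshold)
}

func main() {
	fmt.Println(encodePrecise(120 * time.Millisecond))                          // 120
	fmt.Println(encodeBucketed(120*time.Millisecond, 100*time.Millisecond))     // 100
	fmt.Println(encodeThresholdSub(225*time.Millisecond, 200*time.Millisecond)) // 25
	fmt.Println(encodeThresholdDiv(225*time.Millisecond, 200*time.Millisecond)) // 1
}
```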
As we know, LN is geographically distributed, so the actual latency depends on exactly where all the nodes are located. Sites like this can be used to get an idea of what types of latencies one would see in the real world.
Both the threshold and bucket options need some parameter selected for an initial deployment. How should we come up with such a parameter? During the discussion it was suggested that we just use a relatively high value like 300 ms, as it takes 1.5 RTT even in the direct hop case. Ofc payments can definitely be faster than 300 ms (small number of hops, well connected merchant, etc.), but anything around ~200-500 ms feels instant.
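As a rough sanity check on a value like 300 ms, here’s a back-of-the-envelope Go sketch using the figures above (1.5 RTTs per hop to extend, half a round trip per hop to settle), ignoring processing, disk and queuing delays, so it’s only a lower bound.

```go
package main

import (
	"fmt"
	"time"
)

// minPaymentLatency gives a rough lower bound on end-to-end payment latency:
// 1.5 RTTs per hop to extend the HTLC, plus half a round trip per hop to
// settle (assuming pipelining). Real payments will be slower than this.
func minPaymentLatency(hopRTTs []time.Duration) time.Duration {
	var total time.Duration
	for _, rtt := range hopRTTs {
		total += rtt*3/2 + rtt/2 // 1.5 RTT extend + 0.5 RTT settle
	}
	return total
}

func main() {
	// A single direct hop with a 150 ms RTT is already ~300 ms
	// before any processing time is added.
	fmt.Println(minPaymentLatency([]time.Duration{150 * time.Millisecond}))
}
```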
Flexibility Concerns
One concern brought up during the discussion was flexibility: if we aren’t encoding the actual value, then we need to pick some parameter for either the bucket size or the threshold value. The param is yet another value to bikeshed over.
Changing this value in the future may mean another long update cycle: senders need to upgrade to know how to parse/interpret the new value, and it isn’t really useful until all the forwarding nodes also start to set the new values.
Flexibility Middle Ground
One way to partially address this concern would be to prefix the latency encoding with the type and parameter, so the final value on the wire would be encoding_type || encode_param || encoding_value (see the sketch after this list). This would:
- Let nodes choose if they want to give granular information or not (hey! I’m fast, pick me!).
  - Does one node choosing the precise encoding excessively leak information? I’m not sure, as the sender knows what the real latency is.
- Avoid hard coding the bucket/threshold param. As a result, we/nodes/implementations have a path to change it in the future.
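Here’s a minimal Go sketch of what such a self-identifying value could look like on the wire. The type values and field sizes (1-byte type, 2-byte param in milliseconds, 4-byte value) are placeholders for the idea, not anything from the spec.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Hypothetical encoding types; the actual numbering would be up to the spec.
const (
	encodingPrecise   = 0 // param unused, value is milliseconds
	encodingBucketed  = 1 // param is the bucket size in milliseconds
	encodingThreshold = 2 // param is the threshold in milliseconds
)

// marshalHoldTime serializes encoding_type || encode_param || encoding_value
// as 1 + 2 + 4 bytes, big endian.
func marshalHoldTime(encType uint8, param uint16, value uint32) []byte {
	buf := make([]byte, 7)
	buf[0] = encType
	binary.BigEndian.PutUint16(buf[1:3], param)
	binary.BigEndian.PutUint32(buf[3:7], value)
	return buf
}

// unmarshalHoldTime parses the same layout back out on the sender side.
func unmarshalHoldTime(buf []byte) (encType uint8, param uint16, value uint32, err error) {
	if len(buf) != 7 {
		return 0, 0, 0, fmt.Errorf("expected 7 bytes, got %d", len(buf))
	}
	return buf[0], binary.BigEndian.Uint16(buf[1:3]), binary.BigEndian.Uint32(buf[3:7]), nil
}

func main() {
	// A node advertising 100 ms buckets and reporting the 100 ms bucket.
	wire := marshalHoldTime(encodingBucketed, 100, 100)
	fmt.Printf("%x\n", wire)

	encType, param, value, _ := unmarshalHoldTime(wire)
	fmt.Println(encType, param, value) // 1 100 100
}
```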
With this, an open question is: should all of the encoding modes be specified, or just one (with the expectation that senders can interpret them all)?
Personally, I favor the self-identifying encoding, with either just the actual value, or buckets (100 ms?).