Here’s my attempt at summarizing the options. If you just want the options, skip the first two sections.
Background
Attributable errors provide a way for a sender to receive a tamper-evident error. This means that the sender can pinpoint which node in the route inadvertently or purposefully garbled the error. Today, any node can flip a bit in an onion error, which renders the entire error undecryptable, with no way to ascribe blame.
Most implementations’ path finding today has some component that attempts to penalize a given node, or set of nodes, for an unsuccessful route. The opposite is also useful: the path finder can reward nodes for enabling the forwarding attempt to succeed up until a certain point.
Without a way to attribute onion error garbling to a particular node, path finders either need to penalize the entire route, or do nothing. Neither is a great option.
As a way to incentivize the uptake of attributable errors by implementations, Joost proposed that the “hold time” be encoded in the errors for failed payments. In theory, this would allow path finders to pinpoint which nodes are persistently slow (bad network, faulty hardware, slow disk, etc.) and penalize them in their path finding implementation. This rests on the assumption that users want fast payments, as slow payments are very bad UX (depending on the app, a payment can appear to be stuck if no visual feedback is given).
FWIW, I don’t think any path finding implementation has yet been updated to take this hold time information into account. Even today, path finders can bias towards geographically colocated nodes to reduce e2e latency (e.g., no need to bounce to Tokyo, then LA, if the payment is going to Mexico).
Problem Statement
If we want to encode these hold times in the onion error, then a question that naturally arises is: what encoding granularity should be used? By this I mean, do we encode the latency numbers outright, or some less precise value that may still be useful?
A related question: if we encode this hold time, to what degree does it degrade privacy? This question is what motivated this post to begin with.
Before proceeding to the encoding options, I think it’s important to emphasize that the sender always knows how much time the attempt took. They can also further bisect the reported values, either by iteratively probing nodes in the path, or by connecting out to them to measure ping latency. This brings forth a related third question: what do we gain by encoding less precise values?
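To make the “sender already knows” point concrete, here’s a minimal Go sketch of how a sender could turn reported hold times into per-hop delays, assuming each hop reports the total time it held the HTLC (so downstream time is included, and reports are non-increasing along the route). The function name and types are just for illustration, not from any implementation.

```go
package main

import (
	"fmt"
	"time"
)

// perHopDelays derives each hop's added delay from the hold times reported
// back to the sender, assuming each hop reports the total time it held the
// HTLC (i.e. reports are cumulative and non-increasing along the route).
func perHopDelays(reported []time.Duration) []time.Duration {
	delays := make([]time.Duration, len(reported))
	for i, held := range reported {
		if i == len(reported)-1 {
			// The last reporting hop's delay is its full hold time.
			delays[i] = held
			continue
		}
		delta := held - reported[i+1]
		if delta < 0 {
			// A negative delta means someone is misreporting; clamp to
			// zero and let the path finder treat the pair as suspect.
			delta = 0
		}
		delays[i] = delta
	}
	return delays
}

func main() {
	// Example: three hops reporting 450 ms, 300 ms and 120 ms respectively.
	reported := []time.Duration{
		450 * time.Millisecond,
		300 * time.Millisecond,
		120 * time.Millisecond,
	}
	fmt.Println(perHopDelays(reported)) // [150ms 180ms 120ms]
}
```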
One other aspect, as mentioned above, is that a forwarding node can itself become a sender. Even ignoring the latency encoding, it can log the resolution times of the HTLCs it forwards. For each of those HTLCs (similar amount, CLTV budget, etc.), it can launch probes to attempt to correlate the destination. As mentioned above, variable receiver settlement delays mitigate this somewhat.
Latency Encoding Options
I missed some of the initial discussion in the last spec meeting, but IIUC we have the following encoding options:
- Precise Values:
  - Rationale: The sender already knows how long the route takes, and can measure how long it takes each node to forward, as mentioned above.
  - Encoding: Encode the actual value in milliseconds.
- Bucketed Values:
  - Rationale: We don’t want to make it trivial to keep track of what the true per-hop latency is, so we should reduce the precision.
  - Encoding: Given a bucket size (another parameter), report the bucket that a value falls in. So if we have buckets of 100 ms, and the actual latency is 120 ms, then 100 ms is reported.
- Threshold Values:
  - Rationale: Payments already take 1.5 RTTs per hop to extend, then half a round trip per hop (assuming pipelining) to settle. Therefore we can just extrapolate based on common geographical latencies, and pick a min/threshold value. IMO, this is only viable if we’re space constrained, and want to encode the latency value in a single byte.
  - Encoding: The value encoded isn’t the actual latency, but the latency with some threshold subtracted (floored at zero), or divided by that threshold. In the examples below (and in the sketch after this list), we assume the threshold is 200 ms, and the actual payment latency was 225 ms.
    - Subtracting: A value of 25 is encoded. If the value is below the threshold, then zero is reported.
    - Dividing: A value of 1 is encoded. Again, if the value is below the threshold, zero is reported.
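To make the three options concrete, here’s a minimal Go sketch of each encoder, using the 100 ms bucket and 200 ms threshold parameters from the examples above (the function names are just placeholders for the idea).

```go
package main

import (
	"fmt"
	"time"
)

// encodePrecise reports the actual hold time in milliseconds.
func encodePrecise(held time.Duration) uint64 {
	return uint64(held / time.Millisecond)
}

// encodeBucketed rounds the hold time down to the nearest bucket,
// e.g. with 100 ms buckets a 120 ms hold time is reported as 100 ms.
func encodeBucketed(held, bucket time.Duration) uint64 {
	return uint64(held/bucket) * uint64(bucket/time.Millisecond)
}

// encodeThresholdSub reports the hold time minus the threshold, floored at zero.
func encodeThresholdSub(held, threshold time.Duration) uint64 {
	if held <= threshold {
		return 0
	}
	return uint64((held - threshold) / time.Millisecond)
}

// encodeThresholdDiv reports how many whole thresholds the hold time spans;
// anything below the threshold is reported as zero.
func encodeThresholdDiv(held, threshold time.Duration) uint64 {
	return uint64(held / threshold)
}

func main() {
	fmt.Println(encodePrecise(120 * time.Millisecond))                          // 120
	fmt.Println(encodeBucketed(120*time.Millisecond, 100*time.Millisecond))     // 100
	fmt.Println(encodeThresholdSub(225*time.Millisecond, 200*time.Millisecond)) // 25
	fmt.Println(encodeThresholdDiv(225*time.Millisecond, 200*time.Millisecond)) // 1
}
```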
As we know, LN is geographically distributed, so the actual latency depends on exactly where all the nodes are located. Sites like this can be used to get an idea of what types of latencies one would see in the real world.
Both the threshold and bucket options need some parameter selected for an initial deployment. How should we come up with such a parameter? During the discussion it was suggested that we just use a relatively high value like 300 ms, as it takes 1.5 RTT even in the direct hop case. Ofc payments can definitely be faster than 300 ms (small number of hops, well connected merchant, etc.), but anything around ~200-500 ms feels instant.
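As a rough sanity check on a value like 300 ms, here’s a back-of-the-envelope Go sketch using the figures above (1.5 RTTs per hop to extend, half a round trip per hop to settle), ignoring processing, disk and queuing delays, so it’s only a lower bound.

```go
package main

import (
	"fmt"
	"time"
)

// minPaymentLatency gives a rough lower bound on end-to-end payment latency:
// 1.5 RTTs per hop to extend the HTLC, plus half a round trip per hop to
// settle (assuming pipelining). Real payments will be slower than this.
func minPaymentLatency(hopRTTs []time.Duration) time.Duration {
	var total time.Duration
	for _, rtt := range hopRTTs {
		total += rtt*3/2 + rtt/2 // 1.5 RTT extend + 0.5 RTT settle
	}
	return total
}

func main() {
	// A single direct hop with a 150 ms RTT is already ~300 ms
	// before any processing time is added.
	fmt.Println(minPaymentLatency([]time.Duration{150 * time.Millisecond}))
}
```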
Flexibility Concerns
One concern brought up during the discussion was flexibility: if we aren’t encoding the actual value, then we need to pick some parameter for either the bucket size or the threshold value. The param is yet another value to bikeshed over.
Changing this value in the future may mean another long update cycle: senders need to upgrade to know how to parse/interpret the new value, and it isn’t really useful until all the forwarding nodes also start to set the new values.
Flexibility Middle Ground
One way to partially address this concern would be to prefix the latency encoding with the type and parameter, so the final value on the wire would be encoding_type || encode_param || encoding_value (see the sketch after this list). This would:
- Let nodes choose if they want to give granular information or not (hey! I’m fast, pick me!).
  - Does one node choosing the precise encoding excessively leak information? I’m not sure, as the sender knows what the real latency is.
- Avoid hard coding the bucket/threshold param. As a result, we/nodes/implementations have a path to change it in the future.
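Here’s a minimal Go sketch of what such a self-identifying value could look like on the wire. The type values and field sizes (1-byte type, 2-byte param in milliseconds, 4-byte value) are placeholders for the idea, not anything from the spec.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Hypothetical encoding types; the actual numbering would be up to the spec.
const (
	encodingPrecise   = 0 // param unused, value is milliseconds
	encodingBucketed  = 1 // param is the bucket size in milliseconds
	encodingThreshold = 2 // param is the threshold in milliseconds
)

// marshalHoldTime serializes encoding_type || encode_param || encoding_value
// as 1 + 2 + 4 bytes, big endian.
func marshalHoldTime(encType uint8, param uint16, value uint32) []byte {
	buf := make([]byte, 7)
	buf[0] = encType
	binary.BigEndian.PutUint16(buf[1:3], param)
	binary.BigEndian.PutUint32(buf[3:7], value)
	return buf
}

// unmarshalHoldTime parses the same layout back out on the sender side.
func unmarshalHoldTime(buf []byte) (encType uint8, param uint16, value uint32, err error) {
	if len(buf) != 7 {
		return 0, 0, 0, fmt.Errorf("expected 7 bytes, got %d", len(buf))
	}
	return buf[0], binary.BigEndian.Uint16(buf[1:3]), binary.BigEndian.Uint32(buf[3:7]), nil
}

func main() {
	// A node advertising 100 ms buckets and reporting the 100 ms bucket.
	wire := marshalHoldTime(encodingBucketed, 100, 100)
	fmt.Printf("%x\n", wire)

	encType, param, value, _ := unmarshalHoldTime(wire)
	fmt.Println(encType, param, value) // 1 100 100
}
```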
With this, an open question is: should all of the encoding modes be specified, or just one (with the expectation that senders can interpret them all)?
Personally, I favor the self-identifying encoding, with either just the actual value, or buckets (100 ms?).