Highly Available Lightning Channels Revisited – ROUTE OR OUT

MattCorallo · February 11, 2025, 6:03pm

I think this is a really bad idea.

The goal of getting payments down to reliably succeeding on the first try is, of course, critical to lightning being a successful payments platform; anything less than perfect reliability is a failure. However, the reality is that pathfinding, today, is not a major barrier to that. Obviously doing pathfinding well takes a pretty nontrivial amount of work (see Pathfinding with LDK | Lightning Dev Kit Documentation) plus probing regularly, but given that work has already happened, payments do go through the first time (if you’re willing to pay the required fee). Some of this, of course, is because of the network being fairly strongly connected, but I’m confident our pathfinding logic will scale to somewhat larger networks (again, given you’re doing background probing, and maybe utilizing trampoline if we get a much larger network).
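To make the "background probing" point concrete, here is a minimal sketch of how probe results can narrow a per-channel liquidity estimate that a pathfinder then uses to rank routes. It assumes a uniform prior between a lower and an upper liquidity bound. The class and method names are hypothetical and are not LDK's API, and real scorers also decay old observations over time, which is omitted here.

```python
from dataclasses import dataclass


@dataclass
class LiquidityBounds:
    """Hypothetical liquidity estimate for one channel direction, in sats."""
    capacity: int
    min_liquidity: int = 0    # at least this much is believed to be sendable
    max_liquidity: int = -1   # at most this much; -1 means "use capacity"

    def __post_init__(self):
        if self.max_liquidity < 0:
            self.max_liquidity = self.capacity

    def probe_succeeded(self, amount: int) -> None:
        # The hop forwarded `amount`, so it holds at least that much.
        self.min_liquidity = max(self.min_liquidity, amount)
        self.max_liquidity = max(self.max_liquidity, self.min_liquidity)

    def probe_failed(self, amount: int) -> None:
        # The hop could not forward `amount`, so it holds less than that.
        # (Simplification: the bound is set to `amount`, not `amount - 1`.)
        self.max_liquidity = min(self.max_liquidity, amount)
        self.min_liquidity = min(self.min_liquidity, self.max_liquidity)

    def success_probability(self, amount: int) -> float:
        # Assumption: liquidity is uniformly distributed between the bounds.
        if amount <= self.min_liquidity:
            return 1.0
        if amount >= self.max_liquidity:
            return 0.0
        return (self.max_liquidity - amount) / (self.max_liquidity - self.min_liquidity)


bounds = LiquidityBounds(capacity=1_000_000)
before = bounds.success_probability(250_000)   # no information yet
bounds.probe_failed(500_000)                   # a 500k sat probe bounced
after = bounds.success_probability(250_000)    # estimate is now tighter
```

A node that never probes (and never fetches scores from one that does) is stuck with the uninformed estimate for every hop, which is the failure mode described here, rather than a limit of pathfinding itself.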

There are, of course, many cases where payments today do fail, but the reasons for that are rarely due to fundamental limitations in the pathfinding step - payments often fail if your node either doesn’t regularly probe or doesn’t fetch scoring information from a node that does (something which all nodes really should do by default!). Payments also often fail because nodes simply do not have the available liquidity to make the payment - often the recipient doesn’t have some JIT channel service (again something that all nodes targeting non-hobbyists really need to be doing by default!) or the sender may not have enough actual capacity (due to dust limits and poor feerate estimators, funds split across multiple channels pre-splicing, reserve values, etc, etc, etc…things that are all slowly being improved, especially with an upcoming channel type based on TRUC).
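As a rough illustration of the sender-side capacity problem, the sketch below shows how reserves and a fee buffer shrink what a channel can actually send, and how splitting funds across channels caps the largest payment that cannot be split into parts. All figures and field names are made up for illustration, and the single `fee_buffer_sat` value is a stand-in for the real commitment-fee and dust accounting, which is more involved.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    """Hypothetical view of one of the sender's channels, in sats."""
    local_balance_sat: int
    reserve_sat: int      # balance the sender must keep in the channel
    fee_buffer_sat: int   # assumed lump sum set aside for on-chain fees

    def spendable_sat(self) -> int:
        return max(0, self.local_balance_sat - self.reserve_sat - self.fee_buffer_sat)


def total_spendable_sat(channels) -> int:
    # Upper bound when the payment can be split across all channels.
    return sum(c.spendable_sat() for c in channels)


def max_single_part_sat(channels) -> int:
    # Upper bound when the payment must leave over one channel.
    return max((c.spendable_sat() for c in channels), default=0)


# Illustrative numbers only: 1,000,000 sats held in two equal channels.
channels = [
    Channel(local_balance_sat=500_000, reserve_sat=10_000, fee_buffer_sat=5_000),
    Channel(local_balance_sat=500_000, reserve_sat=10_000, fee_buffer_sat=5_000),
]
nominal = sum(c.local_balance_sat for c in channels)
splittable = total_spendable_sat(channels)
unsplit = max_single_part_sat(channels)
```

With these assumed numbers the wallet nominally holds 1,000,000 sats, but only 970,000 are sendable in total and at most 485,000 in a single part. None of that gap has anything to do with how well the route was chosen.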

On the other end, this proposal has real social costs. Many senders are likely to take the “easy way out” - instead of fixing their pathfinding logic they’ll just assume it’s some fundamental issue and only route through “HA Nodes”. Average routing node operators will be caught between a rock and a hard place. They can:

not signal “HA”, and not get any material routing volume,
signal “HA”, and find their liquidity occasionally depleted (it happens; even on a well-balanced node, sometimes you just don’t have the liquidity), causing them to be marked “bad signalers” and lose their routing volume,
signal “HA” and try to do JIT rebalancing, having to charge a corresponding fee increase. But of course rebalancing doesn’t always work - usually if you don’t have enough liquidity in one “direction” someone else might not either, so you’ll still fail some payments (unless you want to charge dramatically more than market rate for relaying), causing you again to lose your routing volume.

But big custodial operators running big nodes won’t have this issue - they can agree to open 0conf channels with each other, use that for “JIT rebalancing” and signal “HA”. They’ll get all the naive sender routing volume and we’ll end up with ~all “end user” lightning payment volume routing through big AML operators…the opposite of the goal of lightning.

In general, before we take actions that have substantial social implications for the network, we should seek to ensure that (a) they’re actually necessary and (b) we’ve thoroughly explored other options to fix the underlying issues. I don’t think that this is currently necessary, and driving it today confuses issues that are not fundamental to lightning routing with issues that are.