I made an optimization to the fuzz test and was able to find many surge attacks. For example, here’s one where the victim loses 50% of their revenue:
$ go test -run=FuzzSurgeAttack/e8e56d18133e27e4
--- FAIL: FuzzSurgeAttack (0.00s)
--- FAIL: FuzzSurgeAttack/e8e56d18133e27e4 (0.00s)
fuzz_test.go:171: Successful attack: Peer count: 18, cutoff: 17:
- 808464432 reputation (6m) contributes 67372036 revenue (2w)
- 808464432 reputation (6m) contributes 67372036 revenue (2w)
- 808464432 reputation (6m) contributes 67372036 revenue (2w)
- 808464432 reputation (6m) contributes 67372036 revenue (2w)
- 808464432 reputation (6m) contributes 67372036 revenue (2w)
- 808464432 reputation (6m) contributes 67372036 revenue (2w)
- 808464432 reputation (6m) contributes 67372036 revenue (2w)
- 808464432 reputation (6m) contributes 67372036 revenue (2w)
- 48053104688 reputation (6m) contributes 4004425390 revenue (2w)
- 52348071984 reputation (6m) contributes 4362339332 revenue (2w)
- 52348071984 reputation (6m) contributes 4362339332 revenue (2w)
- 69527941168 reputation (6m) contributes 5793995097 revenue (2w)
- 69527941168 reputation (6m) contributes 5793995097 revenue (2w)
- 69527941168 reputation (6m) contributes 5793995097 revenue (2w)
- 69527941168 reputation (6m) contributes 5793995097 revenue (2w)
- 86707810352 reputation (6m) contributes 7225650862 revenue (2w)
- 86707810352 reputation (6m) contributes 7225650862 revenue (2w)
- 86707810352 reputation (6m) contributes 7225650862 revenue (2w)
with outcome: Node lost: 50 % of revenue - attacker paid: 28586797036 to meet threshold: 58121013316, node still earned: 28586797036 (0 honest + 28586797036 attacker), <nil>
I think this highlights a case where the current reputation algorithm performs poorly in general: fan-in topologies (and also fan-out if we’re using bidirectional reputation).
With the recommended parameters, incoming reputation is calculated over a 24-week period while outgoing revenue is calculated over a 2-week period. So each incoming node will generally have a reputation score about 12x its contribution to the outgoing revenue (in the output above, 808464432 is 12x 67372036). The threshold in that output (58121013316) is the sum of the listed revenue contributions, i.e. the total outgoing revenue. It follows that if there are more than 12 incoming nodes contributing equally to outgoing revenue, each node's reputation falls short of the total and none of them can ever build enough reputation to access endorsed slots.
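The crossover argument above can be checked with a few lines of Go. This is only a sketch of the arithmetic, not the real reputation code: `meetsThreshold` is a hypothetical helper, and it assumes identical peers, a threshold equal to total outgoing revenue, and a ">=" comparison (the exact comparison and any in-flight risk term are left out).

```go
package main

import "fmt"

// Window lengths in weeks, taken from the recommended parameters quoted above.
const (
	reputationWindowWeeks = 24
	revenueWindowWeeks    = 2
)

// meetsThreshold is a hypothetical helper, not part of any implementation.
// It models `peers` identical incoming nodes that each contribute
// perPeerRevenue over the revenue window.
// Assumptions: a peer's reputation is its contribution scaled by the window
// ratio, the threshold is the total revenue, and the comparison is ">=".
func meetsThreshold(peers int, perPeerRevenue uint64) bool {
	ratio := uint64(reputationWindowWeeks / revenueWindowWeeks) // 12
	reputation := perPeerRevenue * ratio
	threshold := perPeerRevenue * uint64(peers)
	return reputation >= threshold
}

func main() {
	// Peer counts on both sides of the 12-peer crossover.
	for _, n := range []int{8, 12, 13, 18} {
		fmt.Printf("%d equal peers: meets threshold = %v\n", n, meetsThreshold(n, 1000))
	}
}
```

Note that the per-peer revenue cancels out: in this model only the peer count relative to the window ratio matters, which is why the problem shows up for any fan-in topology regardless of volume.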
A surge attack is essentially just putting a finger on the scale to tip it towards this crossover point where no incoming nodes can build enough reputation. The closer the topology and traffic flows already are to the crossover point, the cheaper a surge attack becomes.
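To make the "finger on the scale" idea concrete, here is a rough Go sketch of the cost under a simplified model: the attacker adds just enough revenue to lift the threshold (assumed to be total outgoing revenue) up to the highest peer reputation. `surgeCost` is a hypothetical helper and the fuzz test's actual attack model may differ, but plugging in the numbers from the output above gives a gap equal to the "attacker paid" figure.

```go
package main

import "fmt"

// surgeCost is a hypothetical helper, not taken from the fuzz test.
// It returns the extra revenue needed to raise the threshold (assumed equal
// to totalRevenue) up to the highest reputation among the incoming peers.
// It returns 0 when the topology is already at or past the crossover point.
func surgeCost(reputations []uint64, totalRevenue uint64) uint64 {
	var top uint64
	for _, r := range reputations {
		if r > top {
			top = r
		}
	}
	if top <= totalRevenue {
		return 0
	}
	return top - totalRevenue
}

func main() {
	// Distinct reputation values from the fuzz output above (duplicates
	// omitted, since only the maximum matters in this model).
	reputations := []uint64{
		808464432,
		48053104688,
		52348071984,
		69527941168,
		86707810352,
	}
	// Threshold reported in the fuzz output; it equals the sum of the
	// listed 2-week revenue contributions.
	const totalRevenue uint64 = 58121013316

	fmt.Println("surge cost:", surgeCost(reputations, totalRevenue))
}
```

The smaller the gap between the top reputation and the total revenue, the less the attacker has to pay, which is the sense in which topologies near the crossover point are cheap to attack.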
Mitigation Thoughts
Custodial Lightning
For custodial wallets, fan-in and fan-out topologies are likely rare and the current reputation algorithm might be “good enough”.
LSP-Specific Reputation Algorithms
LSPs generally have very high fan-in/out – many small user channels fan-in to the LSP, while just a few large channels route out from the LSP. It seems reasonable that LSPs would use their own reputation algorithms to handle their specific topology (AFAIU eclair/phoenix are already working on their own algorithm). But how exactly those algorithms should work is an open question.
Trampoline routing may also help. LSPs may be able to endorse trampoline payments regardless of the previous node’s reputation, since they get to choose the remainder of the path. But if the destination of such payments is also an LSP user, the opposite problem (fan-out) will exist at the destination, and endorsed payments are likely to be failed back under the new bidirectional algorithm.