I found the problem. Chain work isn’t the sum of the Ds that starts at the block after the timestamp at height H and ends at the block with the timestamp at H+N. It’s the sum of the Ds between those two timestamps, which covers N-1 blocks. Starting and ending the timespan on a block is a biased selection of the timespan.
But the sum of the Ds in the past N blocks is the work done up until the current time, which is a randomly chosen point in time at which to perform the query, not an ending timestamp.
Hashrate at any height h in the past is
2^32 * sumD(h+1 to h+N) / timespan(h to h+N+1)
And the work in that timespan is just this hashrate times that timespan.
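A minimal Python sketch of the formula above, in case it helps to see the index ranges spelled out. The function names and the `difficulties` / `timestamps` lists (indexed by block height, timestamps in seconds) are just assumptions for illustration, not anything from a real node API.

```python
def hashrate_at(difficulties, timestamps, h, n):
    """Hashrate estimate at past height h:
    2^32 * sumD(h+1 to h+N) / timespan(h to h+N+1).
    """
    # Difficulties of heights h+1 .. h+N inclusive (N blocks).
    sum_d = sum(difficulties[h + 1 : h + n + 1])
    # Seconds between the timestamps at heights h and h+N+1.
    timespan = timestamps[h + n + 1] - timestamps[h]
    return 2**32 * sum_d / timespan


def work_in_timespan(difficulties, timestamps, h, n):
    """Work (expected hashes) in that timespan: hashrate times timespan."""
    timespan = timestamps[h + n + 1] - timestamps[h]
    return hashrate_at(difficulties, timestamps, h, n) * timespan
```

Note that hashrate times timespan cancels back to 2^32 * sumD(h+1 to h+N), so the work only depends on the summed difficulties once the timespan is chosen this way.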
I think this is slightly more accurate than the prior method, which was:
2^32 * sumD(h+1 to h+N) / timespan(h to h+N) * (N-1) / N
because the (N-1)/N correction there is only necessary due to using a timespan that’s not exactly correct for the work.
In my competing tips example, pretend the last block has not been found but the ending time is the same, and that this ending time is local time, i.e. the time at which we’d expect to be randomly looking at both chains. Then both our hashrate and work calculations agree that the tip with the easier difficulty did 33% more work.
If you want the work done in the solvetime of N blocks, and you sum up the difficulties for those N blocks, then you have to apply the (N-1)/N correction to get the ~correct amount of work in that timespan.
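Here is a rough simulation sketch of why the (N-1)/N factor shows up. It assumes constant hashrate and difficulty, so solvetimes are exponentially distributed with mean 1 (units chosen so the true rate is 1 block per unit time). The function name, seed, and trial count are arbitrary choices for illustration.

```python
import random


def mean_blocks_per_time(n, trials, seed=1):
    """Average of N / (sum of N solvetimes) over many trials.

    Each trial measures a timespan that starts and ends on a block
    timestamp and counts all N blocks found in it. The true rate is 1.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Timespan covering N exponential solvetimes with mean 1.
        timespan = sum(rng.expovariate(1.0) for _ in range(n))
        total += n / timespan
    return total / trials


N = 10
raw = mean_blocks_per_time(N, 20000)
corrected = raw * (N - 1) / N
print(raw, corrected)
```

Under these assumptions the uncorrected average should come out near N/(N-1) times the true rate, and the corrected one near 1, since the mean of 1/T for a sum T of N unit-mean exponential solvetimes is 1/(N-1).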
In deciding the leading tip, you just sum the difficulties as usual because you want the work up until the current time, not up until the last timestamp.