You are correct, @zawy.
Working this out a bit more formally:
- Let r be the unknown but fixed hashrate (in hashes/second). We assume it does not change during the time of our observation.
- Let W_i be the amount of work in block i, a known constant (equal to its expected number of hashes; see the small sketch after this list), so:
- W_i = 2^{256} / (\mathrm{target}_i + 1)
- W_i = 2^{48} / (2^{16}-1) \cdot \mathrm{difficulty}_i
- Let t_i be the durations of each block in seconds, which we assume are exactly observable (not true in practice, but the best we can do).
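For concreteness, a minimal Python sketch of the work formula above; the function name and the difficulty values are made up for illustration, this is not Bitcoin Core code:

```python
def work_from_difficulty(difficulty: float) -> float:
    """Expected number of hashes for a block at the given difficulty:
    W_i = 2^48 / (2^16 - 1) * difficulty, equivalently 2^256 / (target + 1)."""
    return 2**48 / (2**16 - 1) * difficulty

print(work_from_difficulty(1.0))     # ~4.295e9 expected hashes at difficulty 1
print(work_from_difficulty(90e12))   # a made-up difficulty, just for scale
```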
The number of hashes per block, h_i, is a random variable that follows a geometric distribution, but it can be approximated well by an exponential one due to the enormous number of trials involved:
h_i \sim \mathrm{Exp}(\lambda=1/W_i)
and the time per block t_i = h_i / r, and thus by the fact that \lambda is an inverse scale parameter,
t_i \sim \mathrm{Exp}(\lambda=r/W_i)
To simplify what follows, introduce \alpha_i = t_i / W_i, which measures how long blocks took relative to their difficulty (its unit is seconds per hash). For these, we have:
\alpha_i \sim \mathrm{Exp}(\lambda=r)
So all \alpha_i are identically distributed. They can also be shown to be independent, even when there are difficulty adjustments in between the measured blocks. Their PDF is
f(\alpha) = r \exp(-r\alpha)
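As a quick sanity check of that claim, here is a small simulation sketch; the hashrate and the per-block work values are made up, and the only point is that the sample mean of the \alpha_i matches 1/r even though the work changes from block to block:

```python
import random

random.seed(1)
r = 5e18                      # hypothetical true hashrate (hashes/second)
works = [4e22, 8e22, 1.2e23]  # made-up per-block work values (difficulty changes)

alphas = []
for _ in range(100_000):
    W_i = random.choice(works)
    t_i = random.expovariate(r / W_i)  # block duration: t_i ~ Exp(rate = r / W_i)
    alphas.append(t_i / W_i)           # alpha_i = t_i / W_i ~ Exp(rate = r)

print(sum(alphas) / len(alphas))  # close to the expected mean 1/r
print(1 / r)                      # = 2e-19
```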
Our goal is to estimate r, based on a series of n observations (blocks) \bar{\alpha} = (\alpha_1, \ldots, \alpha_n). To start, we can build a maximum-likelihood estimator for r, which is the value \hat{r}_\mathrm{MLE} for which the log-likelihood
\begin{split}
\hat{l}(r;\bar{\alpha}) \, & = \, & \sum_{i=1}^n \log f(\alpha_i) \\
& = \, & \sum_{i=1}^n \log \left( r \exp(-r\alpha_i) \right) \\
& = \, & n \log(r) + \sum_{i=1}^n \log ( \exp(-r\alpha_i)) \\
& = \, & n \log(r) - r \sum_{i=1}^n \alpha_i \\
\end{split}
is maximal. Its derivative with respect to r is
\hat{l}'(r;\bar{\alpha}) = \frac{n}{r} - \sum_{i=1}^n \alpha_i
which is zero, and \hat{l} is maximized, at
\hat{r}_\mathrm{MLE} = \dfrac{n}{\sum_{i=1}^n \alpha_i}
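In code, this is just the following (a sketch; the durations and work values are made up for illustration):

```python
def hashrate_mle(durations, works):
    """n / sum(t_i / W_i), with durations in seconds and works in expected hashes."""
    alphas = [t / w for t, w in zip(durations, works)]
    return len(alphas) / sum(alphas)

# made-up example: three blocks of equal work, found after 500, 700 and 600 seconds
print(hashrate_mle([500.0, 700.0, 600.0], [4e22, 4e22, 4e22]))  # ~6.7e19 hashes/s
```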
If the difficulty is constant within the window, then this is equal to the current formula in getnetworkhashps:
\hat{r}_\mathrm{RPC} = \dfrac{\sum_{i=1}^n W_i}{\sum_{i=1}^n t_i}
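To see the equality: with constant work W_i = W for all i,
\hat{r}_\mathrm{MLE} = \dfrac{n}{\sum_{i=1}^n t_i / W} = \dfrac{n W}{\sum_{i=1}^n t_i} = \dfrac{\sum_{i=1}^n W_i}{\sum_{i=1}^n t_i} = \hat{r}_\mathrm{RPC}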
So far so good. The formula being used is the maximum-likelihood estimator, at least when the difficulty does not change within the measured interval. And if the difficulty does change within it, then the starting assumption that the true but unknown hashrate is a constant throughout the interval probably doesn’t hold anyway, and it may be reasonable to deviate from it.
However, the real question is whether this estimator is unbiased. To determine that, we compute the expected value of the estimate \hat{r}_\mathrm{MLE} when repeating the experiment many times (each experiment consisting of n block measurements), with a known true hashrate r.
\mathrm{E}[\hat{r}_\mathrm{MLE}] = \mathrm{E}\left[\dfrac{n}{\sum_{i=1}^n \alpha_i}\right]
Let
\beta = \sum_{i=1}^n \alpha_i
which follows a gamma distribution with shape n and rate r, i.e. \beta \sim \mathrm{\Gamma}(n, r). Then
\begin{split}
\mathrm{E}[\hat{r}_\mathrm{MLE}] \, & = \, & n \cdot \mathrm{E}[\beta^{-1}] \\
& = \, & n \cdot \frac{r}{n-1} \\
& = \, & \frac{n}{n-1} r
\end{split}
which is indeed a factor \frac{n}{n-1} higher than what it should be. An unbiased estimator can be created by correcting for this factor, and we get
\begin{split}
\hat{r} \, & = \, & \frac{n-1}{\sum_{i=1}^n \alpha_i} \\
& = \, & \frac{n-1}{\sum_{i=1}^n \frac{t_i}{W_i}}
\end{split}
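A sketch of this corrected estimator, together with a quick Monte Carlo check of the \frac{n}{n-1} bias factor; the function name and all the numbers (r, W, n) are made up for illustration:

```python
import random

def hashrate_unbiased(durations, works):
    """(n - 1) / sum(t_i / W_i): the bias-corrected estimator derived above."""
    alphas = [t / w for t, w in zip(durations, works)]
    return (len(alphas) - 1) / sum(alphas)

random.seed(7)
r = 1e18           # hypothetical true hashrate
W = 4e22           # constant work per block (its value does not affect the bias)
n = 5              # blocks per experiment
trials = 200_000

sum_mle = sum_unbiased = 0.0
for _ in range(trials):
    durations = [random.expovariate(r / W) for _ in range(n)]
    works = [W] * n
    sum_mle += n / sum(t / w for t, w in zip(durations, works))
    sum_unbiased += hashrate_unbiased(durations, works)

print(sum_mle / trials / r)       # close to n/(n-1) = 1.25
print(sum_unbiased / trials / r)  # close to 1.0
```

The correction is negligible for large windows, but noticeable for short ones (25% for n = 5).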
I believe it can be shown that \sum_{i=1}^n \alpha_i is a complete sufficient statistic for r, which (by the Lehmann–Scheffé theorem) would imply that this unbiased estimator is the minimum-variance unbiased estimator (MVUE).