If your coinbase tx (ignoring the witness) is N bytes, then appending a 0 sat output of the form OP_RETURN OP_PUSH[4+X] <X bytes padding> <4B nonce>, where X=max(0,16-(N+23)%64), lets you update the 4B nonce by recomputing only the final 64-byte sha256 block rather than rehashing the whole tx. The cost is between 15 and 31 bytes of additional coinbase data (8B value, 1B script length, OP_RETURN, the push opcode, X bytes of padding and the 4B nonce). That achieves essentially the same benefit as using the coinbase nlocktime as a nonce, which requires between 0 and 12 bytes of padding for alignment anyway. If you're already doing extranonce work via an OP_RETURN output, then aligning the end of the final OP_RETURN data costs only 0-4 bytes more than aligning the nlocktime field would have.
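A rough sketch to sanity-check the padding formula, under some assumptions that are mine rather than stated above: the new output adds 15 fixed bytes (8B value, 1B script length, OP_RETURN, a single-byte push opcode, 4B nonce), the output count stays a 1-byte compact size, and the nonce sits immediately before the 4-byte nlocktime. The function names are hypothetical.

```python
def op_return_padding(n: int) -> int:
    """Padding X from the formula above, for an N-byte coinbase tx (no witness)."""
    return max(0, 16 - (n + 23) % 64)


def nonce_in_final_block(n: int, x: int) -> bool:
    """True if the 4B nonce lies entirely within the last 64-byte SHA-256 block.

    Assumes the appended output costs 15 + x bytes and that the nonce is
    followed only by the 4-byte nlocktime.
    """
    total = n + 15 + x                 # serialized length after adding the output
    nonce_start = total - 8            # 4B nonce, then 4B nlocktime
    # SHA-256 appends a 0x80 byte, zero padding and an 8-byte length field.
    blocks = (total + 9 + 63) // 64
    final_block_start = 64 * (blocks - 1)
    return nonce_start >= final_block_start


if __name__ == "__main__":
    for n in (100, 121, 150, 169):
        x = op_return_padding(n)
        print(n, x, nonce_in_final_block(n, x))
```

Under these assumptions the formula gives the smallest padding that works: any smaller X leaves the nonce (or the sha256 length padding) straddling a block boundary.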
32 bits of extra nonce, in addition to the 32-bit nNonce, 16 bits of version rolling from BIP 320, and rolling nTime once per second, gives 2^80 hashes per second of search space. That is about 1200 zettahash/second (1.2 trillion TH/s), or roughly 1000x the total current hashrate.
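The arithmetic behind that figure, as a quick check. The only assumption is that every combination of the three fields can be tried within each one-second nTime step; the variable names are illustrative.

```python
EXTRA_NONCE_BITS = 32   # 4B nonce in the OP_RETURN output
HEADER_NONCE_BITS = 32  # nNonce in the block header
VERSION_BITS = 16       # version rolling bits from BIP 320

total_bits = EXTRA_NONCE_BITS + HEADER_NONCE_BITS + VERSION_BITS
hashes_per_second = 2 ** total_bits   # one full sweep per nTime tick (1 second)

zettahash_per_second = hashes_per_second / 1e21
terahash_per_second = hashes_per_second / 1e12

print(f"2^{total_bits} H/s = {zettahash_per_second:.0f} ZH/s "
      f"= {terahash_per_second:.3e} TH/s")
```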