Your latter sentence is why I suggested the taproot output rather than the txid for the minimal approach: it gives you something to scan for using regular block filters. Txid filters are theoretically possible too, of course.
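To make the scanning point concrete, here is a rough Python sketch, not a real filter implementation. It relies on two facts: a P2TR scriptPubKey is OP_1 followed by a 32-byte push of the x-only output key, and BIP158 basic filters are built over the scriptPubKeys in a block. The function names and the plain set standing in for the Golomb-coded filter are assumptions made for illustration; a real client would test membership against the encoded filter and would have to tolerate occasional false positives.

```python
from typing import Iterable


def p2tr_script_pubkey(xonly_output_key: bytes) -> bytes:
    """Build the scriptPubKey of a taproot output: OP_1 (0x51), push-32 (0x20), key."""
    if len(xonly_output_key) != 32:
        raise ValueError("x-only output key must be exactly 32 bytes")
    return b"\x51\x20" + xonly_output_key


def block_may_contain(filter_elements: Iterable[bytes], script_pubkey: bytes) -> bool:
    """Hypothetical stand-in for a BIP158 filter match.

    A real filter is a compact probabilistic structure; here the block's
    scriptPubKeys are simply held in a set so the lookup is exact.
    """
    return script_pubkey in set(filter_elements)


# Placeholder key, purely illustrative (not a real output key).
example_key = bytes(range(32))
example_spk = p2tr_script_pubkey(example_key)

# Pretend these are the scriptPubKeys committed to by one block's filter.
block_scripts = [example_spk, p2tr_script_pubkey(bytes(32))]
found = block_may_contain(block_scripts, example_spk)
```

The point is only that the receiver can derive a 34-byte script from the output key it was given and look for that, which is not possible with a bare txid under the regular filters.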
I’m not convinced there is a safe enough middle ground where we only partially check stuff. Either we trust the sender or we don’t. If we trust the sender then we don’t have to fetch any data from other sources at all. The sender can just provide everything that is needed to spend the output (see the last paragraph of my post).
Perhaps, but I suspect it may be a little early to start considering where to be flexible. To come up with a good protocol we’ll inevitably first have to form a reasonably well-informed opinion about what the client design might look like. At least, that’s the only way I see to keep the protocol reasonably lean.
Opinions can differ, but I’m not very optimistic about Web of Trust reliance, for instance. If we had a well-functioning and reliable WoT to slot this all into then great, but today we do not.
The easy part of this question is that not waiting is clearly preferable for Alice. I’d approach the rest by working out what protocol we end up with for each answer, yes and no, and then seeing how practical each result is for Bob.