"precog trickery"
Basically, the servers try to predict the player's inputs and send multiple results at once, one for each of the most likely possibilities. When the input actually happens, the client chooses the output frame that matches best and possibly does some light image manipulation if something is slightly different, like the direction or magnitude of an analog stick.
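To make the client-side step concrete, here's a rough Python sketch of picking the best-matching speculative frame. Everything in it is made up for illustration (the `SpeculativeFrame` type, `pick_frame`, the string stand-ins for rendered images); it is not how Microsoft's system or any real service does it, just the general idea of "nearest predicted input wins, and the leftover difference is what the image manipulation has to cover".

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeculativeFrame:
    # Hypothetical container: the stick position the server guessed,
    # plus a placeholder string standing in for the rendered frame.
    stick: tuple
    image: str


def pick_frame(candidates, actual_stick):
    """Return the candidate whose predicted stick input is closest to the
    real one, plus the leftover (dx, dy) the client would still need to
    correct for with light image manipulation."""
    best = min(candidates, key=lambda f: math.dist(f.stick, actual_stick))
    residual = (actual_stick[0] - best.stick[0],
                actual_stick[1] - best.stick[1])
    return best, residual


# Three frames the server speculatively rendered for this tick.
candidates = [
    SpeculativeFrame((0.0, 0.0), "idle"),
    SpeculativeFrame((1.0, 0.0), "full right"),
    SpeculativeFrame((0.0, 1.0), "full up"),
]

# The player actually pushed the stick 75% to the right.
best, residual = pick_frame(candidates, (0.75, 0.0))
print(best.image, residual)
```

Here the "full right" frame wins, and the residual tells the client how far off the prediction was, which is the part it would have to paper over.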
Microsoft experimented with something like that a while ago and apparently got great results. Technical paper on it:
Graphs from the paper:
On the left, general opinion score from people playing. In the middle, player performance based on health remaining at the end of a segment. On the right, time taken for players to complete the segment.
Yeah, it's a thing, but it gets a lot more complicated with higher input complexity. The state space explodes.
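To put rough numbers on the state space problem, here's a back-of-the-envelope Python sketch. The model is my own simplification, not anything from the paper: each button is either up or down, the analog stick is quantized into a fixed number of directions, and the server speculates a few frames ahead. The function name and the example figures are purely illustrative.

```python
def speculative_branches(buttons: int, stick_directions: int, frames_ahead: int) -> int:
    """Count distinct input sequences under a simplified model:
    every button is independently pressed or not, the stick is in one of
    `stick_directions` quantized positions, and all of that can change
    on each of `frames_ahead` frames."""
    per_frame = (2 ** buttons) * stick_directions
    return per_frame ** frames_ahead


# One button, four stick directions, one frame ahead: manageable.
print(speculative_branches(1, 4, 1))   # 8

# Four buttons, eight stick directions, three frames ahead: not so much.
print(speculative_branches(4, 8, 3))   # 2097152
```

Even with coarse quantization, a few more buttons or frames of lookahead takes you from a handful of frames to millions, so any real system has to prune hard to the likely inputs.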
You can apply ML to narrow down the likely inputs, but there's always the possibility of it guessing wrong.
And, of course, there's no reason a local box can't apply the same tech, or, when powerful enough, simply return a same-frame response to input without any prediction.