[mosh-devel] Optimizing Mosh's perceived latency

john hood cgull at glup.org
Tue Jul 14 13:52:00 EDT 2015


On 7/10/15 8:22 PM, John Hood wrote:
> Yes, Mosh is by default a little slower on fast local links than ssh.

I've always found that Mosh feels slower than ssh on low-latency links,
and this is a nut I'm trying to crack.

After tangling with Mosh's frame-timing internals (which are a
little...tangly) and stepping back and reading Keith's USENIX paper
again, I'm realizing my goals differ a little from Mosh's current goals
and the discussion in the paper.  I'm also thinking the problem is a
little more complex than I'd originally thought.  I think maybe we can
improve on what Mosh currently does, though.

As I understand the paper (Keith, correct me if I'm wrong), when
considering remote echo, Mosh is attempting to optimize the time to
completion of an update after a given keystroke, and to control the rate
of frame transmission in the case of continuous input from a server
application.  At least, I think that's what is being discussed in
section 4's "Appropriateness of timing parameters" and the associated
Figure 3.  In order to do this, it applies a "collection interval" of
8ms to the server's first frame in response to a keystroke (or more
accurately, the first frame sent after a relatively long idle time), and
then, as long as input from the server application keeps arriving, uses
an interval that is a smoothed RTT estimate clamped to the range
20..500ms.
 (There are numerous complications from ack messages and prospective
estimates of what states the receiver currently knows about, but I think
this is the big picture here.)
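
If I'm reading that right, a rough sketch of the policy looks like the
following (my own C++, for discussion only, not Mosh's actual code; the
idle threshold and the initial SRTT value are numbers I made up):

#include <algorithm>
#include <cstdint>

struct FrameTimer {
  double srtt_ms = 50.0;      // smoothed RTT estimate, updated from acks
  uint64_t last_send_ms = 0;  // when the previous frame went out

  static constexpr uint64_t IDLE_THRESHOLD_MS = 100;   // "relatively long idle" (my guess)
  static constexpr uint64_t FIRST_FRAME_DELAY_MS = 8;  // collection interval, first frame
  static constexpr double MIN_INTERVAL_MS = 20.0;
  static constexpr double MAX_INTERVAL_MS = 500.0;

  // How long to wait before sending the next frame, given the current time.
  uint64_t next_delay(uint64_t now_ms) const {
    if (now_ms - last_send_ms > IDLE_THRESHOLD_MS) {
      // First frame after an idle period: short, fixed collection interval.
      return FIRST_FRAME_DELAY_MS;
    }
    // Application still busy: pace frames at a clamped smoothed-RTT interval.
    return static_cast<uint64_t>(
        std::clamp(srtt_ms, MIN_INTERVAL_MS, MAX_INTERVAL_MS));
  }
};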

I think there are perhaps three things that should be optimized for, in
terms of latency:

* On interactive typing, the user's typed characters should be echoed as
quickly as possible, because the user is typically looking at their
insertion point for feedback on their touch (or not) typing.  It seems
to me that since this is a trained feedback loop, it's more critical to
optimize this RTT.  If the typing causes updates to other areas of the
screen, that is also significant, but less important than the text
immediately surrounding the insertion point.

* On larger screen updates, where a substantial amount of the screen is
updated (page down, reformat code/text, etc.), the completion time
is the metric of interest-- this is a more conscious operation, often
requiring some evaluation of the new display before taking the next
typing action.  Also, text at the cursor location is often completely
replaced and the cursor is often moved for these updates.

* On unconstrained output to the console (from 'cat largefile', say),
you do want a latency constraint, but it can be a fairly relaxed one--
if a user is looking to hit ^C in this situation, they are not
interested in the output being presented, nor very interested in the
exact details of the subsequent shell prompt-- they merely want the
pointless motion of output to stop.  It is also in this situation that
we want to
constrain the frame rate to avoid excess traffic.  This situation is
also relatively rare, it would seem-- a display of text
scrolling/updating too fast to read is not a generally useful thing.

I haven't thoroughly examined this, but my impression is that most
interactive applications will send echo text as their first response to
user input, and only after that update text elsewhere on the screen: to
the right of the cursor, row/column/status indicators, highlighting.  So
I think interactive typing is best optimized by sending the first update
with no or a very short delay-- 0-1ms.
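
In terms of the sketch earlier, that's just a change to the first-frame
constant (0-1ms is the value I'm proposing here, not anything Mosh uses
today):

  static constexpr uint64_t FIRST_FRAME_DELAY_MS = 1;  // was 8ms; echo the keystroke nearly immediately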

For the second optimization, it seems to me that a fixed frame-interval
time guarantees an average delay of half that interval for the last
frame update completing the screen update.  I would guess that most
applications will complete their screen updates in a relatively small
number of temporally-discrete blobs of I/O.  So if we can get shorter
frame intervals for a little while, we can improve the responsiveness
here.  I think 10ms is a reasonable goal for this case.
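
To put rough numbers on that (my arithmetic, assuming the final blob of
output lands at a uniformly random point within the interval): with the
20ms minimum interval described above, the last frame of a screen
update waits 10ms on average before it is even sent; at a 10ms interval
that drops to 5ms, on top of whatever the network adds in either case.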

For the last case, 100-1000ms seems a reasonable target to me.

I think maybe these last two goals can be combined by keeping a history
of recent frame updates for the last 1 or 2 seconds, and determining the
next frame interval based on that history.  If the history has no frame
updates in it, then frames can go out immediately or nearly so.  As it
accumulates more frames, you increase the interval so that it converges
on a desired steady-state rate with unconstrained output.  However,
coming up with an algorithm that does this automatically and is stable
isn't trivial-- I slept on it and haven't got a good answer yet.
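
Just to make the shape of that idea concrete, here's a very rough
sketch (my own code, not a worked-out algorithm; the mapping from
recent frame count to interval is entirely made up, and the stability
question above is exactly the part it doesn't answer):

#include <cstdint>
#include <deque>

class FrameHistory {
public:
  // Record that a frame was sent at time now_ms.
  void record(uint64_t now_ms) {
    sent_.push_back(now_ms);
    prune(now_ms);
  }

  // Pick the next frame interval from how busy the last ~2 seconds were.
  uint64_t next_interval_ms(uint64_t now_ms) {
    prune(now_ms);
    if (sent_.empty())    return 0;    // idle: send (echo) frames immediately
    if (sent_.size() < 8) return 10;   // short burst: finish screen updates quickly
    // Sustained output: back off toward a relaxed steady-state rate.
    uint64_t interval = 10 * sent_.size();
    return interval > 1000 ? 1000 : interval;
  }

private:
  static constexpr uint64_t WINDOW_MS = 2000;  // 1-2 second history window
  std::deque<uint64_t> sent_;

  void prune(uint64_t now_ms) {
    while (!sent_.empty() && now_ms - sent_.front() > WINDOW_MS) {
      sent_.pop_front();
    }
  }
};

The obvious failure mode is oscillating between "history empty, send
immediately" and "history full, back off", which is the stability
problem I mentioned.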

Some last points:  The amount of time Mosh takes to compute new frames
affects both latency and the timing calculations discussed here, and
that can be quite significant.  Also, the rate at which mosh-server can
accept characters from the hosted application affects latency, those
calculations, *and* mosh-server's perception of whether reads from the
pty are adjacent or separate in time (if mosh-server takes 4ms to
process a pty read buffer, then application writes will of course appear
to be 4ms apart).  My performance patches make significant improvements
in both these areas.  I hope to get those checked in fairly soon;
they'll make a good base for the work discussed here.

regards,

  --jh



