[mosh-devel] mosh and IUTF8

Keith Winstein keithw at MIT.EDU
Fri Apr 13 20:19:24 EDT 2012


Hi Matthew,

Thanks for your detailed email. Of course you're right that IUTF8 (as
currently implemented by Mac OS X and Linux) doesn't send two
backspaces when deleting a double-wide character, and this does lead
to display glitches on the UTF-8 terminal emulators (xterm,
gnome-terminal, mosh...).

IUTF8 does still save us from generating ill-formed UTF-8 sequences on
the file descriptor, which you would otherwise get, but what you're
talking about is a genuine display bug.
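
To make that distinction concrete, here is a minimal Python sketch of
erasing one character from a UTF-8 input buffer. It is only a rough
model of the idea, not the kernel's actual code, and the function
names (erase_byte, erase_utf8_char, is_valid_utf8) are hypothetical.

```python
def erase_byte(buf: bytes) -> bytes:
    """Erase without any UTF-8 awareness: drop exactly one byte."""
    return buf[:-1]


def erase_utf8_char(buf: bytes) -> bytes:
    """Erase one whole UTF-8 character (rough model of IUTF8 erase).

    Skip back over trailing continuation bytes (bit pattern 10xxxxxx),
    then drop the lead byte in front of them as well.
    """
    i = len(buf)
    while i > 0 and (buf[i - 1] & 0xC0) == 0x80:
        i -= 1
    if i > 0:
        i -= 1
    return buf[:i]


def is_valid_utf8(buf: bytes) -> bool:
    """Return True if buf decodes cleanly as UTF-8."""
    try:
        buf.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


line = "a\u00e9".encode("utf-8")      # 'a' followed by a two-byte character
naive = erase_byte(line)              # leaves a dangling lead byte
aware = erase_utf8_char(line)         # leaves just b"a"
```

The byte-wise erase leaves an ill-formed sequence in the buffer, while
the UTF-8-aware erase does not. Neither version says anything about
how many columns the erased character occupied on screen.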

I'm a little less confident that the kernel is "supposed" to send two
backspaces in this case, since it's not like there is any spec for
double-wide characters in an ECMA-48 terminal. To the best of my
knowledge, ECMA-48 does not cover this, except to say that backspace
moves the cursor "one character position" backwards. And as for
deleting combining characters, definitely there is no authority here.
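
As a rough illustration of what a width-aware erase would have to
emit, here is a Python sketch under one possible convention. The
width rule (East Asian Wide/Fullwidth counts as two columns, combining
marks as zero) is an assumption that approximates what wcwidth-style
tables do, not anything ECMA-48 specifies, and the helper names
columns and erase_echo are hypothetical.

```python
import unicodedata


def columns(ch: str) -> int:
    """Approximate display width of a single code point (assumed rule)."""
    if unicodedata.combining(ch):
        return 0  # combining marks occupy no column of their own
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # East Asian Wide / Fullwidth
    return 1


def erase_echo(ch: str) -> str:
    """Echo needed to visually erase ch: back up, blank out, back up."""
    n = columns(ch)
    return "\b" * n + " " * n + "\b" * n


narrow = erase_echo("a")        # the usual backspace-space-backspace
wide = erase_echo("\u65e5")     # two of each for a double-wide character
mark = erase_echo("\u0301")     # empty: a lone combining mark has no column
```

The zero-width case shows why combining characters are awkward: the
echo for the mark alone is empty, so the base character would have to
be redrawn or erased along with it.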

If OpenBSD wants to go a different and more correct direction than
what OS X and Linux have done, I would say that is totally their
decision. But I don't understand your proposed solution. Do I
understand correctly that to make that work, you would have to modify
(a) every terminal emulator in existence [xterm, gnome-terminal,
screen, tmux, Terminal.app, PuTTY] as well as (b) every remote login
protocol [ssh, telnet, rlogin, mosh] to convey this switch to
canonical mode over the connection to the terminal emulator?

I must have misunderstood because that seems totally impossible. If
you're going to change all the terminal emulators, you might as well
just declare that the true meaning of backspace IS to move "one
character position [possibly two columns]" to the left and be done
with it.

Bottom line: Whatever mechanism OpenBSD comes up with, just let us
know and mosh will be happy to set whatever flag we need on the server
side.

Cheers, and thanks for writing,
Keith

On Fri, Apr 13, 2012 at 3:40 PM, Matthew Dempsky <matthew at dempsky.org> wrote:
> Hi Keith and Hari,
>
> I was looking at mosh yesterday, and I think it's pretty interesting.
> Good work!
>
> While reading your paper, I learned about Linux's IUTF8 and looked
> into implementing it for OpenBSD.  However, after digging into it some
> more, I'm suspecting that it's the wrong solution.
>
> IUTF8 means that canonical mode never outputs an invalid UTF8 byte
> sequence (assuming a valid input sequence to start from), but without
> adding tremendous complexity to the kernel you can't guarantee that it
> will echo the correct backspace sequences for updating the screen.
> For example, if the character being backspaced over visually spans two
> columns, then the kernel is supposed to backspace two columns rather
> than just one.  Similarly, if the character being backspaced over has
> combining marks, then the kernel needs to know to either remove
> multiple code points from the input or to backspace and re-output the
> base character sans combining marks.  (I'm not actually sure of the
> expected behavior when you backspace a combined character; it seems to
> differ by application.)
>
> I think the correct thing to do is to essentially move canonical mode
> out of the kernel and into the terminal emulator.  I believe this
> should already be feasible with TIOCPKT and TIOCREMOTE.  TIOCPKT
> notifies the pty master with TIOCPKT_IOCTL whenever the child enables
> or disables ICANON, and TIOCREMOTE ensures writes always go directly
> to the client (i.e., bypassing kernel canonicalization) so that you
> can implement proper canonicalization yourself in the master (only
> when ICANON is actually enabled, of course).
>
> Oof, of course, it looks like Linux doesn't currently implement TIOCREMOTE. :(
>
> What do you think?
>
> Thanks!



