[mosh-devel] mosh bug without UTF-8
Keith Winstein
keithw at cs.stanford.edu
Tue Sep 7 14:01:02 EDT 2021
Hello,
Thanks for your email. As I wrote when you came on the IRC channel, we
don't think this behavior is a bug. The reason Mosh requires a UTF-8 native
locale has to do with terminal emulation. ANSI-style terminals operate on
an underlying state machine (https://vt100.net/emu/dec_ansi_parser).
The underlying symbols fed into the state machine can be 7-bit ASCII or
8-bit octets (as was historically common with physical terminals like the
vt220), or they can be Unicode scalar values, aka "USVs."
The underlying symbol type (especially the question of octet vs. USVs)
affects the lowest layer of the terminal emulator. For example, the C1
controls have the values 0x80..0x9f (per ISO/IEC 6429 and ECMA-48 as well
as ISO/IEC 10646), and then each one has a corresponding 7-bit
compatibility encoding. So, for example, the C1 control CSI has the value
0x9b, with a 7-bit encoding as <0x1b, 0x5b>.
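To make that correspondence concrete, here is a minimal sketch in Python (not anything taken from Mosh). It relies on the ECMA-48 rule that the 7-bit form of a C1 control is ESC followed by the byte 0x40 below the C1 value. The helper name c1_to_7bit is made up for this illustration.

```python
ESC = 0x1B

def c1_to_7bit(c1: int) -> bytes:
    """Return the 7-bit escape sequence equivalent to a C1 control value."""
    if not 0x80 <= c1 <= 0x9F:
        raise ValueError(f"not a C1 control: {c1:#x}")
    # ESC followed by a byte in 0x40..0x5f (the C1 value minus 0x40).
    return bytes([ESC, c1 - 0x40])

print(c1_to_7bit(0x9B).hex(" "))  # CSI -> 1b 5b
```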
On an 8-bit terminal (e.g. xterm when invoked in a "C" locale), you can run
this command to get a green "Hello":
    echo -e "\x9b32m Hello \x9bm"
But the same command just produces mojibake on a UTF-8 terminal emulator,
because the terminal emulator is *first* parsing the incoming bytes (as
UTF-8) into USVs and only then looking at their values; it never enters the
csi_entry state in the state transition diagram. The decision that Unicode
terminal emulators would work this way was made around 2003 or earlier; we
could argue about whether they made a wise choice back then, but this has
been the reality for almost 20 years (
https://www.cl.cam.ac.uk/~mgk25/unicode.html#term).
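A rough way to see the difference is to look at the symbols each kind of terminal would hand to its state machine. This is only a sketch: Python's "replace" error policy stands in for whatever a particular emulator does with invalid UTF-8, which varies between emulators.

```python
# The bytes written by the echo command above (without the trailing newline).
raw = b"\x9b32m Hello \x9bm"

# 8-bit terminal: every byte is a symbol, so 0x9b arrives as CSI.
as_octets = list(raw)

# UTF-8 terminal: bytes are decoded into scalar values first. A lone 0x9b is
# a continuation byte with no lead byte, so it is invalid and gets replaced.
as_usvs = [ord(ch) for ch in raw.decode("utf-8", errors="replace")]

print(0x9B in as_octets)  # True
print(0x9B in as_usvs)    # False
print(hex(as_usvs[0]))    # 0xfffd
```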
To get the same results on a UTF-8 terminal emulator, you could pipe that
command through a charset conversion, e.g.:
    echo -e "\x9b32m Hello \x9bm" | iconv -f LATIN1 -t UTF8
or, equivalently:
    echo -e "\xc2\x9b32m Hello \xc2\x9bm"
or, using the 7-bit encoding (which works everywhere because 7-bit ASCII
has the same encoding in UTF-8):
    echo -e "\x1b\x5b32m Hello \x1b\x5bm"
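The same three variants can be checked byte for byte. The sketch below uses Python's built-in codecs in place of iconv; the variable names are chosen just for this example.

```python
eight_bit = b"\x9b32m Hello \x9bm"

# Equivalent of `iconv -f LATIN1 -t UTF8`: read each byte as a Latin-1
# character, then encode the resulting text as UTF-8.
converted = eight_bit.decode("latin-1").encode("utf-8")
print(converted)  # b'\xc2\x9b32m Hello \xc2\x9bm'

# Decoded as UTF-8, the first symbol is now the scalar value U+009B (CSI).
print(hex(ord(converted.decode("utf-8")[0])))  # 0x9b

# The 7-bit form is pure ASCII, so its bytes are the same in UTF-8.
seven_bit = b"\x1b\x5b32m Hello \x1b\x5bm"
print(seven_bit.decode("ascii").encode("utf-8") == seven_bit)  # True
```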
Anyway, these kinds of issues can lead to a lot of subtle bugs and
glitches, especially when connecting across incompatible systems, e.g. when
applications think they can send 8-bit (both for controls and printable
characters!) but the terminal emulator at the other side is interpreting it
as UTF-8 or vice versa. You say that ssh/telnet/rlogin/rsh don't have a
problem, but of course these programs aren't terminal emulators, and they
do *allow* charset incompatibilities to exist and to flow through them, and
an ssh/telnet user can absolutely see glitches when using an 8-bit
application with a UTF-8 terminal emulator or vice versa. Programs like
screen and tmux (which Mosh is maybe more similar to when it comes to this
kind of thing) also have to worry about these problems.
Ten years ago, I decided that to keep my sanity, Mosh would only emulate
ONE kind of terminal emulator (and the one I picked was the kind whose
underlying symbols are USVs, parsed out of the incoming bytestream by
interpreting it as UTF-8). In practice, getting access to UTF-8 parsing
from libc requires being in a UTF-8 locale (and obviously you do also want
applications to be in the same charset). I used to feel, like you do, that
UTF-8 in Unix was needless complexity, but that was like 20 years ago --
I'm sorry to tell you the world moved on and that battle was lost by like
2005. Contrary to your message, UTF-8 locales and terminal emulators have
been the default for new Unix-like installations for roughly 15+ years (I
think it's the default on installation, and was around then, for most Linux
distributions, most flavors of BSD, macOS, WSL, etc., and probably most
other things). We wanted to pick one horse, and essentially we made the
choice a decade ago that we'd rather enforce the expected preconditions
before letting Mosh start up, instead of allowing the user to run Mosh in a
situation where we'd have to support bug reports about subtle hard-to-debug
issues, knowing that people like you would complain (as many predecessors
have!) in the support forums.
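For illustration only, a startup precondition of this general kind could look like the following Python sketch. Mosh's actual check is in its C++ source and is not reproduced here. The function names is_utf8_codeset and current_codeset are hypothetical. The sketch assumes a Unix-like system where locale.nl_langinfo is available.

```python
import locale

def is_utf8_codeset(codeset: str) -> bool:
    """True if a codeset name (as reported by nl_langinfo) denotes UTF-8."""
    return codeset.replace("-", "").upper() == "UTF8"

def current_codeset() -> str:
    """Adopt the locale from the environment and report its charset."""
    try:
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        pass  # unsupported locale name in the environment; stay in "C"
    return locale.nl_langinfo(locale.CODESET)

if __name__ == "__main__":
    codeset = current_codeset()
    if is_utf8_codeset(codeset):
        print("UTF-8 locale, OK to start:", codeset)
    else:
        # A real program would refuse to start here; this sketch only reports.
        print("not a UTF-8 locale, would refuse to start:", codeset)
```

Checking once, up front, keeps the rest of the program free to assume that the C library decodes input as UTF-8.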
To tell you the truth, I haven't even thought about this stuff for a long
time -- we made these choices like a decade ago, they have been good enough
for some millions of users, and obviously Mosh isn't for everybody. I had
to go back and refresh my memory even to write this email. But I think
we're pretty happy with the choice we made ("pedantic correctness") at
least in this aspect; the spirit is maybe similar to
https://www.jwz.org/gruntle/autobogotification.html , and you can see some
of that on the Mosh website as well where we quote the USENIX peer
reviewers; see also https://github.com/mobile-shell/mosh/issues/1112.
Compared with a lot of contemporary issues in computing, at least this one
sort of feels like you can get your head around it...
Best regards,
Keith
On Tue, Sep 7, 2021 at 6:55 AM Thomas Nachname <
onlywebmail.2011 at googlemail.com> wrote:
> Hello, together
> Is there any chance that you will fix this bug with the error message
> "mosh-client needs a UTF-8 native locale to run."?
>
> All Unix like systems I know and use since over 30 years are always set to
> POSIX / C!
> And all of them simply do not work with mosh, which is a massive shame.
>
> mosh needs to work with normal Unix systems, where LANG is unset and
> LC_ALL is set to C or POSIX, or unset too.
> No sane systems uses UTF-8 ... this may change for some Applications, that
> are running, but not for regular remote session. Such settings are only
> ever required for certain applications! Never for a remote connection...
>
> Cannot be SUCH hard, or?
>
> P.S. And no, ssh or telnet or rlogin or rsh never ever had a problem with
> this configuration, not on one system based on IRIX, HP-UX, Tru64, SunOS,
> Solaris, AIX or Linux from most different vendors.