<div dir="ltr"><div>Hello,</div><div><br></div><div>Thanks for your email. As I wrote when you came on the IRC channel, we don't think this behavior is a bug. The reason Mosh requires a UTF-8 native locale has to do with terminal emulation. ANSI-style terminals operate on an underlying state machine (<a href="https://vt100.net/emu/dec_ansi_parser" target="_blank">https://vt100.net/emu/dec_ansi_parser</a>). Historically, the underlying symbols to the state machine can be 7-bit ASCII or 8-bit octets (as was common in the case of physical terminals like the VT220), or they can be Unicode scalar values, aka "USVs."</div><div><br></div><div>The underlying symbol type (especially the question of octets vs. USVs) affects the lowest layer of the terminal emulator. For example, the C1 controls have the values 0x80..0x9f (per ISO/IEC 6429 and ECMA-48 as well as ISO/IEC 10646), and then each one has a corresponding 7-bit compatibility encoding. So, for example, the C1 control CSI has the value 0x9b, with a 7-bit encoding as &lt;0x1b, 0x5b&gt;.</div><div><br></div><div>On an 8-bit terminal (e.g. xterm when invoked in a "C" locale), you can run this command to get a green "Hello": <span style="font-family:monospace">echo -e "\x9b32m Hello \x9bm"</span></div><div><br></div><div>But the same command just produces mojibake on a UTF-8 terminal emulator, because the terminal emulator is <b>first</b> parsing the incoming bytes (as UTF-8) into USVs and only then looking at their values; it never enters the <span style="font-family:monospace">csi_entry</span> state in the state transition diagram. 
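<div><br></div><div>As a rough illustration of that ordering (a Python sketch, not Mosh's or xterm's actual code; the helper names <span style="font-family:monospace">symbols_8bit</span> and <span style="font-family:monospace">symbols_utf8</span> are made up for this example, and substituting U+FFFD for invalid input is an assumption about how a typical emulator recovers), compare the first symbol each kind of terminal hands to its state machine:</div>
<pre>
```python
# Illustrative sketch only; helper names are hypothetical.
CSI = 0x9B  # C1 control "Control Sequence Introducer"


def symbols_8bit(data):
    """8-bit terminal: every octet is one input symbol."""
    return list(data)


def symbols_utf8(data):
    """UTF-8 terminal: decode first, then look at the scalar values.

    Assumption: invalid sequences are replaced with U+FFFD.
    """
    return [ord(ch) for ch in data.decode("utf-8", errors="replace")]


raw = b"\x9b32m Hello \x9bm"               # 8-bit CSI, as in the echo command
as_utf8 = b"\xc2\x9b32m Hello \xc2\x9bm"   # U+009B encoded as UTF-8
seven_bit = b"\x1b[32m Hello \x1b[m"       # ESC [ form, plain ASCII

print(symbols_8bit(raw)[0] == CSI)         # True: the octet 0x9b is CSI
print(hex(symbols_utf8(raw)[0]))           # 0xfffd: a lone 0x9b is invalid UTF-8
print(symbols_utf8(as_utf8)[0] == CSI)     # True: U+009B survives decoding
print(symbols_8bit(seven_bit) == symbols_utf8(seven_bit))  # True
```
</pre>
<div>In this sketch only the 7-bit form yields identical symbols under both interpretations.<br><br></div>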
The decision that Unicode terminal emulators would work this way was made around 2003 or earlier; we could argue about whether they made a wise choice back then, but this has been the reality for almost 20 years (<a href="https://www.cl.cam.ac.uk/~mgk25/unicode.html#term" target="_blank">https://www.cl.cam.ac.uk/~mgk25/unicode.html#term</a>).</div><div><br></div><div>To get the same results on a UTF-8 terminal emulator, you could pipe that command through a charset conversion, e.g.: <span style="font-family:monospace">echo -e "\x9b32m Hello \x9bm" | iconv -f LATIN1 -t UTF8</span></div><div><br></div><div>or, equivalently: <span style="font-family:monospace">echo -e "\xc2\x9b32m Hello \xc2\x9bm"</span></div><div><br></div><div>or, using the 7-bit encoding (which works everywhere because 7-bit ASCII has the same encoding in UTF-8): <span style="font-family:monospace">echo -e "\x1b\x5b32m Hello \x1b\x5bm"</span></div><div><span style="font-family:monospace"><br></span></div><div><span style="font-family:monospace"><font face="arial,sans-serif">Anyway, these kinds of issues can lead to a lot of subtle bugs and glitches, especially when connecting across incompatible systems, e.g. when applications think they can send 8-bit (both for controls and printable characters!) but the terminal emulator at the other side is interpreting it as UTF-8 or vice versa. You say that ssh/telnet/rlogin/rsh don't have a problem, but of course these programs aren't terminal emulators, and they do *allow* charset incompatibilities to exist and to flow through them, and an ssh/telnet user can absolutely see glitches when using an 8-bit application with a UTF-8 terminal emulator or vice versa. 
Programs like screen and tmux (which Mosh is maybe more similar to when it comes to this kind of thing) also have to worry about these problems.</font></span></div><div><span style="font-family:monospace"><font face="arial,sans-serif"><br></font></span></div><div><span style="font-family:monospace"><font face="arial,sans-serif">Ten years ago, I decided that to keep my sanity, Mosh would only emulate ONE kind of terminal emulator (and the one I picked was the kind whose underlying symbols are USVs, parsed out of the incoming bytestream by interpreting it as UTF-8). In practice, getting access to UTF-8 parsing from libc requires being in a UTF-8 locale (and obviously you do also want applications to be in the same charset). <span style="font-family:monospace"><font face="arial,sans-serif">I used
to feel, like you do, that UTF-8 in Unix was needless complexity, but that was like 20 years ago -- I'm sorry
to tell you the world moved on and that battle was lost by like 2005. </font></span>Contrary to your message, UTF-8 locales and terminal emulators have been the default for new Unix-like installations for 15+ years (I think it is, and has been since around then, the default on installation for most Linux distributions, most flavors of BSD, macOS, WSL, etc., and probably most other things). We wanted to pick one horse, and e</font><font face="arial,sans-serif">ssentially we made the choice a decade ago that we'd rather enforce the expected preconditions before letting Mosh start up, instead of allowing the user to run Mosh in a situation where we'd have to support bug reports about subtle hard-to-debug issues, knowing that people like you would complain (as many predecessors have!) in the support forums. <br></font></span></div><div><span style="font-family:monospace"><br></span></div><div><span style="font-family:monospace"><span style="font-family:arial,sans-serif">To tell you the truth, I haven't even thought about this stuff for a long time -- we made these choices like a decade ago, they have been good enough for some millions of users, and obviously Mosh isn't for everybody. I had to go back and refresh my memory even to write this email. But I think we're pretty happy with the choice we made ("pedantic correctness") at least in this aspect; </span><span style="font-family:monospace"><span style="font-family:arial,sans-serif">the spirit is maybe similar to <span style="font-family:monospace"><a href="https://www.jwz.org/gruntle/autobogotification.html" target="_blank">https://www.jwz.org/gruntle/autobogotification.html</a></span>, and you can see some of that on the Mosh website as well where we quote the USENIX peer reviewers; see also <a href="https://github.com/mobile-shell/mosh/issues/1112" target="_blank">https://github.com/mobile-shell/mosh/issues/1112</a>. 
Compared with a lot of contemporary issues in computing, at least this one sort of feels like you can get your head around it...</span><br></span></span></div><div><span style="font-family:monospace"><br></span></div><div><span style="font-family:monospace"><font face="arial,sans-serif">Best regards,</font></span></div><div><span style="font-family:monospace"><font face="arial,sans-serif">Keith</font><br></span></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Sep 7, 2021 at 6:55 AM Thomas Nachname &lt;<a href="mailto:onlywebmail.2011@googlemail.com" target="_blank">onlywebmail.2011@googlemail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hello, together<br>
Is the any chance, that you will fix this bug with the error message "mosh-client needs a UTF-8 native locale to run."?<br>
<br>
All Unix like systems I know and use since over 30 years are always set to POSIX / C!<br>
And all of them simply do not work with mosh, which is a massive shame.<br>
<br>
mosh needs to work with normal Unix systems, where LANG is unset and LC_ALL is set to C or POSIX, or unset too.<br>
No sane systems uses UTF-8 ... this may change for some Applications, that are running, but not for regular remote session. Such settings are only ever required for certain applications! Never for a remote connection...<br>
<br>
Cannot be SUCH hard, or?<br>
<br>
P.S. And no, ssh or telnet or rlogin or rsh never ever had a problem with this configuration, not on one system based on IRIX, HP-UX, True64, SunOS, Solaris, AIX or Linux from most different vendors.<br>
<br>
<br>
</blockquote></div>