non-ascii password in kerberos authentication

Tue Oct 30 23:00:00 EDT 2007

Hi, all: 

Previously, when I found that the input from the UI was 0xA4 (the hex value of the currency sign in ISO-8859-1), I simply forced it to 0x80 (the hex value of the euro symbol on most Windows systems). Because it is still a 1-byte character, I expected it to work in the password submitted to the Kerberos authentication server.

But it doesn't. 

Now my colleague, Suhuang, has helped me find the reason for the failure. As Ken said:
====================================================
However, the RC4 cryptosystem spec (RFC 4757) wants the password
converted to UCS-2LE (like UTF-16LE but limited to 0..65535), and the
MIT code just alternates bytes from the input string with zero bytes,
which is correct for ASCII or ISO-8859-1 input, but other input,
including in UTF-8 form, won't be converted properly.
====================================================

For other characters in the ISO-8859-1 character set (including the currency sign 0xA4), appending a zero byte to the original byte yields exactly the character's value in UCS-2LE. For example, the copyright sign in ISO-8859-1 is 0xA9. When it is submitted to a server that uses the RC4 algorithm, the MIT code appends 0x00 to it, giving the 16-bit value 0x00A9 (bytes A9 00 on the wire), which is exactly the value of the copyright sign in UCS-2LE.
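
To illustrate the paragraph above, here is a small Python sketch. It is not the MIT code, only an imitation of the behaviour Ken describes, and the function name interleave_zero_bytes is made up for this example. It shows that alternating input bytes with zero bytes gives the same result as a real ISO-8859-1 to UTF-16LE conversion:

```python
def interleave_zero_bytes(raw: bytes) -> bytes:
    """Follow every input byte with 0x00 (hypothetical stand-in for the
    behaviour described in Ken's message, not the actual krb5 routine)."""
    out = bytearray()
    for b in raw:
        out.append(b)
        out.append(0x00)
    return bytes(out)


# Copyright sign (0xA9) and currency sign (0xA4) as ISO-8859-1 bytes.
sample = b"\xa9\xa4"

# A real conversion: decode as ISO-8859-1, encode as UTF-16LE.
converted = sample.decode("iso-8859-1").encode("utf-16-le")

# For ISO-8859-1 input the two approaches agree byte for byte.
assert interleave_zero_bytes(sample) == converted
```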

However, for the euro sign, the local input from the printer is 0xA4 (which is actually the currency sign). After we force it to 0x80 and submit it to the server, the MIT code turns it into 0x0080, but in UCS-2LE that value is a control code, not the euro sign. The hex value of the euro symbol in UCS-2LE is 0x20AC. So a submitted password containing the euro sign can never match the one on the server.
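
The mismatch can be checked with a few lines of Python. This is only a sketch of the byte values involved; the variable names are made up, and it does not run any Kerberos code:

```python
# What zero-byte interleaving yields for the forced byte 0x80.
submitted = bytes([0x80, 0x00])

# What the server side would hold for a real euro sign (U+20AC) in UTF-16LE.
expected = "\u20ac".encode("utf-16-le")

# U+0080 is a C1 control code, and this is what 80 00 actually encodes.
control = "\u0080".encode("utf-16-le")

assert submitted == control      # the submitted bytes mean U+0080
assert submitted != expected     # so they cannot match the euro sign
```

Since the RC4 key is derived from these bytes, different bytes give a different key, and the authentication fails.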

I would like to know whether there is any possible workaround for the euro symbol. For example, when the krb5 code detects a euro sign (0x80), could it change it to 0x20AC rather than appending 0x00 to 0x80? Is that feasible? After all, the euro symbol is quite special, and most Windows systems implement it as 0x80 in their WinLatin-1 set. (In the ISO-8859-15/Latin-9 set, the euro sign is supposed to replace the currency sign, but alas, Windows obviously doesn't follow that standard.)
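
To make the question concrete, below is a rough Python sketch of the kind of conversion I have in mind. It is not a patch to krb5; the function name password_to_ucs2le and its parameter are hypothetical, and it assumes the caller knows which 8-bit character set the input bytes are in (WinLatin-1, i.e. cp1252, by default):

```python
def password_to_ucs2le(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Convert 8-bit password bytes to UCS-2LE using a real charset
    conversion instead of zero-byte interleaving (hypothetical helper)."""
    # Raises UnicodeDecodeError for bytes the source charset leaves undefined.
    text = raw.decode(source_encoding)
    # UCS-2 is limited to 0..65535, so reject anything outside that range.
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("character outside the UCS-2 range")
    return text.encode("utf-16-le")


# The WinLatin-1 euro byte 0x80 becomes U+20AC (bytes AC 20).
assert password_to_ucs2le(b"\x80") == b"\xac\x20"

# ISO-8859-1 characters such as the copyright sign are unchanged in effect.
assert password_to_ucs2le(b"\xa9") == b"\xa9\x00"

# If the input really is ISO-8859-15, 0xA4 is the euro sign there.
assert password_to_ucs2le(b"\xa4", "iso-8859-15") == b"\xac\x20"
```

The open point is where such a conversion belongs, because the byte 0x80 by itself does not say which character set it came from.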

Hope you guys can give some suggestions, 
Xu Qiang

> -----Original Message-----
> From: krbdev-bounces at mit.edu 
> [mailto:krbdev-bounces at mit.edu]On Behalf Of Xu Qiang
> Sent: Monday, October 22, 2007 6:17 PM
> To: Ken Raeburn; Paul Moore
> Cc: krbdev at mit.edu
> Subject: RE: non-ascii password in kerberos authentication
> 
> Ken,
> 
> Based on your suggestions, I have successfully authenticated passwords
> containing accented characters by not converting them to UTF-8. Although
> they are accented characters, they are still in the ISO-8859-1 character
> set, so there is no problem in submitting them unchanged.
> 
> However, the password may contain the euro symbol, which is not in
> ISO-8859-1. In the ISO-8859-15 set it replaces the currency sign 0xA4,
> but on most Windows-based systems the euro symbol is added as 0x80 in
> the win-latin1 set, whereas the currency sign is kept unchanged.
> 
> In this case, the authentication fails when the password is submitted
> to the Kerberos server, whether it is in its original form or converted
> to UTF-8 first.
> 
> Is any special handling needed for this euro symbol?
> 
> TIA,
> Xu Qiang