Unicode and APIs

Thu Sep 20 00:27:09 EDT 2007

Thanks to all of you for discussing this issue and for the suggestions put forward.

I just realized that in my case, an explicit conversion of the password from ISO-8859-1 (which is what our printer's Local UI can input) to UCS-2LE is redundant, because, as Ken said:
=========================================================
However, the RC4 cryptosystem spec (RFC 4757) wants the password
converted to UCS-2LE (like UTF-16LE but limited to 0..65535), and the
MIT code just alternates bytes from the input string with zero bytes,
which is correct for ASCII or ISO-8859-1 input, but other input,
including in UTF-8 form, won't be converted properly.
=========================================================
The conversion is needed only when the input comes from a character set beyond ISO-8859-1 (whose range is 0x0000 to 0x00FF); in that case, Paul's fix would be very helpful.
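
To make the point concrete, here is a minimal sketch of the byte-alternating conversion Ken describes. The function name pad_to_ucs2le is hypothetical and this is not the MIT code itself; it only illustrates why the approach is right for ISO-8859-1 input and wrong for UTF-8 input.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not taken from the MIT source): widen each input
 * byte to one 16-bit little-endian code unit by appending a zero byte.
 * `out` must have room for 2 * len bytes.
 * Returns the number of bytes written.
 */
size_t pad_to_ucs2le(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t i;

    assert(in != NULL && out != NULL);
    for (i = 0; i < len; i++) {
        out[2 * i]     = in[i]; /* low byte: the input byte as-is */
        out[2 * i + 1] = 0x00;  /* high byte: always zero */
    }
    return 2 * len;
}
```

For the ISO-8859-1 byte 0xE9 (U+00E9) this yields E9 00, which is the correct UCS-2LE encoding. For the UTF-8 encoding of the same character, C3 A9, it yields C3 00 A9 00, that is, two unrelated characters (U+00C3 and U+00A9) instead of one.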

Thanks again to all of you,
Xu Qiang

> -----Original Message-----
> From: krbdev-bounces at mit.edu 
> [mailto:krbdev-bounces at mit.edu]On Behalf Of Paul Moore
> Sent: Thursday, September 20, 2007 5:15 AM
> To: Sam Hartman; krbdev at mit.edu
> Subject: RE: Unicode and APIs
> 
> 
> My experience is that most of the core client-side code works fine with
> UTF-8, since this format is designed to work in a code space that
> expects char* strings. Just pass the UTF-8 to the char* params and
> it gets moved through the system fine. Remember that an ASCII string
> *is* UTF-8 (just a limited subset of the character set). So flagging a
> string as UTF-8 is not really needed (what would you do differently if
> you knew the string was UTF-8?).
> 
> Things I know are broken:
> 
> rc4-hmac wants UCS-2LE in its s2k code, so you must do this conversion
> in the rc4 code. The current rc4 code does this wrong: it does not do a
> UTF-8 to UCS-2 conversion, it simply pads with an extra 00 byte, so any
> chars outside the normal ASCII range will be broken. This is probably
> what was failing in the prior email exchange from Xu Qiang.
> 
> I know that the rest works OK though. I used to have my own rc4-hmac
> implementation (before the standard dist had it) that did UTF-8 to
> UCS-2LE conversion, and it worked fine with Japanese users and
> passwords. My s2k function is at the end of this email.
> 
> DES has a totally different issue. The original RFC 1510 spec was silent
> on non-ASCII characters, so different implementations did different
> things. MS passed the Unicode string through their UnicodeToOEMString
> function, which tries to convert the string to 8-bit ASCII using
> whatever code page is the default on the running system. This results
> in odd things: chars that don't exist in the 8-bit space end up as '?',
> and other 'European-like' chars end up mapped, but the other side of
> the conversation has to know which conversion was performed. Since it
> does not know the default OEM code page on the KDC, it cannot know; it
> can guess, and it might be right sometimes, but usually isn't. This is
> not something that can be easily fixed.
> 
> I am not sure about the DES salt story if there are non-ASCII chars in
> the user name.
> 
> 
> Obviously, if you want to internationalize the UI, that's a whole
> different issue.
> 
> 
> ------------------------------
> 
> krb5_error_code krb5_arc4_string_to_key(
> 	krb5_const struct krb5_enc_provider *enc,
> 	krb5_const krb5_data *string,
> 	krb5_const krb5_data *salt,
> 	krb5_keyblock *key)
> {
> /*
>   	 String2Key(password)
> 	 {
>  
> 	        K = MD4(UNICODE(password))
> 	}
>  
>    The RC4-HMAC keys are generated by using the Windows UNICODE version
>    of the password. Each Windows UNICODE character is encoded in
>    little-endian format of 2 octets each. Then performing an MD4 [6]
>    hash operation on just the UNICODE characters of the password (not
>    including the terminating zero octets).
> 
> */
> 	krb5_data unipw;
> 	krb5_data hashout;
> 	int ret;
> 	ASSERT(key->length == RSA_MD4_CKSUM_LENGTH);
> 	ASSERT(key->contents != NULL);
> 	/* ---> the key step: a real UTF-8 to Unicode conversion */
> 	ret = UTF8ToUnicode(string->data, string->length,
> 		(unsigned char**)&(unipw.data), &(unipw.length));
> 	if (ret != 0)
> 		return ENOMEM;
> 	unipw.length *= 2; /* length is returned as number of 16-bit chars */
> 	hashout.data = key->contents;
> 	hashout.length = key->length;
> 	(*krb5_hash_md4.hash)(1, &unipw, &hashout);
> 	free(unipw.data);
>  	return 0;
> }
> 
> 
> _______________________________________________
> krbdev mailing list             krbdev at mit.edu
> https://mailman.mit.edu/mailman/listinfo/krbdev
>
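
The implementation of UTF8ToUnicode() called in the quoted function is not shown. As a rough sketch of what such a conversion has to do, assuming UCS-2LE output and a caller-supplied buffer, something like the following could be used. The name utf8_to_ucs2le and its signature are hypothetical and differ from Paul's helper, which allocates its own output.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a UTF-8 to UCS-2LE converter.
 * `out` must have room for 2 * len bytes (worst case: all-ASCII input).
 * Returns the number of bytes written, or (size_t)-1 on malformed input
 * or on code points above U+FFFF, which UCS-2 cannot represent.
 */
size_t utf8_to_ucs2le(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t i = 0, o = 0;

    assert(in != NULL && out != NULL);
    while (i < len) {
        unsigned long cp;
        size_t extra, k;
        unsigned char b = in[i++];

        if (b < 0x80) {                   /* 0xxxxxxx: ASCII */
            cp = b;
            extra = 0;
        } else if ((b & 0xE0) == 0xC0) {  /* 110xxxxx: 2-byte sequence */
            cp = b & 0x1F;
            extra = 1;
        } else if ((b & 0xF0) == 0xE0) {  /* 1110xxxx: 3-byte sequence */
            cp = b & 0x0F;
            extra = 2;
        } else {
            /* 4-byte lead (beyond U+FFFF), stray continuation, or invalid */
            return (size_t)-1;
        }

        if (len - i < extra)
            return (size_t)-1;            /* truncated sequence */

        for (k = 0; k < extra; k++) {
            unsigned char c = in[i++];
            if ((c & 0xC0) != 0x80)
                return (size_t)-1;        /* not a continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }

        /* Reject overlong encodings and UTF-16 surrogate code points. */
        if ((extra == 1 && cp < 0x80) || (extra == 2 && cp < 0x800) ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;

        out[o++] = (unsigned char)(cp & 0xFF);  /* low byte first (LE) */
        out[o++] = (unsigned char)(cp >> 8);    /* then high byte */
    }
    return o;
}
```

With this, the UTF-8 bytes C3 A9 become E9 00, and the three-byte sequence E6 97 A5 (U+65E5) becomes E5 65. Rejecting sequences beyond U+FFFF rather than emitting surrogate pairs is a design choice made here to stay within UCS-2; whether that matches Windows behaviour for such passwords is not something this sketch establishes.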