Unicode and APIs

Wed Sep 19 17:15:20 EDT 2007

In my experience, most of the core client-side code works fine with
UTF-8, since the format was designed to work in code that expects char*
strings. Just pass the UTF-8 in through the char* parameters and it
moves through the system intact. Remember that an ASCII string *is*
UTF-8 (just a limited subset of the character set), so flagging a string
as UTF-8 is not really needed: what would you do differently if you knew
the string was UTF-8?
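To illustrate why UTF-8 survives char* code paths, here is a minimal
sketch. The helper names is_pure_ascii and utf8_codepoints are made up
for this example and are not part of any Kerberos library. It relies
only on two properties of UTF-8: a multi-byte sequence never contains a
zero byte, and every byte of one has the high bit set.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if every byte is below 0x80, i.e. the
 * string is plain ASCII (and therefore also valid UTF-8). */
static int is_pure_ascii(const char *s)
{
    size_t i, len = strlen(s);   /* strlen is safe: UTF-8 has no embedded NULs */

    for (i = 0; i < len; i++)
        if ((unsigned char)s[i] >= 0x80)
            return 0;
    return 1;
}

/* Hypothetical helper: counts code points by skipping UTF-8
 * continuation bytes (those of the form 10xxxxxx). Assumes the input
 * is well-formed UTF-8. */
static size_t utf8_codepoints(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}
```

The thing to keep in mind is that strlen counts bytes, not characters,
so the two numbers only agree for pure ASCII input.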

Things I know are broken:

rc4-hmac wants UCS-2LE in its string-to-key (s2k) code, so the
conversion has to happen in the rc4 code. The current rc4 code does this
wrong: instead of converting UTF-8 to UCS-2, it simply pads each byte
with an extra 00 byte, so any character outside the normal ASCII range
is broken. This is probably what was failing in the prior email exchange
from Xu Qiang.
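The difference between padding and a real conversion can be sketched as
follows. This is an illustration only: utf8_to_ucs2le is a hypothetical
helper written for this note, not the UTF8ToUnicode routine used in my
s2k function, and it assumes that only characters in the Basic
Multilingual Plane (up to U+FFFF) need to be handled, rejecting
anything else.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: decode UTF-8 and emit UCS-2LE (BMP only).
 * Returns 0 on success, -1 on malformed input, on a character outside
 * the BMP, or on allocation failure. On success *out is a malloc'd
 * buffer and *out_units is the number of 16-bit units written. */
static int utf8_to_ucs2le(const char *in, size_t in_len,
                          unsigned char **out, size_t *out_units)
{
    const unsigned char *s = (const unsigned char *)in;
    unsigned char *buf;
    size_t i = 0, n = 0;

    assert(out != NULL && out_units != NULL);
    /* Each input byte yields at most one 16-bit unit. */
    buf = malloc(in_len ? in_len * 2 : 1);
    if (buf == NULL)
        return -1;

    while (i < in_len) {
        uint32_t cp;
        size_t extra, k;
        unsigned char b = s[i++];

        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else goto bad;               /* stray byte or 4-byte sequence */

        if (in_len - i < extra)
            goto bad;                /* truncated sequence */
        for (k = 0; k < extra; k++) {
            unsigned char c = s[i++];
            if ((c & 0xC0) != 0x80)
                goto bad;            /* not a continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }
        /* Reject overlong encodings and surrogate code points. */
        if ((extra == 1 && cp < 0x80) || (extra == 2 && cp < 0x800) ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            goto bad;

        buf[2 * n]     = (unsigned char)(cp & 0xFF);  /* low byte first */
        buf[2 * n + 1] = (unsigned char)(cp >> 8);
        n++;
    }
    *out = buf;
    *out_units = n;
    return 0;

bad:
    free(buf);
    return -1;
}
```

For the two-byte UTF-8 sequence C3 A9 (U+00E9), this produces the two
bytes E9 00, whereas padding each byte with 00 would produce C3 00 A9
00, which hashes to a different key.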

I know that the rest works OK, though. I used to have my own rc4-hmac
implementation (before the standard distribution had one) that did the
UTF-8 to UCS-2LE conversion, and it worked fine with Japanese users and
passwords. My s2k function is at the end of this email.

DES has a totally different issue. The original RFC 1510 spec was silent
on non-ASCII characters, so different implementations did different
things. MS passed the Unicode string through their UnicodeToOEMString
function, which tries to convert the string to an 8-bit encoding using
whatever code page is the default on the running system. This produces
odd results: characters that don't exist in the 8-bit space end up as
'?', and other 'European-like' characters do get mapped, but the other
side of the conversation has to know which conversion was performed.
Since it does not know the default OEM code page on the KDC, it cannot
know; it can guess, and it might be right sometimes, but usually isn't.
This is not something that can be easily fixed.

I am not sure about the DES salt story if there are non-ASCII characters
in the user name.

Obviously, if you want to internationalize the UI, that's a whole
different issue.

------------------------------

krb5_error_code krb5_arc4_string_to_key(
	krb5_const struct krb5_enc_provider *enc,
	krb5_const krb5_data *string,
	krb5_const krb5_data *salt,
	krb5_keyblock *key)
{
/*
  	 String2Key(password) 
	 {

	        K = MD4(UNICODE(password)) 
	}

   The RC4-HMAC keys are generated by using the Windows UNICODE version 
   of the password. Each Windows UNICODE character is encoded in 
   little-endian format of 2 octets each. Then performing an MD4 [6] 
   hash operation on just the UNICODE characters of the password (not 
   including the terminating zero octets).

*/
	krb5_data unipw;
	krb5_data hashout;
	int ret;
	ASSERT(key->length == RSA_MD4_CKSUM_LENGTH); /* compare, do not assign */
	ASSERT(key->contents != NULL);
	/* The key step: a real UTF-8 to UCS-2LE conversion, not 00 padding. */
	ret = UTF8ToUnicode(string->data, string->length,
		(unsigned char **)&(unipw.data), &(unipw.length));
	if (ret != 0)
		return ENOMEM;
	/* the length is returned as a number of 16-bit chars */
	unipw.length *= 2;
	hashout.data = key->contents;
	hashout.length = key->length;
	(*krb5_hash_md4.hash)(1, &unipw, &hashout);
	free(unipw.data);
 	return 0;
}