Unicode and APIs
raeburn at MIT.EDU
Wed Sep 19 19:55:10 EDT 2007
On Sep 19, 2007, at 17:59, Sam Hartman wrote:
>>> How badly will things break if we add a context flag that says
>>> "everything is UTF8" and minimize other API changes?
> Jeffrey> That would certainly work for the interface between the
> Jeffrey> MIT library and the calling application.
> Let's focus on the API issue for now. Ken disagrees with you or at
> least implied he did by saying that it would be hard. Let's see what
> his concerns are.
Mostly, I was just thinking about the ideas kicked around for adding
*_utf8 versions of a bunch of GSSAPI functions. For krb5 only, yeah,
we could probably set another context flag, but I think it would go
onto my list of "things I'd like to see fixed if we ever get to
rework the API in a big way". :) Long term, what I'd rather see is us
just using UTF-8, maybe with a hook for just-send-8 (JS8 for short,
below). In 5-10 years I'd rather have an API that deals efficiently
with the environment we're using then, than one that defaults to
expecting 20th-century environments and needs extra calls to upgrade
from there.
The context flag might be a good start, but how much would that help
with issues I brought up, like needing to try a password both as JS8
and as UTF-8? Do we really want an application to have to create two
contexts? Or do we switch UTF-8 mode off and on?
What would code trying to get initial credentials look like, then?
    convert princname_local to princname_utf8
    convert password_local to password_utf8
    enable utf-8 in context
    get_init_creds (princname_utf8, password_utf8)
    if success then return success
    if local == utf-8 then return error
    if princname_local == princname_utf8 && password_local == password_utf8
        then return error
    # Okay, maybe the password was set in just-send-8 mode.
    disable utf-8 in context
    get_init_creds (princname_local, password_local)
And if we let get_init_creds prompt for the password, the user will
get prompted twice. (Does the context flag mean any input we get is
going to be UTF-8? Do the prompter functions we supply have to do
conversions?) That's all well and good for minimizing our API
changes, but for the application programmer and user, it kind of
sucks. (Obviously, that's just a rough outline. If the first error
is one indicating the principal exists but the password isn't right,
we can probably assume the UTF-8 form of the principal name is the
correct one and the other won't be found. Or can we?)
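Roughly what that outline looks like as C, from the application's side. This is a minimal sketch, not the krb5 API: it assumes the local encoding is ISO-8859-1 (so the conversion is a fixed byte mapping), and `fake_kdc`, `get_creds_with_fallback`, and the `GIC_*` codes are hypothetical stand-ins for a real get_init_creds call and its error codes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes; real code would use krb5_error_code. */
enum { GIC_OK = 0, GIC_UNKNOWN_PRINC = 1, GIC_BAD_PASSWORD = 2,
       GIC_CONVERSION_FAILED = 3 };

typedef int (*gic_fn)(const char *princ, const char *password);

/* ASSUMPTION: local encoding is ISO-8859-1, so each byte >= 0x80 maps
 * to a two-byte UTF-8 sequence. Returns 0 on success, -1 if out is
 * too small. */
static int latin1_to_utf8(const char *in, char *out, size_t outlen)
{
    size_t o = 0;
    if (outlen == 0)
        return -1;
    for (const unsigned char *p = (const unsigned char *)in; *p; p++) {
        size_t need = (*p < 0x80) ? 1 : 2;
        if (o + need + 1 > outlen)      /* keep room for the NUL */
            return -1;
        if (need == 1) {
            out[o++] = (char)*p;
        } else {
            out[o++] = (char)(0xC0 | (*p >> 6));
            out[o++] = (char)(0x80 | (*p & 0x3F));
        }
    }
    out[o] = '\0';
    return 0;
}

/* Hypothetical fake KDC: it holds whatever password bytes the client
 * sent when the password was set; by default, Latin-1 "p<a-umlaut>ss"
 * set in JS8 mode. */
static const char *fake_kdc_stored_pw = "p\xE4ss";
static int fake_kdc_calls = 0;

static int fake_kdc(const char *princ, const char *password)
{
    fake_kdc_calls++;
    if (strcmp(princ, "user") != 0)
        return GIC_UNKNOWN_PRINC;
    return strcmp(password, fake_kdc_stored_pw) == 0 ? GIC_OK
                                                    : GIC_BAD_PASSWORD;
}

/* The outline: UTF-8 first, then the raw local bytes, but only when a
 * second attempt would actually send something different. */
static int get_creds_with_fallback(gic_fn gic, int local_is_utf8,
                                   const char *princ_local,
                                   const char *pw_local)
{
    char princ_utf8[256], pw_utf8[256];
    int rc;

    if (local_is_utf8)                  /* nothing to convert or retry */
        return gic(princ_local, pw_local);
    if (latin1_to_utf8(princ_local, princ_utf8, sizeof princ_utf8) != 0 ||
        latin1_to_utf8(pw_local, pw_utf8, sizeof pw_utf8) != 0)
        return GIC_CONVERSION_FAILED;

    rc = gic(princ_utf8, pw_utf8);      /* "enable utf-8 in context" */
    if (rc == GIC_OK)
        return rc;
    if (strcmp(princ_local, princ_utf8) == 0 &&
        strcmp(pw_local, pw_utf8) == 0)
        return rc;                      /* pure ASCII: same bytes again */

    /* Maybe the password was set in just-send-8 mode. */
    return gic(princ_local, pw_local);  /* "disable utf-8 in context" */
}
```

With a password stored in JS8 form, the UTF-8 attempt fails and the second attempt succeeds; with a pure-ASCII password, the second round trip is skipped. The password is collected once up front here, which sidesteps the double-prompt problem but doesn't solve it for callers that let the library prompt.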
Can we wind up with a JS8-encoded principal name using a UTF-8
password, or vice versa? If so, we're looking at more calls from the
application, to get the principal name encoding right and to get the
password encoding right, somewhat independently, and the single flag
in the context doesn't make as much sense.
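If the two strings' encodings really can vary independently, the retry grows to as many as four combinations. A hypothetical C sketch, assuming an ISO-8859-1 local encoding; `fake_kdc` plays a KDC whose principal was created in JS8 form but whose password was later set from a UTF-8 client, and `get_creds_any_combo` and the `GIC_*` codes are illustrative names only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes. */
enum { GIC_OK = 0, GIC_UNKNOWN_PRINC = 1, GIC_BAD_PASSWORD = 2,
       GIC_CONVERSION_FAILED = 3 };

typedef int (*gic_fn)(const char *princ, const char *password);

/* ASSUMPTION: local encoding is ISO-8859-1. Returns 0 on success. */
static int latin1_to_utf8(const char *in, char *out, size_t outlen)
{
    size_t o = 0;
    if (outlen == 0)
        return -1;
    for (const unsigned char *p = (const unsigned char *)in; *p; p++) {
        size_t need = (*p < 0x80) ? 1 : 2;
        if (o + need + 1 > outlen)
            return -1;
        if (need == 1) {
            out[o++] = (char)*p;
        } else {
            out[o++] = (char)(0xC0 | (*p >> 6));
            out[o++] = (char)(0x80 | (*p & 0x3F));
        }
    }
    out[o] = '\0';
    return 0;
}

/* Hypothetical fake KDC: principal stored as Latin-1 "j<o-umlaut>rg",
 * password stored as UTF-8 "caf<e-acute>". */
static int fake_kdc(const char *princ, const char *password)
{
    if (strcmp(princ, "j\xF6rg") != 0)
        return GIC_UNKNOWN_PRINC;
    return strcmp(password, "caf\xC3\xA9") == 0 ? GIC_OK : GIC_BAD_PASSWORD;
}

/* Try UTF-8 before JS8 for each string independently, skipping forms
 * byte-identical to one already tried. *attempts counts requests. */
static int get_creds_any_combo(gic_fn gic, const char *princ_local,
                               const char *pw_local, int *attempts)
{
    char princ_utf8[256], pw_utf8[256];
    const char *princs[2], *pws[2];
    int rc = GIC_UNKNOWN_PRINC, saw_bad_pw = 0;

    *attempts = 0;
    if (latin1_to_utf8(princ_local, princ_utf8, sizeof princ_utf8) != 0 ||
        latin1_to_utf8(pw_local, pw_utf8, sizeof pw_utf8) != 0)
        return GIC_CONVERSION_FAILED;
    princs[0] = princ_utf8; princs[1] = princ_local;
    pws[0] = pw_utf8;       pws[1] = pw_local;

    for (int i = 0; i < 2; i++) {
        if (i == 1 && strcmp(princs[0], princs[1]) == 0)
            break;                      /* no distinct JS8 name */
        for (int j = 0; j < 2; j++) {
            if (j == 1 && strcmp(pws[0], pws[1]) == 0)
                break;                  /* no distinct JS8 password */
            (*attempts)++;
            rc = gic(princs[i], pws[j]);
            if (rc == GIC_OK)
                return rc;
            if (rc == GIC_BAD_PASSWORD)
                saw_bad_pw = 1;         /* this name form does exist */
            if (rc == GIC_UNKNOWN_PRINC)
                break;                  /* other passwords won't help */
        }
    }
    return saw_bad_pw ? GIC_BAD_PASSWORD : rc;
}
```

The sketch stops trying passwords for a name form the KDC doesn't know, but it does not assume that a bad-password error on the UTF-8 name rules out the JS8 name.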
For friendlier compatibility, I think we'd want a function that
takes both forms (of both strings) and does the retrying as
necessary, internally. Or a callback function to do the conversions.
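Sketching the callback variant in C, in the UTF-8-with-a-JS8-hook direction: the caller passes UTF-8 strings, and only if that attempt fails does the library ask an application-supplied hook for the legacy byte form of each string. The names (`alt_form_fn`, `get_creds_utf8`, `utf8_to_latin1_alt`, `fake_kdc`, `GIC_*`) are hypothetical, and the sample hook assumes the legacy encoding is ISO-8859-1; it declines pure-ASCII strings and characters outside Latin-1, so no pointless retry is sent.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes. */
enum { GIC_OK = 0, GIC_UNKNOWN_PRINC = 1, GIC_BAD_PASSWORD = 2 };

typedef int (*gic_fn)(const char *princ, const char *password);

/* Hypothetical hook: write the legacy (JS8) byte form of a UTF-8
 * string into out. Return 0 if a distinct legacy form exists, nonzero
 * if there is nothing different worth sending. */
typedef int (*alt_form_fn)(void *data, const char *utf8,
                           char *out, size_t outlen);

/* Sample hook, ASSUMING the legacy encoding is ISO-8859-1. */
static int utf8_to_latin1_alt(void *data, const char *utf8,
                              char *out, size_t outlen)
{
    const unsigned char *p = (const unsigned char *)utf8;
    size_t o = 0;
    int changed = 0;

    (void)data;
    if (outlen == 0)
        return 1;
    while (*p) {
        unsigned int c;
        if (*p < 0x80) {
            c = *p++;
        } else if ((*p == 0xC2 || *p == 0xC3) && (p[1] & 0xC0) == 0x80) {
            c = ((unsigned int)(p[0] & 0x1F) << 6) | (p[1] & 0x3F);
            p += 2;
            changed = 1;
        } else {
            return 1;                   /* not representable in Latin-1 */
        }
        if (o + 2 > outlen)
            return 1;                   /* keep room for the NUL */
        out[o++] = (char)c;
    }
    out[o] = '\0';
    return changed ? 0 : 1;             /* pure ASCII: same bytes */
}

/* Hypothetical fake KDC: password set by a legacy Latin-1 client. */
static int fake_kdc(const char *princ, const char *password)
{
    if (strcmp(princ, "user") != 0)
        return GIC_UNKNOWN_PRINC;
    return strcmp(password, "na\xEFve") == 0 ? GIC_OK : GIC_BAD_PASSWORD;
}

/* Hypothetical UTF-8-first call: the hook is consulted only after the
 * UTF-8 attempt has failed. */
static int get_creds_utf8(gic_fn gic, alt_form_fn alt, void *alt_data,
                          const char *princ, const char *pw)
{
    char princ_alt[256], pw_alt[256];
    int have_princ, have_pw;
    int rc = gic(princ, pw);

    if (rc == GIC_OK || alt == NULL)
        return rc;
    have_princ = alt(alt_data, princ, princ_alt, sizeof princ_alt) == 0;
    have_pw = alt(alt_data, pw, pw_alt, sizeof pw_alt) == 0;
    if (!have_princ && !have_pw)
        return rc;                      /* a retry would resend the same bytes */
    return gic(have_princ ? princ_alt : princ, have_pw ? pw_alt : pw);
}
```

Because the hook is consulted lazily, an application running in a UTF-8 locale could pass NULL and never pay for the fallback.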
On the other end, krb5_rd_req gives the receiver the principal name
from the request; that'll presumably be in the form it was sent on
the wire? That means that, regardless of the context setting, it could be
either UTF-8 or JS8. Do we need to return to the application an
indication as to which principal name form was correct? If not, how
can it properly check against an ACL file like .k5login, presumably
maintained with one consistent encoding?
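To make the .k5login question concrete: if krb5_rd_req (or some companion call) also told the application which form the client used, the application could normalize to the ACL's encoding before comparing. A C sketch, assuming the ACL is kept in UTF-8 and JS8 names are ISO-8859-1; `enum name_form` and `acl_allows` are hypothetical, and a real server would normally go through krb5_kuserok rather than raw string comparison. The last test shows why the indication matters: the bytes C3 B6 are "ö" in UTF-8 but "Ã¶" in Latin-1, so the same wire bytes match or not depending on which form the client used.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: which form the client used on the wire, returned
 * alongside the principal name. */
enum name_form { NAME_FORM_UTF8, NAME_FORM_JS8 };

/* ASSUMPTION: JS8 names are ISO-8859-1. Returns 0 on success. */
static int latin1_to_utf8(const char *in, char *out, size_t outlen)
{
    size_t o = 0;
    if (outlen == 0)
        return -1;
    for (const unsigned char *p = (const unsigned char *)in; *p; p++) {
        size_t need = (*p < 0x80) ? 1 : 2;
        if (o + need + 1 > outlen)
            return -1;
        if (need == 1) {
            out[o++] = (char)*p;
        } else {
            out[o++] = (char)(0xC0 | (*p >> 6));
            out[o++] = (char)(0x80 | (*p & 0x3F));
        }
    }
    out[o] = '\0';
    return 0;
}

/* ACL entries (think of lines read from .k5login) are assumed to be
 * kept in UTF-8; normalize the wire name before comparing. */
static int acl_allows(const char *const *acl, size_t nacl,
                      const char *wire_name, enum name_form form)
{
    char buf[512];
    const char *name = wire_name;

    if (form == NAME_FORM_JS8) {
        if (latin1_to_utf8(wire_name, buf, sizeof buf) != 0)
            return 0;                   /* too long to convert: deny */
        name = buf;
    }
    for (size_t i = 0; i < nacl; i++)
        if (strcmp(acl[i], name) == 0)
            return 1;
    return 0;
}
```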
Should ktutil work in UTF-8 or JS8 mode for names and passwords? New
command-line switch? Okay, that's not an API issue... but we might
want to change ktutil to talk to the KDC to figure out the answer, if
we want to be friendly about it, rather than requiring the local
admin to know the answer beforehand.
What about non-ASCII data in the config files -- UTF-8 or local
encoding? They're read at init_context time; you don't get to set a
flag first unless we create a new init_context variant. (And an
init_secure_context variant, and an internal init_context_kdc
variant.) So there's another API change, unless we dictate terms. I
suppose we could extend the parsing code to let the config file say,
"I'm in UTF-8", but it would need to be backwards-compatible
syntactically, and that doesn't play nicely with the API functions
that return all matches against a list of names from the full set of
config files, without associated UTF-8-ness flags for each response.
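A sketch of the "I'm in UTF-8" idea in C, mainly to show where the per-response flag would have to live. It assumes a hypothetical marker comment `# encoding: utf-8` as a file's first line (a parser that doesn't know about it would just skip it as a comment) and uses a much-simplified `name = value` format with no sections; `struct tagged_value` and `collect_values` are hypothetical, not the profile library's interface. Values gathered from several files each carry their own file's flag, which a plain list of strings can't express.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup result: value plus its file's declared encoding. */
struct tagged_value {
    char value[128];
    int is_utf8;                /* 1 if the source file declared UTF-8 */
};

/* HYPOTHETICAL marker; being a comment keeps it backwards-compatible. */
static const char utf8_marker[] = "# encoding: utf-8";

/* True if the file's first line is exactly the marker. */
static int declares_utf8(const char *text)
{
    size_t mlen = strlen(utf8_marker);
    return strncmp(text, utf8_marker, mlen) == 0 &&
           (text[mlen] == '\n' || text[mlen] == '\0');
}

/* Append every value of `name` in one file's text to out[n..max),
 * tagging each with that file's flag. Returns the new count. */
static size_t collect_values(const char *text, const char *name,
                             struct tagged_value *out, size_t n, size_t max)
{
    int is_utf8 = declares_utf8(text);
    size_t nlen = strlen(name);
    const char *line = text;

    while (*line != '\0' && n < max) {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);

        if (len > nlen + 3 && strncmp(line, name, nlen) == 0 &&
            strncmp(line + nlen, " = ", 3) == 0) {
            size_t vlen = len - nlen - 3;
            if (vlen >= sizeof out[n].value)
                vlen = sizeof out[n].value - 1;   /* truncate long values */
            memcpy(out[n].value, line + nlen + 3, vlen);
            out[n].value[vlen] = '\0';
            out[n].is_utf8 = is_utf8;
            n++;
        }
        line = end ? end + 1 : line + len;
    }
    return n;
}
```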
And I'm not even thinking now about what happens if we try using
GSSAPI in an environment where we've started mixing in UTF-8 with
local encoding at the Kerberos layer.
> Jeffrey> I think the challenge will be credential caches, keytabs,
> Jeffrey> replay caches, etc. Those are resources which are shared
> Jeffrey> with other Kerberos implementations that will not
> Jeffrey> necessarily be happy if the character set changes.
True, but unless we've got places to stuff additional data ("oh, and
here's the UTF-8 version of the name"), we may just have to bite the
bullet and have a flag day or something.