proflib: krb5.conf lexer proposal
Joseph Calzaretta
saltine at MIT.EDU
Mon Nov 28 09:48:39 EST 2005
Now that America's extended harvest holiday has ended, I'd like to pick up
this thread. Here is a summary of where we are at this point, in my opinion.
New Lexer for krb5.conf Files
The existing profile library has issues which warrant rewriting it. This
is an opportunity to improve the parser to handle input in a more intuitive
and general way (e.g., allowing comments within curly braces, allowing the
specification of arbitrary character sequences as section names, relation
tags, and relation values, etc.) while still maintaining backwards
compatibility with existing krb5.conf files.
My lexer proposal of Nov 21 (
http://mailman.mit.edu/pipermail/krbdev/2005-November/003892.html ) may be
a component of such an improved parser. Maybe it's not. Any change made
to the parser's behavior may result in -someone's- krb5.conf file suddenly
being interpreted differently, and therefore incorrectly (from the point of
view of that person). The proposed lexer might be acceptable, and only
cause problems with pathologically misformatted files that we don't intend
to support. Or it might be too ambitious and break or risk breaking too
many existing systems. My main purpose in presenting that lexer is to find
out your opinions on whether or not this lexer is acceptable or too ambitious.
In my opinion, the main issue centers around what possible characters
appear, unquoted, as relation values (text tokens after the '=') in
supported krb5.conf files. For example, if the string "foo[bar]" appears
as an unquoted relation value (baz = foo[bar], as opposed to baz =
"foo[bar]"), then the lexer must allow the characters '[' and ']' to
appear unquoted in text tokens, at least in certain circumstances. From my
research, it seems that most relation values consist of alphanumeric
characters plus dashes, underscores, and single spaces. However, the
auth_to_local relation values seem to contain '[', ']', and ';' characters,
as well as some other punctuation marks I'm not too concerned about because
I treat them as text anyway.
The proposed lexer assumes that text tokens do not contain (unquoted):

    '=' (equal signs)
    '{' (open curly braces)
    '}' (close curly braces)
    '[' (open square brackets) preceded by whitespace. (So "foo[bar" can
        be a text token but "foo [bar" cannot.)
    ']' (close square brackets) followed by whitespace. (So "foo]bar" can
        be a text token but "foo] bar" cannot.)
    '#' (hash/pound signs) preceded by whitespace. (So "foo#bar" can be a
        text token but "foo #bar" cannot.)
    ';' (semicolons) preceded by whitespace. (So "foo;bar" can be a text
        token but "foo ;bar" cannot.)
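
To make the assumptions above concrete, here is a minimal sketch in Python of a check that reports whether a candidate string could stand as a single unquoted text token. It is not the proposed lexer and not part of proflib: the names first_violation and is_plain_text_token are hypothetical, and the sketch assumes that only literal whitespace characters inside the string count as "whitespace" (how the start and end of a line should be treated is left to the caller).

```python
import re

# Character sequences assumed never to appear unquoted inside a text token.
# Assumption of this sketch: the start and end of the string are NOT treated
# as whitespace; only real whitespace characters are.
_FORBIDDEN = re.compile(
    r"[={}]"          # '=', '{' or '}' anywhere
    r"|(?<=\s)\["     # '[' preceded by whitespace
    r"|\](?=\s)"      # ']' followed by whitespace
    r"|(?<=\s)[#;]"   # '#' or ';' preceded by whitespace
)


def first_violation(text):
    """Return the index of the first offending character, or None."""
    match = _FORBIDDEN.search(text)
    return match.start() if match else None


def is_plain_text_token(text):
    """True if text breaks none of the unquoted-text assumptions."""
    return first_violation(text) is None


if __name__ == "__main__":
    # The last sample is an illustrative auth_to_local-style value, not one
    # taken from a real configuration.
    samples = ["foo[bar", "foo [bar", "foo]bar", "foo] bar",
               "foo#bar", "foo #bar", "foo;bar", "foo ;bar",
               "RULE:[2:$1;$2](^.*;admin$)s/;admin$//"]
    for sample in samples:
        print(repr(sample), is_plain_text_token(sample))
```

Under these rules the illustrative auth_to_local-style value passes, because each '[' and ';' in it follows a non-whitespace character and each ']' is followed by a non-whitespace character.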
If these assumptions are invalid, or if there are other concerns about how
the lexer tokenizes krb5.conf files, please let me know. Thanks!
Yours,
Joe Calzaretta
Software Development & Integration Team
MIT Information Services & Technology