proflib: krb5.conf lexer proposal

Mon Nov 28 09:48:39 EST 2005

Now that America's extended harvest holiday has ended, I'd like to pick up 
this thread.  Here is a summary of where we are at this point, in my opinion.

New Lexer for krb5.conf Files

The existing profile library has issues which warrant rewriting it.  This 
is an opportunity to improve the parser to handle input in a more intuitive 
and general way (e.g., allowing comments within curly braces, allowing the 
specification of arbitrary character sequences as section names, relation 
tags, and relation values, etc.) while still maintaining backwards 
compatibility with existing krb5.conf files.

My lexer proposal of Nov 21 ( 
http://mailman.mit.edu/pipermail/krbdev/2005-November/003892.html ) may be 
a component of such an improved parser.  Maybe it's not.  Any change made 
to the parser's behavior may result in -someone's- krb5.conf file suddenly 
being interpreted differently, and therefore incorrectly (from the point of 
view of that person).   The proposed lexer might be acceptable, and only 
cause problems with pathologically misformatted files that we don't intend 
to support.  Or it might be too ambitious and break or risk breaking too 
many existing systems.  My main purpose in presenting that lexer is to find 
out whether you consider it acceptable or too ambitious.

In my opinion, the main issue is which characters may appear, unquoted, in 
relation values (the text tokens after the '=') in supported krb5.conf 
files.  For example, if the string "foo[bar]" appears 
as an unquoted relation value, (baz = foo[bar] as opposed to baz = 
"foo[bar]") then the lexer must allow for the characters '[' and ']' to 
appear unquoted in text tokens, at least in certain circumstances.  From my 
research, it seems that most relation values consist of alphanumeric 
characters plus dashes, underscores, and single spaces.  However, the 
auth_to_local relation values seem to contain '[', ']', and ';' characters, 
as well as some other punctuation marks I'm not too concerned about because 
I treat them as text anyway.

The proposed lexer assumes that text tokens do not contain (unquoted):
   '=' (equal signs)
   '{' (open curly braces)
   '}' (close curly braces)
   '[' (open square brackets) preceded by whitespace.  (So "foo[bar" can be 
a text token but "foo [bar" cannot.)
   ']' (close square brackets) followed by whitespace.  (So "foo]bar" can be 
a text token but "foo] bar" cannot.)
   '#' (hash/pound signs) preceded by whitespace.  (So "foo#bar" can be a 
text token but "foo #bar" cannot.)
   ';' (semicolons) preceded by whitespace.  (So "foo;bar" can be a text 
token but "foo ;bar" cannot.)
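
To make the assumptions above easier to test against real files, here is a 
rough sketch in Python.  It is not the proposed lexer, only a check of the 
seven rules as listed.  The name can_be_text_token and the regular 
expression are my own illustration and do not come from the profile 
library.  The sketch looks only at characters inside the candidate string; 
how a '[', ']', '#', or ';' at the very start or end of a token should be 
handled is not covered by the rules above, so it is left alone here.

```python
import re

# Hypothetical helper, not part of proflib: encodes the seven assumptions
# about what an unquoted text token may NOT contain.
_FORBIDDEN = re.compile(
    r"[={}]"       # '=', '{', '}' anywhere in the token
    r"|\s[\[#;]"   # '[', '#', or ';' preceded by whitespace
    r"|\]\s"       # ']' followed by whitespace
)


def can_be_text_token(candidate: str) -> bool:
    """Return True if candidate violates none of the listed assumptions."""
    return _FORBIDDEN.search(candidate) is None


if __name__ == "__main__":
    # The paired examples from the list above.
    for text in ["foo[bar", "foo [bar", "foo]bar", "foo] bar",
                 "foo#bar", "foo #bar", "foo;bar", "foo ;bar"]:
        print(repr(text), can_be_text_token(text))
```

Running a check like this over the relation values in a collection of 
existing krb5.conf files would show which of them, if any, the proposed 
lexer would tokenize differently.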

If these assumptions are invalid, or if there are other concerns about how 
the lexer tokenizes krb5.conf files, please let me know.  Thanks!

Yours,

Joe Calzaretta
Software Development & Integration Team
MIT Information Services & Technology