(Final?) krb5.conf Lexer/Parser Proposal

Thu Jan 5 04:12:22 EST 2006

On Wed, Jan 04, 2006 at 05:46:34PM -0500, Joseph Calzaretta wrote:
>    - the "final" signifier (asterisk '*') is no longer supported or 
> treated as special.

You understand why the "final" signifier was implemented, right?  The
idea was to let a user's ~/.krb5rc override portions of
/etc/krb5.conf.  So if KRB5_CONFIG is set to
~/.krb5rc:/etc/krb5.conf, then it is possible to override an entire
stanza by using the finalizer.

For example, if ~/.krb5rc contains:

[realms]
	ATHENA.MIT.EDU = {
		kdc = extra_kdc.mit.edu:88
	}*

Then the only KDC returned will be extra_kdc.mit.edu:88.  Without the
finalizer, the list of KDCs returned would be:

% ./test_profile ~/.krb5rc:/etc/krb5.conf query realms ATHENA.MIT.EDU kdc
extra_kdc.mit.edu:88
kerberos.mit.edu:88
kerberos-1.mit.edu:88
kerberos-2.mit.edu:88
kerberos-3.mit.edu:88
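
To make the lookup behavior concrete, here is a minimal Python sketch
of how a "final" stanza could cut off the search across the files
listed in KRB5_CONFIG.  The dictionary layout, the USER_RC and
SYSTEM_CONF names, and the lookup() helper are hypothetical
illustrations, not the profile library's actual data structures or API.

```python
# Hypothetical in-memory form of two parsed profile files.
USER_RC = {"realms": {"ATHENA.MIT.EDU": {
    "final": True,   # the stanza was closed with "}*"
    "relations": {"kdc": ["extra_kdc.mit.edu:88"]},
}}}
SYSTEM_CONF = {"realms": {"ATHENA.MIT.EDU": {
    "final": False,
    "relations": {"kdc": ["kerberos.mit.edu:88", "kerberos-1.mit.edu:88",
                          "kerberos-2.mit.edu:88", "kerberos-3.mit.edu:88"]},
}}}


def lookup(profiles, section, subsection, tag):
    """Collect values for tag from each profile in order.

    A stanza marked final stops the search, so later files
    cannot add to it.
    """
    values = []
    for profile in profiles:
        stanza = profile.get(section, {}).get(subsection)
        if stanza is None:
            continue
        values.extend(stanza["relations"].get(tag, []))
        if stanza["final"]:
            break
    return values


print(lookup([USER_RC, SYSTEM_CONF], "realms", "ATHENA.MIT.EDU", "kdc"))
```

With "final" set as above, only extra_kdc.mit.edu:88 comes back; with
it cleared, all five entries are returned in file order.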

Is anyone actually using this?  I can't answer that question; but it
is useful functionality.  Whether or not it is worth preserving I will
leave to others to decide, but in case it wasn't understood why it was
originally added, this might be useful historical perspective.

						- Ted

> 
> The new lexer/parser proposal embraces line-based syntax and is much 
> less of a change from the existing parser.  As a first-order 
> approximation, the new proposal is the same as the existing parser
> except that:
>    - comments beginning with the pound sign '#' may appear at the end 
> of any line.
>    - quoted strings are acceptable everywhere, not just in relation 
> values.  Consequently, all section, tag, and value names can contain 
> any characters.
> 
> Therefore, please feel free to skip or skim the nitty-gritty part 
> below, and look at the Noteworthy Features section for interesting 
> changes.  If you have any concerns, questions, or (non-xml-themed) 
> suggestions, please let me know.  Thanks!
> 
> Yours,
> 
> Joe Calzaretta
> Software Development & Integration Team
> MIT Information Services & Technology
> 
> -----------------
> The New krb5.conf Lexer/Parser Proposal, a.k.a., the Nitty-Gritty Part:
> 
> The input file is divided into lines, terminated by linebreaks (or 
> end-of-stream for the last line).
> 
> For each line:
>    initial whitespace is ignored.
>    all text after an unquoted pound sign '#' (including the pound 
> sign itself) is stored as comment text and subsequently ignored.
>    if the first nonwhitespace character is a semicolon ';', the 
> entire line is stored as comment text and subsequently ignored.
>    if the line is all whitespace, the entire line is ignored.
>    At this point, whitespace, comments, and blank lines are stripped.
> 
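
The per-line stripping rules quoted above could be sketched roughly as
follows in Python.  The split_comment() name and its return shape are
assumptions made for illustration, and the handling of backslash
escapes inside quotes is a guess at the intended behavior.

```python
def split_comment(line):
    """Return (content, comment) for one input line.

    content is the line with leading whitespace, any comment, and
    trailing whitespace removed; comment is the stripped comment text
    (empty if there was none).
    """
    stripped = line.lstrip()
    if stripped.startswith(";"):
        # A leading semicolon turns the whole line into a comment.
        return "", stripped
    in_quote = False
    i = 0
    while i < len(stripped):
        ch = stripped[i]
        if in_quote and ch == "\\":
            i += 2  # skip the escaped character inside a quoted string
            continue
        if ch == '"':
            in_quote = not in_quote
        elif ch == "#" and not in_quote:
            # First unquoted pound sign starts the comment.
            return stripped[:i].rstrip(), stripped[i:]
        i += 1
    return stripped.rstrip(), ""
```

A line whose content comes back empty is a blank or comment-only line
and would be skipped before classification.
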
> After this processing, the first character of the line is examined.
>    If this character is an open square bracket '[', the line is a Section Line.
>    If this character is a close curly brace '}', the line is a 
> CloseSubsection Line.
>    If this character is an open curly brace '{', AND the previous 
> line was a DanglingSubsection line (more later), the line is a 
> RescuedSubsection Line.
>    Otherwise, the line is a SubsectionOrRelation line.
> 
> Section Lines:
>    All text after the initial open square bracket '[' and up to but 
> not including the first unquoted close square bracket ']' is 
> considered the Raw Section Name.  A line without the close bracket or 
> with any nonwhitespace text after the close bracket is considered an error.
> 
> CloseSubsection Lines:
>    Any nonwhitespace text after the initial close curly brace '}' is 
> considered an error.
> 
> SubsectionOrRelation Lines:
>    All text up to but not including the first unquoted equal sign '=' 
> is considered the Raw Tag name.  A line without such an equal sign is 
> considered an error.
>    The first nonwhitespace character after such an equal sign '=' is 
> examined.
>      If this character is an open curly brace '{', the line is a 
> Subsection Line.
>      If there is no such character, the line is a DanglingSubsection Line.
>      Otherwise, the line is a Relation Line.
> 
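
As a rough illustration of the classification rules quoted above, the
following Python sketch maps an already-stripped, non-empty line to one
of the line kinds.  The function names, the string labels, and the
"Error" return value are assumptions; the proposal does not specify an
interface.

```python
def find_unquoted(text, target):
    """Index of the first target character outside double quotes, or -1."""
    in_quote = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_quote and ch == "\\":
            i += 2  # skip the escaped character inside a quoted string
            continue
        if ch == '"':
            in_quote = not in_quote
        elif ch == target and not in_quote:
            return i
        i += 1
    return -1


def classify(content, previous=None):
    """Classify one stripped, non-empty line.

    previous is the kind returned for the preceding line, if any.
    """
    first = content[0]
    if first == "[":
        return "Section"
    if first == "}":
        return "CloseSubsection"
    if first == "{" and previous == "DanglingSubsection":
        return "RescuedSubsection"
    eq = find_unquoted(content, "=")
    if eq < 0:
        return "Error"  # no unquoted '=' on a SubsectionOrRelation line
    rest = content[eq + 1:].lstrip()
    if not rest:
        return "DanglingSubsection"
    if rest[0] == "{":
        return "Subsection"
    return "Relation"
```

Note that a lone "{" is only accepted right after a DanglingSubsection
line; anywhere else it falls through to the missing-equal-sign error.
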
> Subsection and RescuedSubsection Lines:
>    Any nonwhitespace text after the unquoted open curly brace '{' is 
> considered an error.
> 
> Relation Lines:
>     All text after the unquoted equal sign '=' is considered the Raw 
> Value Name.
> 
> Raw Name Canonicalization:
>    All Raw Section/Tag/Value Names are canonicalized thusly:
>      Any text within a quoted string is unescaped in the manner of 
> ANSI C (C90 spec).
> (e.g., "[\\Huh\x3F]" => [\Huh?])
>      All whitespace within a quoted string is preserved.
>      Whitespace between two quoted strings is eliminated. (provides string
> concatenation much like ANSI C)
>      Whitespace at the beginning and end of the Raw Name is eliminated.
>      All other whitespace (i.e., whitespace before or after an unquoted word)
> is condensed to a single space (like 'collapse' whitespace handling in XML).
> 
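
The canonicalization rules quoted above might look something like this
in Python.  This is only a sketch under several assumptions: the
function names are invented, "\xhh" and "\ooo" values are masked to a
single byte, backslashes outside quoted strings are left alone, and an
unterminated quoted string raises an error rather than being recovered.

```python
import re

# Single-character escapes defined by C90.
SIMPLE = {"n": "\n", "t": "\t", "r": "\r", "a": "\a", "b": "\b",
          "f": "\f", "v": "\v", "\\": "\\", '"': '"', "'": "'", "?": "?"}
ESCAPE = re.compile(r"\\(x[0-9A-Fa-f]+|[0-7]{1,3}|.)")
# One token: a quoted string, an unquoted word, or a run of whitespace.
TOKEN = re.compile(r'"((?:[^"\\]|\\.)*)"|([^\s"]+)|(\s+)')


def unescape(body):
    """Decode C-style escapes in the body of a quoted string."""
    def repl(match):
        esc = match.group(1)
        if esc[0] == "x" and len(esc) > 1:
            return chr(int(esc[1:], 16) & 0xFF)  # assumption: keep one byte
        if esc[0] in "01234567":
            return chr(int(esc, 8) & 0xFF)
        return SIMPLE.get(esc, esc)
    return ESCAPE.sub(repl, body)


def canonicalize(raw):
    """Canonicalize a Raw Section/Tag/Value Name."""
    out = []
    prev = None   # kind of the last non-space token: "quoted" or "word"
    gap = False   # whitespace seen since the last non-space token
    pos = 0
    while pos < len(raw):
        match = TOKEN.match(raw, pos)
        if match is None:
            raise ValueError("unterminated quoted string")
        pos = match.end()
        if match.group(3) is not None:
            gap = True
            continue
        kind = "quoted" if match.group(1) is not None else "word"
        # Whitespace survives (as one space) unless it sits at the start
        # of the name or between two quoted strings.
        if gap and prev is not None and not (prev == kind == "quoted"):
            out.append(" ")
        out.append(unescape(match.group(1)) if kind == "quoted"
                   else match.group(2))
        prev, gap = kind, False
    return "".join(out)
```

Trailing whitespace is dropped because a space is only emitted when
another token follows it.
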
> Lines may generally occur in any order, but some situations are 
> considered errors.  Errors occur:
>    If the first line is not a Section Line.
>    If a DanglingSubsection Line is not immediately followed by a 
> RescuedSubsection Line.
>    If a Section Line or the end-of-stream appears within a Subsection 
> (after fewer CloseSubsection lines than Subsection/RescuedSubsection Lines).
>    If a CloseSubsection Line appears outside of a Subsection (after 
> an equal number of CloseSubsection and Subsection/RescuedSubsection Lines).
> 
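
The ordering rules quoted above amount to tracking subsection depth.
A minimal Python sketch, assuming each line has already been reduced to
a kind label (the label strings and the check_structure() name are
hypothetical), and assuming that a Section Line inside a subsection
implicitly closes everything still open:

```python
def check_structure(kinds):
    """Return a list of (line_number, message) ordering errors."""
    errors = []
    depth = 0        # open subsections not yet closed
    previous = None
    for number, kind in enumerate(kinds, start=1):
        if number == 1 and kind != "Section":
            errors.append((number, "first line is not a Section Line"))
        if previous == "DanglingSubsection" and kind != "RescuedSubsection":
            errors.append((number, "DanglingSubsection not rescued"))
        if kind == "Section":
            if depth > 0:
                errors.append((number, "Section Line inside a Subsection"))
                depth = 0  # recovery assumption: close open subsections
        elif kind in ("Subsection", "RescuedSubsection"):
            depth += 1
        elif kind == "CloseSubsection":
            if depth == 0:
                errors.append((number, "CloseSubsection outside a Subsection"))
            else:
                depth -= 1
        previous = kind
    end = len(kinds) + 1  # pseudo line number for end-of-stream
    if previous == "DanglingSubsection":
        errors.append((end, "DanglingSubsection not rescued"))
    if depth > 0:
        errors.append((end, "end-of-stream inside a Subsection"))
    return errors
```

Errors are collected rather than raised so that a caller can decide
which ones to treat as fatal.
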
> ---------------------------------
> Noteworthy Features of the Proposed Lexer/Parser
> 
> => The asterisk '*' signifier for "final" lines is no longer 
> supported.  Asterisks are not considered special characters at 
> all.  If this is undesirable or surprising, please let me know.
> 
> => The semicolon ';' only signifies a comment at the beginning of a 
> line, whereas the pound sign '#' signifies a comment whenever it 
> appears unquoted.  Note the following lines:
>    # a comment
>    foo = bar # a comment
>    foo = "bar # NOT a comment"
>    ; a comment
>    foo = bar ; NOT a comment
> Why this difference?  Some existing krb5.conf files' relation values 
> (notably the auth_to_local value) may have unquoted semicolons ';' in 
> them.  As far as we have seen, no existing krb5.conf files' relation 
> values or tags use unquoted pound signs '#'.  If this is untrue, 
> please let me know.
> 
> => Relation values may not start with an unquoted open curly brace 
> '{'.  For example, the line:
>    foo = { bar
> is considered an error.  Note that the existing parser would treat 
> this as a relation assigning the value "{ bar" to the tag "foo".  The 
> existing parser's behavior is confusing enough that it is probably 
> best discarded.  If this is untrue, please let me know.
> 
> => DanglingSubsections and RescuedSubsections:  The existing parser 
> allows the open curly brace '{' for subsections to appear on the line 
> after the equal sign '=', like so:
>    foo =              # dangling subsection
>    {                     # rescued subsection, yay!
>       bar = baz
>    }
> This syntax is pretty, because you can line up the open and close 
> curlies.  But it violates the one-line-per-element 
> linebreaks-are-syntactic rule to which the parser otherwise strictly 
> adheres.  Personally, I don't like this because it is an extra layer 
> of complexity, and the corresponding format for relation values is not valid:
>    foo =   # dangling relation?
>    bar     # can't be rescued, and doesn't parse.  boo!
> Anyway, this syntax continues to be supported in the new proposal 
> (actually improved because the existing parser doesn't allow comments 
> to appear on lines between the equal sign and the open curly 
> brace).  If anyone thinks this should be eliminated or supported 
> differently, please let me know.
> 
> => All section names, tag names, and value names may contain any 
> character (except for null '\0') including whitespace.  Since all 
> such names also support ANSI C quoted strings, there is a way to 
> include any special character.  For example,
>    ["foo]"] # close bracket in a section name
>    "} foo" = bar # close curly brace at start of a tag name.
>    "foo " = bar # space at the end of a tag name
>    foo     bar = baz #single collapsed space in the middle of a tag name.
>    "foo=" = bar # equal sign in a tag name.
>    "#foo" = bar #pound sign in a tag name.
>    foo = "{ bar" #open curly brace at start of a value name.
>    foo = "\"bar\"" #quotation marks in a value name.
>    foo = "\x3F" #raw byte value for a question mark.  Iffy!
> Note that some of these would be errors in the existing parser, while 
> others would be interpreted quite differently.  Also note that 
> allowing "\xhh" and "\ooo" byte codes could get a bit iffy in the 
> future.  Right now, data is just stored as null-terminated byte 
> strings, with no guaranteed interpretation of codes outside the 7-bit 
> ASCII range.  We are planning to eventually use UTF-8 as the internal 
> representation of data.  If your krb5.conf file is in UTF-8, and the 
> byte codes specified either in raw form or via "\xhh" encoding are 
> UTF-8, you will probably not see surprising behavior.  Other 
> encodings may be unhappy when using byte codes.  If any of these are 
> surprising or seem wrong, please let me know.
> 
> => Error Recovery: In all error cases, reasonable recovery steps can 
> be taken to continue parsing.  For example, if the first line is not 
> a Section Line, it can be treated as a comment line.  As another 
> example, if a Section Line does not contain an unquoted close square 
> bracket ']', the parser can pretend that one exists.  Thus
>    [foo  #whoops
> can be interpreted as
>    [foo ] #whoops
> The default parser behavior is to note the error and perform error 
> recovery.  Thus a tree will always be produced, regardless of syntax 
> errors.   When the parser returns, it returns the tree as well as the 
> list of errors.  The calling function can decide whether the errors 
> are fatal or ignorable.  This allows the existing API to be 
> implemented (most errors are fatal), as well as more flexible parse 
> functions which allow certain classes of error.  I can, upon request, 
> talk about the specific recovery steps planned for each of the error 
> cases (i.e., please let me know).
> 
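
The "return the result plus a list of errors" approach quoted above can
be illustrated with a small Python sketch for Section Lines only.  The
function names and error strings are assumptions; the missing-bracket
recovery follows the "[foo  #whoops" example, and a strict wrapper shows
how a caller could still treat every error as fatal.

```python
def parse_section_line(content):
    """Return (raw_section_name, errors) for a stripped line starting with '['."""
    errors = []
    in_quote = False
    close = -1
    i = 1
    while i < len(content):
        ch = content[i]
        if in_quote and ch == "\\":
            i += 2  # skip the escaped character inside a quoted string
            continue
        if ch == '"':
            in_quote = not in_quote
        elif ch == "]" and not in_quote:
            close = i
            break
        i += 1
    if close < 0:
        # Recovery: pretend the bracket sits at the end of the line.
        errors.append("missing ']'")
        close = len(content)
    elif content[close + 1:].strip():
        errors.append("text after ']'")
    return content[1:close].strip(), errors


def parse_section_strict(content):
    """Same as above, but treat any error as fatal."""
    name, errors = parse_section_line(content)
    if errors:
        raise ValueError("; ".join(errors))
    return name
```

The lenient function always yields a usable name, so a tree can still
be built; only the strict wrapper refuses input with errors.
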
> --------------------
> 
> Whew, that's it!  Mostly.  I have not specified here how comments are 
> attached into the tree, although the existing API ignores comments 
> anyway.  And I'm sure there are other issues I haven't touched on, so 
> if you have questions... you know.  Thanks for your time and patience 
> if you've read all this!  :-)
> 
> --Joe