(Final?) krb5.conf Lexer/Parser Proposal
Theodore Ts'o
tytso at MIT.EDU
Thu Jan 5 04:12:22 EST 2006
On Wed, Jan 04, 2006 at 05:46:34PM -0500, Joseph Calzaretta wrote:
> - the "final" signifier (asterisk '*') is no longer supported or
> treated as special.
You understand why the "final" signifier was implemented, right? The
idea was to let a user's ~/.krb5rc override portions of
/etc/krb5.conf. So if KRB5_CONFIG is set to
~/.krb5rc:/etc/krb5.conf, an entire stanza can be overridden by
using the finalizer.
For example, if ~/.krb5rc contains:

    [realms]
        ATHENA.MIT.EDU = {
            kdc = extra_kdc.mit.edu:88
        }*

Then the only KDC that will be returned will be extra_kdc.mit.edu:88.
Without the finalizer, the full list of KDCs is returned:

    % ./test_profile ~/.krb5rc:/etc/krb5.conf query realms ATHENA.MIT.EDU kdc
    extra_kdc.mit.edu:88
    kerberos.mit.edu:88
    kerberos-1.mit.edu:88
    kerberos-2.mit.edu:88
    kerberos-3.mit.edu:88

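The override behavior described above can be sketched roughly as follows. This is only an illustration, not the profile library's actual code: the `lookup` function and the dictionary layout (a `final` flag and a `values` map per stanza) are hypothetical names chosen for the example.

```python
def lookup(files, section, stanza_name, tag):
    """Collect values for `tag` across profile files in KRB5_CONFIG order.

    `files` is a list of dicts shaped like (hypothetical layout):
        {section: {stanza_name: {"final": bool, "values": {tag: [str, ...]}}}}
    """
    results = []
    for profile in files:
        stanza = profile.get(section, {}).get(stanza_name)
        if stanza is None:
            continue
        results.extend(stanza["values"].get(tag, []))
        if stanza["final"]:
            break  # a final stanza hides the same stanza in later files
    return results


system_kdcs = ["kerberos.mit.edu:88", "kerberos-1.mit.edu:88",
               "kerberos-2.mit.edu:88", "kerberos-3.mit.edu:88"]
system = {"realms": {"ATHENA.MIT.EDU": {"final": False,
                                        "values": {"kdc": system_kdcs}}}}


def user_profile(final):
    # Stand-in for ~/.krb5rc, with or without the '*' finalizer.
    return {"realms": {"ATHENA.MIT.EDU": {
        "final": final, "values": {"kdc": ["extra_kdc.mit.edu:88"]}}}}
```

With `user_profile(True)` listed first, only the user's KDC is returned; with `user_profile(False)`, the system entries follow it.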
Is anyone actually using this? I can't answer that question, but it
is useful functionality. Whether or not it is worth preserving I will
leave to others to decide, but in case it wasn't understood why it was
originally added, this might be a useful historical perspective.
- Ted
>
> The new lexer/parser proposal embraces line-based syntax and is much
> less of a change from the existing parser. As a first order
> approximation, the new proposal is the same as the existing parser except that:
> - comments beginning with the pound sign '#' may appear at the end
> of any line.
> - quoted strings are acceptable everywhere, not just in relation
> values. Consequently, all section, tag, and value names can contain
> any characters.
>
> Therefore, please feel free to skip or skim the nitty-gritty part
> below, and look at the Noteworthy Features section for interesting
> changes. If you have any concerns, questions, or (non-xml-themed)
> suggestions, please let me know. Thanks!
>
> Yours,
>
> Joe Calzaretta
> Software Development & Integration Team
> MIT Information Services & Technology
>
> -----------------
> The New krb5.conf Lexer/Parser Proposal, a.k.a., the Nitty-Gritty Part:
>
> The input file is divided into lines, terminated by linebreaks (or
> end-of-stream for the last line.)
>
> For each line:
>   - initial whitespace is ignored.
>   - all text after an unquoted pound sign '#' (including the pound
>     sign itself) is stored as comment text and subsequently ignored.
>   - if the first nonwhitespace character is a semicolon ';', the
>     entire line is stored as comment text and subsequently ignored.
>   - if the line is all whitespace, the entire line is ignored.
> At this point, whitespace, comments, and blank lines are stripped.
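
The per-line stripping step quoted above might look something like the sketch below. It is one possible reading of the rules, not the proposed implementation; `strip_line` is a hypothetical name, and the handling of backslash escapes inside quotes is an assumption based on the ANSI C quoting mentioned later in the proposal.

```python
def strip_line(line):
    """Return (content, comment) for one physical line.

    `content` has leading/trailing whitespace and any comment removed;
    `comment` is the stored comment text ("" if there is none).
    """
    text = line.lstrip()
    if text.startswith(";"):
        # A leading semicolon turns the entire line into a comment.
        return "", text.rstrip()
    in_quote = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_quote and ch == "\\":
            i += 2  # skip the escaped character inside a quoted string
            continue
        if ch == '"':
            in_quote = not in_quote
        elif ch == "#" and not in_quote:
            # First unquoted pound sign: the rest of the line is a comment.
            return text[:i].rstrip(), text[i:].rstrip()
        i += 1
    return text.rstrip(), ""
```

An all-whitespace line comes back as `("", "")`, which a caller could simply skip.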
>
> After this processing, the first character of the line is examined.
>   - If this character is an open square bracket '[', the line is a
>     Section Line.
>   - If this character is a close curly brace '}', the line is a
>     CloseSubsection Line.
>   - If this character is an open curly brace '{', AND the previous
>     line was a DanglingSubsection line (more later), the line is a
>     RescuedSubsection Line.
>   - Otherwise, the line is a SubsectionOrRelation line.
>
> Section Lines:
> All text after the initial open square bracket '[' and up to but
> not including the first unquoted close square bracket ']' is
> considered the Raw Section Name. A line without the close bracket or
> with any nonwhitespace text after the close bracket is considered an error.
>
> CloseSubsection Lines:
> Any nonwhitespace text after the initial close curly brace '}' is
> considered an error.
>
> SubsectionOrRelation Lines:
> All text up to but not including the first unquoted equal sign '='
> is considered the Raw Tag name. A line without such an equal sign is
> considered an error.
> The first nonwhitespace character after such an equal sign '=' is
> examined.
> If this character is an open curly brace '{', the line is a
> Subsection Line.
> If there is no such character, the line is a DanglingSubsection Line.
> Otherwise, the line is a Relation Line.
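
The line-classification rules quoted so far could be sketched as below. The function names (`find_unquoted`, `classify`) and the string labels are hypothetical, and the input is assumed to be a line that has already had comments and surrounding whitespace stripped.

```python
def find_unquoted(text, target):
    """Index of the first `target` character outside a quoted string, or -1."""
    in_quote = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_quote and ch == "\\":
            i += 2  # skip the escaped character
            continue
        if ch == '"':
            in_quote = not in_quote
        elif ch == target and not in_quote:
            return i
        i += 1
    return -1


def classify(line, prev_kind=None):
    """Name the kind of an already-stripped, non-empty line."""
    if line.startswith("["):
        return "Section"
    if line.startswith("}"):
        return "CloseSubsection"
    if line.startswith("{") and prev_kind == "DanglingSubsection":
        return "RescuedSubsection"
    eq = find_unquoted(line, "=")
    if eq < 0:
        return "Error"  # SubsectionOrRelation line without an equal sign
    rest = line[eq + 1:].lstrip()
    if not rest:
        return "DanglingSubsection"
    if rest.startswith("{"):
        return "Subsection"
    return "Relation"
```

Checks such as "no text after the open curly brace" are left out here; they would run after classification.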
>
> Subsection and RescuedSubsection Lines:
> Any nonwhitespace text after the unquoted open curly brace '{' is
> considered an error.
>
> Relation Lines:
> All text after the unquoted equal sign '=' is considered the Raw
> Value Name.
>
> Raw Name Canonicalization:
> All Raw Section/Tag/Value Names are canonicalized thusly:
> Any text within a quoted string is unescaped in the manner of
> ANSI C (C90 spec).
> (e.g., "[\\Huh\x3F]" => [\Huh?])
> All whitespace within a quoted string is preserved.
> Whitespace between two quoted strings is eliminated. (provides string
> concatenation much like ANSI C)
> Whitespace at the beginning and end of the Raw Name is eliminated.
> All other whitespace (i.e., whitespace before or after an unquoted word)
> is condensed to single space (like 'collapse' whitespace handling in xml).
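
A rough sketch of the canonicalization rules above, under some simplifying assumptions: quotes are assumed to be balanced, hex escapes are limited to the two-digit "\xhh" form, and bytes are represented as Python characters. The names `canonicalize` and `_unescape` are made up for this example.

```python
import re

_SIMPLE = {"n": "\n", "t": "\t", "r": "\r", "a": "\a", "b": "\b",
           "f": "\f", "v": "\v", "\\": "\\", '"': '"', "'": "'", "?": "?"}
_ESCAPE = re.compile(r"\\(x[0-9A-Fa-f]{1,2}|[0-7]{1,3}|.)")
_TOKEN = re.compile(r'"((?:[^"\\]|\\.)*)"|(\s+)|([^\s"]+)')


def _unescape(body):
    """Undo C-style escapes inside the body of one quoted string."""
    def repl(match):
        code = match.group(1)
        if code[0] == "x" and len(code) > 1:
            return chr(int(code[1:], 16))
        if code[0] in "01234567":
            return chr(int(code, 8))
        return _SIMPLE.get(code, code)
    return _ESCAPE.sub(repl, body)


def canonicalize(raw):
    """Canonicalize a Raw Section/Tag/Value Name."""
    parts = []  # (kind, text) with kind in {"quoted", "space", "word"}
    for m in _TOKEN.finditer(raw.strip()):
        if m.group(1) is not None:
            parts.append(("quoted", _unescape(m.group(1))))
        elif m.group(2) is not None:
            parts.append(("space", " "))  # any whitespace run becomes one space
        else:
            parts.append(("word", m.group(3)))
    out = []
    for i, (kind, text) in enumerate(parts):
        if (kind == "space" and 0 < i < len(parts) - 1
                and parts[i - 1][0] == "quoted"
                and parts[i + 1][0] == "quoted"):
            continue  # whitespace between two quoted strings is eliminated
        out.append(text)
    return "".join(out)
```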
>
> Lines may generally occur in any order, but some situations are
> considered errors. Errors occur:
>   - If the first line is not a Section Line.
>   - If a DanglingSubsection Line is not immediately followed by a
>     RescuedSubsection Line.
>   - If a Section Line or the end-of-stream appears within a Subsection
>     (after fewer CloseSubsection lines than Subsection/RescuedSubsection
>     Lines).
>   - If a CloseSubsection Line appears outside of a Subsection (after
>     an equal number of CloseSubsection and Subsection/RescuedSubsection
>     Lines).
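
These ordering errors amount to tracking a nesting depth over the sequence of line kinds. A minimal sketch, assuming the lines have already been classified into string labels; `check_order` and the error messages are hypothetical, and resetting the depth at a misplaced Section Line is one guessed recovery step, not something the proposal specifies.

```python
def check_order(kinds):
    """Return a list of (line_number, message) ordering errors."""
    errors = []
    depth = 0      # open Subsections minus CloseSubsections seen so far
    prev = None
    for n, kind in enumerate(kinds, start=1):
        if n == 1 and kind != "Section":
            errors.append((n, "first line is not a Section Line"))
        if prev == "DanglingSubsection" and kind != "RescuedSubsection":
            errors.append((n, "dangling subsection not rescued"))
        if kind == "Section" and depth > 0:
            errors.append((n, "Section Line inside a Subsection"))
            depth = 0  # assumed recovery: close all open subsections
        elif kind in ("Subsection", "RescuedSubsection"):
            depth += 1
        elif kind == "CloseSubsection":
            if depth == 0:
                errors.append((n, "CloseSubsection outside a Subsection"))
            else:
                depth -= 1
        prev = kind
    end = len(kinds) + 1  # pseudo line number for end-of-stream
    if prev == "DanglingSubsection":
        errors.append((end, "dangling subsection not rescued"))
    if depth > 0:
        errors.append((end, "end-of-stream inside a Subsection"))
    return errors
```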
>
> ---------------------------------
> Noteworthy Features of the Proposed Lexer/Parser
>
> => The asterisk '*' signifier for "final" lines is no longer
> supported. Asterisks are not considered special characters at
> all. If this is undesirable or surprising, please let me know.
>
> => The semicolon ';' only signifies a comment at the beginning of a
> line, whereas the pound sign '#' signifies a comment whenever it
> appears unquoted. Note the following lines:
>     # a comment
>     foo = bar # a comment
>     foo = "bar # NOT a comment"
>     ; a comment
>     foo = bar ; NOT a comment
> Why this difference? Some existing krb5.conf files' relation values
> (notably the auth_to_local value) may have unquoted semicolons ';' in
> them. As far as we have seen, no existing krb5.conf files' relation
> values or tags use unquoted pound signs '#'. If this is untrue,
> please let me know.
>
> => Relation values may not start with an unquoted open curly brace
> '{'. For example, the line:
>     foo = { bar
> is considered an error. Note that the existing parser would treat
> this as a relation assigning the value "{ bar" to the tag "foo". The
> existing parser's behavior is confusing enough that it is probably
> best discarded. If this is untrue, please let me know.
>
> => DanglingSubsections and RescuedSubsections: The existing parser
> allows the open curly brace '{' for subsections to appear on the line
> after the equal sign '=', like so:
>     foo =        # dangling subsection
>     {            # rescued subsection, yay!
>         bar = baz
>     }
> This syntax is pretty, because you can line up the open and close
> curlies. But it violates the one-line-per-element
> linebreaks-are-syntactic rule to which the parser otherwise strictly
> adheres. Personally, I don't like this because it is an extra layer
> of complexity, and the corresponding format for relation values is not valid:
>     foo =        # dangling relation?
>         bar      # can't be rescued, and doesn't parse. boo!
> Anyway, this syntax continues to be supported in the new proposal
> (actually improved because the existing parser doesn't allow comments
> to appear on lines between the equal sign and the open curly
> brace). If anyone thinks this should be eliminated or supported
> differently, please let me know.
>
> => All section names, tag names, and value names may contain any
> character (except for null '\0') including whitespace. Since all
> such names also support ANSI C quoted strings, there is a way to
> include any special character. For example,
>     ["foo]"]         # close bracket in a section name
>     "} foo" = bar    # close curly brace at start of a tag name.
>     "foo " = bar     # space at the end of a tag name
>     foo bar = baz    # single collapsed space in the middle of a tag name.
>     "foo=" = bar     # equal sign in a tag name.
>     "#foo" = bar     # pound sign in a tag name.
>     foo = "{ bar"    # open curly brace at start of a value name.
>     foo = "\"bar\""  # quotation marks in a value name.
>     foo = "\x3F"     # raw byte value for a question mark. Iffy!
> Note that some of these would be errors in the existing parser, while
> others would be interpreted much differently. Also note that the
> allowing of "\xhh" and "\ooo" byte codes can get a bit iffy in the
> future. Right now, data is just stored as null-terminated byte
> strings, with no guaranteed interpretation of codes outside the 7-bit
> ASCII range. We are planning to eventually use UTF-8 as the internal
> representation of data. If your krb5.conf file is in UTF-8, and the
> byte codes specified either in raw form or via "\xhh" encoding are
> UTF-8, you will probably not see surprising behavior. Other
> encodings may be unhappy when using byte codes. If any of these are
> surprising or seem wrong, please let me know.
>
> => Error Recovery: In all error cases, reasonable recovery steps can
> be taken to continue parsing. For example, if the first line is not
> a Section Line, it can be treated as a comment line. As another
> example, if a Section Line does not contain an unquoted close square
> bracket ']', the parser can pretend that one exists. Thus
>     [foo #whoops
> can be interpreted as
>     [foo ] #whoops
> The default parser behavior is to note the error and perform error
> recovery. Thus a tree will always be produced, regardless of syntax
> errors. When the parser returns, it returns the tree as well as the
> list of errors. The calling function can decide whether the errors
> are fatal or ignorable. This allows the existing API to be
> implemented (most errors are fatal), as well as more flexible parse
> functions which allow certain classes of error. I can, upon request,
> talk about the specific recovery steps planned for each of the error
> cases. (i.e., please let me know).
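
The "always return a tree plus a list of errors" design could be sketched as below. This toy version only handles Section Lines and the two recovery examples given above, and it ignores quoting when stripping comments; `parse_sections` and `parse_strict` are hypothetical names, not the existing API.

```python
def parse_sections(lines):
    """Return (tree, errors); syntax errors are recorded, never raised."""
    tree, errors = {}, []
    for n, line in enumerate(lines, start=1):
        text = line.split("#", 1)[0].strip()  # quoting ignored in this sketch
        if not text:
            continue
        if text.startswith("["):
            if "]" in text:
                name = text[1:text.index("]")]
            else:
                errors.append((n, "missing ']'"))
                name = text[1:]  # recovery: pretend the bracket exists
            tree.setdefault(name.strip(), {})
        elif not tree:
            # recovery: text before the first Section Line is treated
            # as a comment line
            errors.append((n, "text before the first Section Line"))
        # relations and subsections are out of scope for this sketch
    return tree, errors


def parse_strict(lines):
    """Caller that treats every recorded error as fatal."""
    tree, errors = parse_sections(lines)
    if errors:
        raise ValueError(errors)
    return tree
```

A more lenient caller could inspect `errors` and accept the recovered tree for selected classes of error.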
>
> --------------------
>
> Whew, that's it! Mostly. I have not specified here how comments are
> attached into the tree, although the existing API ignores comments
> anyway. And I'm sure there are other issues I haven't touched on, so
> if you have questions... you know. Thanks for your time and patience
> if you've read all this! :-)
>
> --Joe
>
> _______________________________________________
> krbdev mailing list krbdev at mit.edu
> https://mailman.mit.edu/mailman/listinfo/krbdev