ASN.1 encoding

Fri Oct 31 21:59:05 EST 2003

The only valid characters that may be used within GeneralString fields
by RFC 1510 implementations of Kerberos are those contained in US-ASCII.

The following text is quoted from:

   draft-ietf-krb-wg-kerberos-clarifications-04.txt

5.2.1. KerberosString

    The original specification of the Kerberos protocol in RFC 1510 uses
    GeneralString in numerous places for human-readable string data.
    Historical implementations of Kerberos cannot utilize the full power
    of GeneralString.  This ASN.1 type requires the use of designation
    and invocation escape sequences as specified in ISO-2022/ECMA-35
    [ISO-2022/ECMA-35] to switch character sets, and the default
    character set that is designated as G0 is the ISO-646/ECMA-6
    [ISO-646,ECMA-6] International Reference Version (IRV) (aka U.S.
    ASCII), which mostly works.

    ISO-2022/ECMA-35 defines four character-set code elements (G0..G3)
    and two Control-function code elements (C0..C1). DER prohibits the
    designation of character sets as any but the G0 and C0 sets.
    Unfortunately, this seems to have the side effect of prohibiting the
    use of ISO-8859 (ISO Latin) [ISO-8859] character-sets or any other
    character-sets that utilize a 96-character set, since it is
    prohibited by ISO-2022/ECMA-35 to designate them as the G0 code
    element. This side effect is being investigated in the ASN.1
    standards community.

    In practice, many implementations treat GeneralStrings as if they
    were 8-bit strings of whichever character set the implementation
    defaults to, without regard for correct usage of character-set
    designation escape sequences. The default character set is often
    determined by the current user's operating system dependent locale.
    At least one major implementation places unescaped UTF-8 encoded
    Unicode characters in the GeneralString. This failure to adhere to
    the GeneralString specifications results in interoperability issues
    when conflicting character encodings are utilized by the Kerberos
    clients, services, and KDC.

    This unfortunate situation is the result of improper documentation of
    the restrictions of the ASN.1 GeneralString type in prior Kerberos
    specifications.

    The new (post-RFC 1510) type KerberosString, defined below, is a
    GeneralString that is constrained to only contain characters in
    IA5String.

       KerberosString  ::= GeneralString (IA5String)

    US-ASCII control characters should in general not be used in
    KerberosString, except for cases such as newlines in lengthy error
    messages. Control characters SHOULD NOT be used in principal names or
    realm names.
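
(Not part of the draft.) As a rough illustration of the two rules above, the
following Python sketch checks that a candidate value stays within the 7-bit
IA5String range and, for principal or realm names, that it contains no
control characters. The function names are hypothetical and do not come from
any Kerberos implementation.

```python
def is_kerberos_string(octets: bytes) -> bool:
    """True if every octet is in the IA5String (7-bit) range 0x00..0x7F."""
    return all(b <= 0x7F for b in octets)


def has_control_chars(octets: bytes) -> bool:
    """True if any octet is a US-ASCII control character (0x00..0x1F or 0x7F)."""
    return any(b < 0x20 or b == 0x7F for b in octets)


def is_acceptable_name(octets: bytes) -> bool:
    """Hypothetical policy check for a principal or realm name:
    IA5String only, and no control characters."""
    return is_kerberos_string(octets) and not has_control_chars(octets)
```

A multi-line error message such as b"line1\nline2" passes the first check
but is rejected by is_acceptable_name() because of the newline.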

    For compatibility, implementations MAY choose to accept GeneralString
    values that contain characters other than those permitted by
    IA5String, but they should be aware that character set designation
    codes will likely be absent, and that the encoding should probably be
    treated as locale-specific in almost every way. Implementations MAY
    also choose to emit GeneralString values that are beyond those
    permitted by IA5String, but should be aware that doing so is
    extraordinarily risky from an interoperability perspective.
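
(Not part of the draft.) One way a receiver might act on the "MAY accept"
allowance is sketched below: decode as US-ASCII first, and only if that
fails fall back to an encoding chosen by local policy. The function name and
the fallback_encoding parameter are assumptions made for this example; a
real receiver would have to decide for itself where that encoding comes
from, since the sender gives no indication.

```python
def decode_general_string(octets: bytes, fallback_encoding: str = "latin-1") -> str:
    """Decode GeneralString contents leniently.

    Pure US-ASCII is decoded as such. Anything else is assumed to be
    unescaped text in `fallback_encoding` (a guess made by the receiver);
    octets that cannot be decoded are replaced with U+FFFD.
    """
    try:
        return octets.decode("ascii")
    except UnicodeDecodeError:
        return octets.decode(fallback_encoding, errors="replace")
```

The result for non-ASCII input is only as good as the guess: the same octets
decode to different text under different fallback encodings, which is
exactly the interoperability risk the draft warns about.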

    Some existing implementations use GeneralString to encode unescaped
    locale-specific characters. This is a violation of the ASN.1
    standard. Most of these implementations encode US-ASCII in the
    left-hand half, so as long as the implementation transmits only
    US-ASCII, the ASN.1 standard is not violated in this regard. As soon
    as such an
    implementation encodes unescaped locale-specific characters with the
    high bit set, it violates the ASN.1 standard.

    Other implementations have been known to use GeneralString to contain
    a UTF-8 encoding. This also violates the ASN.1 standard, since UTF-8
    is a different encoding, not a 94 or 96 character "G" set as defined
    by ISO 2022.  It is believed that these implementations do not even
    use the ISO 2022 escape sequence to change the character encoding.
    Even if implementations were to announce the change of encoding by
    using that escape sequence, the ASN.1 standard prohibits the use of
    any escape sequences other than those used to designate/invoke "G" or
    "C" sets allowed by GeneralString.
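
(Not part of the draft.) When debugging interoperability problems it can
help to see which of the cases described above a received value falls into.
The Python sketch below is a heuristic only: the category labels are made up
for this example, and a value reported as "utf-8" may simply be 8-bit text
that happens to form valid UTF-8.

```python
ESC = 0x1B  # ISO 2022 escape character


def classify_general_string(octets: bytes) -> str:
    """Roughly classify the contents octets of a received GeneralString."""
    if ESC in octets:
        # Some designation/invocation sequence is present; not parsed here.
        return "has-escape"
    if all(b <= 0x7F for b in octets):
        return "us-ascii"
    try:
        octets.decode("utf-8")
        return "utf-8"  # high-bit octets that form valid UTF-8
    except UnicodeDecodeError:
        return "8-bit-unknown"  # probably unescaped locale-specific text
```

For example, classify_general_string(b"\xe9") (a lone Latin-1 "e acute")
falls into "8-bit-unknown", while the two-octet UTF-8 form b"\xc3\xa9" is
reported as "utf-8".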

    Future revisions to this protocol will almost certainly allow for a
    more interoperable representation of principal names, probably
    including UTF8String.

    Note that applying a new constraint to a previously unconstrained
    type constitutes creation of a new ASN.1 type. In this particular
    case, the change does not result in a changed encoding under DER.
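
To illustrate that last point: under DER a KerberosString is still encoded
with the GeneralString tag (UNIVERSAL 27, 0x1B), followed by the length and
the contents octets. A minimal Python sketch, with hypothetical helper names
and no claim to match any particular implementation:

```python
GENERAL_STRING_TAG = 0x1B  # UNIVERSAL 27, primitive


def der_length(n: int) -> bytes:
    """DER definite-length octets: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def encode_kerberos_string(text: str) -> bytes:
    """Encode `text` as a DER GeneralString restricted to US-ASCII.

    Raises UnicodeEncodeError if `text` contains non-ASCII characters.
    """
    contents = text.encode("ascii")
    return bytes([GENERAL_STRING_TAG]) + der_length(len(contents)) + contents
```

With this sketch, encode_kerberos_string("krbtgt") yields the octets
1B 06 6B 72 62 74 67 74, which is the same encoding an unconstrained
GeneralString holding the same ASCII text would have.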

Gustavo Rios wrote:
> Dear gentleman/madam,
> 
> I am studying the Kerberos V (RFC 1510) protocol specification. Some
> data types for communication are specified with GeneralString encoding.
> Then I started studying ASN.1. It came as a surprise to me that not only
> the documentation sources advise against the use of GeneralString, but
> so does the ITU standard itself. I therefore respectfully request your
> clarification on this.
> 
> How do current Kerberos implementations deal with this? I mean, how
> is the encoding currently performed? What are the valid characters used
> by the current implementations on the market? (I am considering the MIT
> and Heimdal implementations at least; if you know of any others, let
> me know.)
> 
> Thanks a lot for your time.
> 
> PS: My source of information are:
> 
> 	http://asn1.elibel.tm.fr/en/book/index.htm
> 	http://asn1.elibel.tm.fr/en/standards/index.htm