ASN.1 encoding in MIT Kerberos

Paul Moore paul.moore at
Tue Oct 21 19:25:18 EDT 2008

Have you looked at the BER encoder in the OpenLDAP project?  It is 100%
separable from the LDAP code base and is a joy to use.

It uses the printf and scanf idioms to read and write BER/DER packets.

-----Original Message-----
From: krbdev-bounces at MIT.EDU [mailto:krbdev-bounces at MIT.EDU] On Behalf
Of Ken Raeburn
Sent: Tuesday, October 21, 2008 4:10 PM
To: krbdev at Dev List
Subject: ASN.1 encoding in MIT Kerberos

Hi.  I've been working a bit lately on reworking the ASN.1 encoders in  
the MIT Kerberos code.  I'm hoping to have something ready to put into  
the source tree soon, probably in the next couple of weeks, so I  
thought I'd send out a note describing what I'm doing and see if  
anyone has comments.

First: Yes, we know our ASN.1 encoders are ugly and hard to deal with,  
and in the long term, we should replace them.  But using ASN.1  
compilers generally involves using new data structures generated by  
the tools, and since many of our current data structures are exposed  
in our API, that means either translating at run time, including  
additional allocations, to convert between different representations,  
or banging on the generated code from the compiler (or the compiler  
itself) until it can work with our existing data structures.  So in  
the *short* term, we're trying to come up with something that's both a  
bit more comprehensible, and a bit more structured so as to reduce the  
opportunities for errors in writing support for new message types.  If  
it winds up being more compact than our current encoders, so much the
better.

The short-term project is what I've been working on -- a mostly
table-driven encoder targeted primarily at the Kerberos protocol.  It only
encodes DER, doesn't have many of the primitive ASN.1 types that we  
don't use, is optimized for sequences where field elements are usually  
tagged, doesn't handle arbitrarily large tag values, etc.  It's not  
intended to be a fully-functional general-purpose ASN.1 encoder at  
this stage.
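
To make the "tagged fields, small tags only" restriction concrete, here is a minimal sketch of what encoding one context-tagged element looks like in DER.  It is not the encoder described in this note; the function name and its limits (tag numbers up to 30, values up to 127) are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the MIT code: writes the DER encoding of
 * "[tag] INTEGER value" (explicit tagging) into out, for tag <= 30 and
 * 0 <= value <= 127.  Returns the number of bytes written, or 0 if the
 * inputs are out of range or the buffer is too small. */
static size_t
encode_tagged_small_int(uint8_t *out, size_t outlen,
                        unsigned int tag, unsigned int value)
{
    if (tag > 30 || value > 127 || outlen < 5)
        return 0;
    out[0] = (uint8_t)(0xA0 | tag); /* context class, constructed, tag */
    out[1] = 3;                     /* length of the inner INTEGER */
    out[2] = 0x02;                  /* universal INTEGER */
    out[3] = 1;                     /* one content octet */
    out[4] = (uint8_t)value;
    return 5;
}
```

With tag 0 and value 5 (the pvno field of a Kerberos message) this produces the bytes A0 03 02 01 05.  Tag numbers above 30 need a multi-byte identifier, which is the case the sketch simply refuses.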

I'm also not tackling the decoders at this time.  Doing the encoders  
gets us tables describing the ASN.1 and C data structures to a large  
degree, and perhaps this can be a starting point for future work on  
the decoders, but reworking the encoders is a more manageable initial
step.


Since we don't have a unique mapping in either direction between ASN.1  
types and C types, each object we describe is a combination of a C  
type and an ASN.1 encoding for it; krb5_data encoded as OCTET STRING  
is different from krb5_data encoded as GeneralString.  Each of these  
gets a "descriptor" name associated with it when we define it.  The  
current revision still works entirely in C code, using helper macros,  
but I hope they'll be at least somewhat clearer than the existing pile  
of macros.

Two sets of macros are defined.  The first set is for defining a
C/ASN.1 type descriptor.  You supply a descriptor name and other parameters,
such as the C type, or info on sequence fields, or a primitive  
encoding function.  For example:

DEFFNTYPE(gstring_data, krb5_data, asn1_encode_generalstring_data_at);

Here "gstring_data" becomes a descriptor name for a GeneralString  
encoded from a krb5_data structure using the indicated function, which  
takes as one of its arguments a pointer to the krb5_data.  If your  
data structure includes a krb5_data that you want encoded as a  
GeneralString, you use "gstring_data" to describe it; if it contains a  
krb5_data*, you would use "gstring_data_ptr", which is defined as  
encoding the same thing as "gstring_data" but starting with an  
additional level of indirection.  There are also macros for defining  
descriptors for SEQUENCE types, SEQUENCE OF defined as a
null-terminated array of pointers to a base type, an encoding of another
type with an APPLICATION tag added, etc.
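
As a rough sketch of the idea (not the real macros): a descriptor can be a small constant structure, and a "_ptr" variant can simply point at its base descriptor.  Everything below, including struct my_data, DEF_FN_TYPE, DEF_PTR_TYPE and encode_via, is a hypothetical stand-in, and the toy "encoder" only reports a length.

```c
#include <stddef.h>

/* Stand-in for krb5_data so the sketch needs no Kerberos headers. */
struct my_data { unsigned int length; const char *data; };

/* Hypothetical descriptor: one C type encoded one particular way. */
struct type_info {
    size_t size;                     /* sizeof the C type */
    size_t (*enc)(const void *val);  /* primitive encoder, or NULL */
    const struct type_info *base;    /* set only for pointer variants */
};

/* Toy "encoder": just reports the content length it would emit. */
static size_t
encode_gstring_len(const void *val)
{
    const struct my_data *d = val;
    return d->length;
}

#define DEF_FN_TYPE(NAME, CTYPE, FN) \
    static const struct type_info type_##NAME = { sizeof(CTYPE), FN, NULL }
#define DEF_PTR_TYPE(NAME, BASE) \
    static const struct type_info type_##NAME = \
        { sizeof(void *), NULL, &type_##BASE }

DEF_FN_TYPE(gstring_data, struct my_data, encode_gstring_len);
DEF_PTR_TYPE(gstring_data_ptr, gstring_data);

/* Follow pointer variants down to the base type, then encode.
 * Assumes all object pointers share one representation. */
static size_t
encode_via(const struct type_info *t, const void *val)
{
    while (t->base != NULL) {
        val = *(const void *const *)val;  /* one level of indirection */
        t = t->base;
    }
    return t->enc(val);
}
```

Calling encode_via with type_gstring_data and the address of a struct my_data, or with type_gstring_data_ptr and the address of a pointer to it, reaches the same primitive function.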

The various DEF*TYPE macros always define a variable type_<name> that  
encodes the information about the type -- size, how to encode it,  
etc.  They also define typedefnames associated with the descriptor  
name so that, for example, when describing a structure field using a  
descriptor name, we can inject some compile-time code to verify that  
the structure field has the C type indicated by the descriptor.   
(Currently that works by creating a "?:" expression using pointers to  
the expected type and the actual field, so if there's a mismatch, you  
get a warning or error, but it may refer to a conditional operator you  
never typed in, so unfortunately it can be a little obscure.)
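
The "?:" trick can be sketched roughly like this.  The macro name CHECKED_OFFSET and the two structures are invented for the example and are not what the real code uses; the point is only that both arms of the conditional must be compatible pointer types, and that sizeof keeps the whole thing a compile-time constant.

```c
#include <stddef.h>

struct enc_data { int kvno; };
struct priv { int magic; struct enc_data enc_part; };

/* Hypothetical macro: yields offsetof(STYPE, FIELD), but only compiles
 * without a diagnostic when FIELD really has type FTYPE.  The operand
 * of sizeof is never evaluated, so no null pointer is dereferenced,
 * and the "0 *" discards the size itself. */
#define CHECKED_OFFSET(STYPE, FIELD, FTYPE) \
    (offsetof(STYPE, FIELD) \
     + 0 * sizeof(0 ? (FTYPE *)0 : &((STYPE *)0)->FIELD))

/* Usable in a static initializer, as a table entry would need. */
static const size_t enc_part_off =
    CHECKED_OFFSET(struct priv, enc_part, struct enc_data);

/* CHECKED_OFFSET(struct priv, magic, struct enc_data) would draw a
 * "pointer type mismatch in conditional expression" style diagnostic. */
```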

The second set of macros defines fields of a sequence/structure.   
"Normal" fields are indicated by the C field name and a descriptor  
describing the field type plus its encoding; there are additional  
macros for dealing with strings (GeneralString, OCTET STRING) that are  
encoded in the structure as two fields for pointer and length,  
SEQUENCE OF types encoded as pointer and (krb5_int32) length, constant  
integer values that get encoded but aren't represented in the  
structure, and a couple other oddball cases.

There are variants of most of them for handling optional fields; a  
helper function (one per sequence type that has optional fields) is  
called and returns an "unsigned int" bit mask, and each field  
descriptor holds either a bit position to check for optional fields,  
or -1 for required fields.
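
A minimal sketch of the optional-field scheme, with made-up structure and function names: the per-type helper reports which optional fields are present, and the generic loop consults each field's bit position.

```c
#include <stddef.h>

/* Hypothetical message type with two optional members. */
struct ticket_like {
    int required_val;
    const int *opt_a;   /* absent when NULL */
    const int *opt_b;   /* absent when NULL */
};

struct field_desc {
    const char *name;
    int opt_bit;        /* bit position in the mask, or -1 if required */
};

static const struct field_desc fields[] = {
    { "required_val", -1 },
    { "opt_a", 0 },
    { "opt_b", 1 },
};

/* Per-type helper: one bit per optional field that is present. */
static unsigned int
ticket_like_optional(const struct ticket_like *t)
{
    unsigned int mask = 0;
    if (t->opt_a != NULL)
        mask |= 1u << 0;
    if (t->opt_b != NULL)
        mask |= 1u << 1;
    return mask;
}

/* Generic engine step: count the fields that would be encoded. */
static size_t
count_encoded_fields(const struct field_desc *f, size_t n,
                     unsigned int mask)
{
    size_t i, count = 0;
    for (i = 0; i < n; i++) {
        if (f[i].opt_bit < 0 || (mask & (1u << f[i].opt_bit)))
            count++;
    }
    return count;
}
```

A real engine would encode each selected field instead of counting it, but the selection logic would have the same shape.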

Since most of the Kerberos types have context tags on each sequence  
element, to optimize for this in the first cut, each field descriptor  
has a tag value as well, with -1 meaning no tag is to be added.  (No  
separate macros for untagged fields currently.)  In both of these  
cases, it probably would've been more compact to encode N+1 and use an  
unsigned type.  That can be fixed up later if needed.
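
The N+1 alternative might look something like the following; the structure and helper names are hypothetical, and it assumes tag numbers small enough to fit in a byte after adding one.

```c
/* Signed form as described above: -1 means "no tag". */
struct field_signed { int tag; };

/* Compact form: store tag + 1 in one byte; 0 means "no tag". */
struct field_compact { unsigned char tag_plus_1; };

static int
has_tag(struct field_compact f)
{
    return f.tag_plus_1 != 0;
}

/* Only meaningful when has_tag(f) is true. */
static unsigned int
tag_of(struct field_compact f)
{
    return f.tag_plus_1 - 1u;
}
```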

So, we get, for example:

/*
    KRB-PRIV        ::= [APPLICATION 21] SEQUENCE {
            pvno            [0] INTEGER (5),
            msg-type        [1] INTEGER (21),
                            -- NOTE: there is no [2] tag
            enc-part        [3] EncryptedData -- EncKrbPrivPart
    } */
static const struct field_info priv_fields[] = {
     FIELDOF_NORM(krb5_priv, encrypted_data, enc_part, 3),
};
DEFSEQTYPE(untagged_priv, krb5_priv, priv_fields, 0);
DEFAPPTAGGEDTYPE(krb5_priv, 21, untagged_priv);

The use of the C type name in the FIELDOF_NORM macro has to be  
repeated for each field entry (except for "immediate" integer values)  
in the current incarnation, unfortunately, though I could redefine a  
macro named TYPE before each sequence description and make the macro  
look for that name.  While these structures are hand-maintained for
now, I think it's better to keep the type names around and obvious.

The FIELDOF_NORM expansion uses the hidden typedefname created for the  
"encrypted_data" descriptor to type-check the "enc_part" field of  
"krb5_priv" and ensure that it's of type "krb5_enc_data".  As the  
functions implementing the encoding engine pass around void pointers  
all the time, this is the only place to fit the type checking in.

Then, to get an actual encoder function, another macro:

MAKE_FULL_ENCODER(encode_krb5_priv, krb5_priv);

... which uses the implicit typedefs again to create a function that  
takes the desired pointer types (not void*) and generates a krb5_data  
holding the encoding, or an error code.  The function declarations
still have to be managed manually, but the goal was that the existing
ones should remain applicable, aside from fixing a couple of minor
cases that were either kind of bogus to begin with, like not using
"const" consistently, or awkward, like encoding two inputs into one
output.

It was intended that in the cases where actual functions are needed,  
they are mostly very small wrapper functions.  For example, any  
encoder function produced by MAKE_FULL_ENCODER converts the input  
pointer to void* and calls a common support routine with the  
appropriate type descriptor info; this routine allocates a temporary  
buffer and invokes the (recursive) encode-via-descriptor process, and  
copies the result into an output buffer.  So functions like  
encode_krb5_priv will be small.
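
The wrapper pattern can be sketched as below.  This is an illustration under simplifying assumptions, not the real MAKE_FULL_ENCODER: struct buf stands in for krb5_data, the temporary buffer is a fixed 256 bytes, there is no recursion, and all names are invented.

```c
#include <stdlib.h>
#include <string.h>

struct buf { size_t length; unsigned char *data; };
struct type_info {
    size_t (*enc)(const void *val, unsigned char *out, size_t cap);
};

/* Common support routine: encode into a temporary buffer, then copy
 * the result into a freshly allocated output buffer. */
static int
encode_common(const void *val, const struct type_info *t, struct buf **out)
{
    unsigned char tmp[256];
    size_t n = t->enc(val, tmp, sizeof(tmp));
    struct buf *b;

    if (n == 0)
        return -1;
    b = malloc(sizeof(*b));
    if (b == NULL)
        return -1;
    b->data = malloc(n);
    if (b->data == NULL) {
        free(b);
        return -1;
    }
    memcpy(b->data, tmp, n);
    b->length = n;
    *out = b;
    return 0;
}

/* The generated function is typed; only its body deals in void*. */
#define MAKE_ENCODER(FNNAME, CTYPE, TINFO) \
    static int FNNAME(const CTYPE *val, struct buf **out) \
    { return encode_common(val, &(TINFO), out); }

/* Toy type: one small value encoded as a DER INTEGER. */
struct small { unsigned char v; };

static size_t
enc_small(const void *val, unsigned char *out, size_t cap)
{
    const struct small *s = val;
    if (cap < 3 || s->v > 127)
        return 0;
    out[0] = 0x02;
    out[1] = 1;
    out[2] = s->v;
    return 3;
}

static const struct type_info type_small = { enc_small };

MAKE_ENCODER(encode_small, struct small, type_small)
```

Passing a pointer of the wrong type to encode_small is then a compile-time diagnostic, even though encode_common itself accepts anything.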

The descriptor structures are kind of large at the moment, mostly  
because depending on the type of thing being described, we want  
various function or object pointers of different types and they're  
currently different fields; with some casting or designated union  
initialization, I can make them smaller.  Even so, I've already got a  
fair size reduction (27K -> 7.5K code + 10K tables, without the PKINIT  
or LDAP code), though unfortunately the tables require load-time  
relocations now.
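
For illustration, the size difference between separate fields and a union with designated initializers could look like this.  The layouts are hypothetical and much simpler than the real descriptors, and designated initializers require a C99 compiler.

```c
#include <stddef.h>

typedef size_t (*enc_fn)(const void *);
struct field_info;  /* opaque here; only pointers to it are stored */

/* Wide layout: one member per kind of descriptor, most left unused. */
struct desc_wide {
    int kind;
    enc_fn enc;
    const struct field_info *fields;
    const struct desc_wide *base;
};

/* Narrow layout: the members share storage, selected by kind. */
enum { KIND_FN, KIND_SEQ, KIND_PTR };
struct desc_narrow {
    int kind;
    union {
        enc_fn enc;
        const struct field_info *fields;
        const struct desc_narrow *base;
    } u;
};

static size_t
enc_dummy(const void *v)
{
    (void)v;
    return 0;
}

/* Designated initializers pick the union member explicitly. */
static const struct desc_narrow type_fn =
    { .kind = KIND_FN, .u = { .enc = enc_dummy } };
static const struct desc_narrow type_ptr =
    { .kind = KIND_PTR, .u = { .base = &type_fn } };
```

The pointers in these tables are still what force the load-time relocations; the union only reduces how many slots each entry carries.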

Tom and I discussed some possible extensions to this, some of which I  
don't think I'm going to have time for right now, like using a script  
to read a file with similar data and generate C code, possibly  
multiple separate bits of C code for each sequence or type, which  
could make (for example) the type-checking more obvious in the  
generated code, while still optimizing it away to no run-time code.   
It could also get rid of the multiple uses of the same C type name  
while still having an input format that makes it clear what type is  
being used where.

Having some mechanism to export and import these type descriptors,  
better than just exporting the type_<name> variables, is also very  
desirable.  Tom and I talked about perhaps using a registration  
function and name strings associated with each descriptor you want to  
export, so for example "octetstring/krb5_data" might be a name you  
might attach to one, and perhaps a routine would be provided to look  
up a type descriptor by name, or process a table of them, for use by  
an external module like PKINIT or the LDAP KDB back end, both of which  
currently have their ASN.1 code in the main Kerberos library.
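
One possible shape for such a registry, purely as a sketch: nothing like this exists yet, and register_type, lookup_type and the fixed-size table are all assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

struct type_info { size_t size; };

#define MAX_EXPORTS 32

static struct {
    const char *name;
    const struct type_info *type;
} exports[MAX_EXPORTS];
static size_t n_exports;

/* Associate a name such as "octetstring/krb5_data" with a descriptor.
 * The name string is not copied and must outlive the registry.
 * Returns 0 on success, -1 if the table is full. */
static int
register_type(const char *name, const struct type_info *type)
{
    if (n_exports == MAX_EXPORTS)
        return -1;
    exports[n_exports].name = name;
    exports[n_exports].type = type;
    n_exports++;
    return 0;
}

/* Look a descriptor up by name; NULL if it was never registered. */
static const struct type_info *
lookup_type(const char *name)
{
    size_t i;
    for (i = 0; i < n_exports; i++) {
        if (strcmp(exports[i].name, name) == 0)
            return exports[i].type;
    }
    return NULL;
}
```

An external module would call lookup_type once at initialization and keep the returned pointer, rather than linking against the type_<name> variables directly.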

Anyways, those may come later.  I think I've covered the general idea,  
and should have code ready for review in not too long...
