Coding practices proposals

Fri Mar 18 15:38:07 EDT 2011

On Fri, 2011-03-18 at 14:59 -0400, Nico Williams wrote:
> I much prefer having a single lexical scope for all owner pointer
> variables in functions, with one goto label for all cleanup.  I'd even
> prefer if all such variables were always initialized to NULL where
> declared and then have free() called unconditionally on each of those
> variables' values in the cleanup section.  Alias and non-pointer
> variables should be OK to declare in inner scopes.  Whenever an
> pointer ownership is transferred the old owner should be set to NULL.
> This style makes it much easier to get a high degree of confidence
> that there are no memory leaks and no double frees.

Full agreement, with the proviso that I don't see much point in nulling
out owner variables which immediately go out of scope or get destroyed.

I think this paragraph is mostly representative of the middle of the
road consensus of the thread so far.  Sam and Ken are more partial to
inner-scope variable declarations; Tom and I would probably prefer to
never use them.

> You might even want to have a naming convention (a common affix) for
> owner pointer variables.

Not sold on this at the current time; I see the benefit but for the
moment I think the verbosity cost outweighs it.

>   I'd rather there never were any pointer
> aliases as structure fields -- always dup objects instead or else use
> reference counting.

We use aliases a lot when we construct a temporary structure on the
sack, use it, and then throw it away.  For instance, when converting a
null-terminated string to a krb5_data to pass to some function which
needs it.  C strings aren't reference counted and copying the string
would be verbose and wasteful.  This pattern also arises in functions
like krb5_mk_req where we are generating the encoding of a structure and
then throwing away the structure.

I think it's reasonably safe to create structures full of aliases.  Once
you start mixing aliases and owner fields within the same structure, I
think it becomes hard to verify the correctness of a function, though.

> Also, all output arguments MUST be initialized on function entry, even
> if the documentation says that output values are undefined in error
> cases.

Yeah, this is already a documented practice, although not all of our
code conforms.

>   (Also, on when you know you're in the client-side I'd rather
> assert() than return EINVAL-like errors when functions are used
> incorrectly, but this is too difficult to do in a krb5 library, so, oh
> well.)

This practice is unfriendly to libraries which might have bindings in
safe languages, unless the bindings can trivially ensure the validity of
the arguments.

> I don't personally care for having a naming convention for
> distinguishing output arguments from others as long as the types used
> make it obvious which arguments are inputs and which are outputs.
> E.g., if the output arguments are all of the form "type **" (or
> "pointer_type *") and all input arguments have fewer '*'s in their
> types, then there's no need to add an "_out" suffice to the output
> arguments.  Avoid using arguments for input and output.

Noted.  Our historical practice of filling in caller-provided structure
containers (especially krb5_data) often makes it unclear what's an input
and what's an output.  If we stick to outputting pointers, it's a little
clearer until you start getting into sequences (like "krb5_padata **in,
krb5_padata ***out").