Overall shape of Python-based test framework

Tue Mar 2 03:26:17 EST 2010

On Feb 22, 2010, at 13:53, ghudson at MIT.EDU wrote:
> With 1.8 winding up, I'm trying to get cracking on a Python-based test
> framework.  The sooner we have it, the sooner we can start using it
> for 1.9 work.

To augment or replace dejagnu?

> We've had some internal discussions about how this should look.  What
> I would prefer is to create a library which can be used by individual
> Python test programs scattered around the tree near the functionality
> they test, much like the C unit tests are.

Some of the C tests are spread out that way.  Others, like the main dejagnu tests, may exercise a lot of different things in concert.  Where would "test cross-realm authentication" live in the tree?

>  The general workflow would
> be:
> 
>  1. Developer adds new functionality or fixes bug in code which can
>     only be tested in a running Kerberos environment.

Like, "get a keytab file from AD and authenticate a client using it"?

>  2. Developer creates C test program to exercise code (assuming
>     running environment), or identifies existing commands which can
>     exercise it (kinit, etc.).

So, Python code loading libkrb5 via an FFI is out?  I suppose, if you want to insulate the framework from library bugs, that makes some sense...

>  3. Developer creates Python script which uses the library to set up
>     the krb5 environment, executes the C test program or existing
>     commands, and tears down the environment.

With *lots* of high-level helper functions, I hope.  Like, "set up realm FOO with an LDAP database", and "exchange cross-realm keys between realms FOO and BAR".
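
To make that concrete, here's a rough sketch of the kind of helpers I have in mind.  The Realm class and exchange_cross_realm_keys are made-up names, not anything in the tree, and the passwords are placeholders.  The sketch only builds up kdb5_util and kadmin.local command lines rather than running them, and it leaves the LDAP case out entirely.

```python
# Hypothetical helper layer; none of these names exist in the krb5 tree.
from dataclasses import dataclass, field


@dataclass
class Realm:
    """Plans the commands needed to build one throwaway test realm."""
    name: str
    master_password: str = "master"
    commands: list = field(default_factory=list)

    def create_db(self):
        # kdb5_util: -r picks the realm, -P supplies the master password
        # non-interactively, and "create -s" also writes a stash file.
        self.commands.append(
            ["kdb5_util", "-r", self.name, "-P", self.master_password,
             "create", "-s"])
        return self

    def add_principal(self, princ, password):
        # kadmin.local -q runs a single query against the local database.
        query = "addprinc -pw %s %s" % (password, princ)
        self.commands.append(["kadmin.local", "-r", self.name, "-q", query])
        return self


def exchange_cross_realm_keys(a, b, password="xrealm"):
    """Give both realms matching krbtgt principals for each direction."""
    for src, dst in ((a, b), (b, a)):
        tgt = "krbtgt/%s@%s" % (dst.name, src.name)
        # The same principal with the same key must exist in both databases.
        a.add_principal(tgt, password)
        b.add_principal(tgt, password)
```

A test script would then read something like Realm("FOO").create_db() followed by exchange_cross_realm_keys(foo, bar); actually executing the planned commands would need each realm to have its own config files and environment.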

I wonder about the performance, if you're constantly setting up and tearing down Kerberos databases and services. We're doing some of that now with dejagnu, so I guess it isn't completely intolerable.

>  4. Developer adds a check-unix rule to execute the Python test
>     script.

Please don't confine the testing to UNIX unless the tests don't make sense for Windows.

>  5. At some point in the future, the test fails.  Developer runs the
>     test program with a special flag or environment variable to
>     facilitate running the test commands under a debugger.  (Haven't
>     worked out the exact process.)

Sounds good.  When working on dejagnu tests, I generally dropped in a line that printed out what it was about to do, and spawned a new shell or xterm window, so I could get control back and do things myself.  It might be a fair first cut, if you haven't worked out a better way.
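
In Python, that first cut might look roughly like the following.  The KTEST_PAUSE variable and the run_step function are names I'm inventing for illustration; the framework would presumably pick its own.

```python
import os
import subprocess
import sys

# KTEST_PAUSE is a made-up variable name for this sketch.
PAUSE_VAR = "KTEST_PAUSE"


def run_step(argv, env=None, pause=None, shell=None):
    """Run one test command, optionally handing control to the developer first."""
    if pause is None:
        pause = bool(os.environ.get(PAUSE_VAR))
    if pause:
        # Say what is about to happen, then give the developer a shell in
        # the same environment the command is going to see.
        print("about to run: %s" % " ".join(argv), file=sys.stderr)
        print("exit the shell to continue", file=sys.stderr)
        subprocess.call([shell or os.environ.get("SHELL", "/bin/sh")], env=env)
    # Once the shell exits (or if no pause was requested), run the command.
    return subprocess.call(argv, env=env)
```

Because the shell inherits the test's environment, the developer can rerun the printed command by hand, with different options or under whatever tool they like, before letting the script carry on.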

I'm not sure "run it under a debugger" is necessarily the right level to jump to, though.  It may be more helpful for the developer to be able to alter the command-line options, run another program before the main test program, tweak environment variables, etc.  You can do some of that with "gdb --args", which remembers the program and arguments you supply without forcing you to use exactly those arguments or to launch the program right away.  But maybe the developer wants a shell prompt, and maybe another debugger isn't so flexible.

> I like two things about this model: first, I think step 3 is
> inherently easier than inserting a test into a "box of tests" like the
> dejagnu test suite.  Second, I think step 5 is inherently easier than
> convincing a "box of tests" to execute a particular command from a
> particular test under a debugger--because by the very act of running
> an individual test script narrows the work to that test.

You could do both... a test-driver script that executes a test indicated on the command line, or if there is none, all test scripts (based on filename pattern?) in the current directory, or current subtree, or something.
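
Something along these lines, as a sketch only: the t_*.py filename pattern is my assumption, and find_tests and main are hypothetical names.

```python
import glob
import os
import subprocess
import sys


def find_tests(root=".", pattern="t_*.py"):
    """Collect test scripts under root; the t_*.py pattern is an assumption."""
    found = glob.glob(os.path.join(root, "**", pattern), recursive=True)
    return sorted(found)


def main(args):
    # Tests named on the command line win; with none, run everything
    # found in the current subtree.
    tests = args or find_tests()
    failed = [t for t in tests if subprocess.call([sys.executable, t]) != 0]
    for t in failed:
        print("FAILED: %s" % t)
    return 1 if failed else 0
```

Ending the file with sys.exit(main(sys.argv[1:])) would turn it into the driver script, and each test remains an ordinary script that can still be run on its own.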

> 
> The cost is that the tests are not collected into one place, meaning:
> 
> * If any test starts failing, the whole test suite fails, and it
>  becomes a little more difficult to execute other tests (although
>  being able to run "make check" in a subdir helps).

So does "make -k check", I expect.

> 
> * Because of the previous point, if you're doing work on a branch
>  which deliberately breaks a whole raft of tests, you can't as easily
>  choose which order to work on fixing the tests.
> 
> * You can't produce reports and charts for a QA manager.

Unless you decouple the tests from the source tree so that you can add a new test for a bug you just fixed, and run the (updated) test suite against an old build tree or installation for comparison, I don't think there's much to put into reports, much less charts.  Though it would be nice to be able to type in "make -k check" (or something) and get out "these 5 tests failed, and these other 6 were untested as a result...."
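
For the summary part, a sketch of what I mean, assuming each test can declare which other tests it depends on (run_all and summarize are made-up names, and the "prerequisites" idea is mine, not part of the proposal):

```python
def run_all(tests):
    """tests: list of (name, prerequisites, callable) in dependency order.

    A test whose prerequisite did not pass is reported as untested
    instead of being run.
    """
    status = {}
    for name, needs, func in tests:
        if any(status.get(n) != "pass" for n in needs):
            status[name] = "untested"
            continue
        try:
            func()
            status[name] = "pass"
        except Exception:
            # Any exception from the test body counts as a failure.
            status[name] = "fail"
    return status


def summarize(status):
    failed = sorted(n for n, s in status.items() if s == "fail")
    untested = sorted(n for n, s in status.items() if s == "untested")
    return "%d tests failed (%s), %d untested as a result (%s)" % (
        len(failed), ", ".join(failed), len(untested), ", ".join(untested))
```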

> * You can't as easily set up expensive resources and reuse them for a
>  series of tests.
> 
> I don't consider these to be significant issues for us because (1)
> we're pretty good at keeping the test suite working, (2) we haven't
> done much in the way of "break the world" development in my
> experience, (3) we aren't big enough to have a QA manager, and (4) we
> don't have any expensive resources to set up (setting up a Kerberos
> environment is very fast as long as the automation is properly
> designed without using sleeps).

I'm a bit concerned that interoperability testing, both backwards-compatibility and mixed-implementation testing, isn't part of the plan; I see it as a serious weakness in our current testing.  I know it's not trivial to set up a second Kerberos environment with a different implementation or version, and in the case of AD it involves some interesting licensing challenges, but pieces of it could be automated conditional on a few site settings:

* here's the address and admin account info for a W2k3 AD;
* Heimdal 1.0 is installed under this directory;
* krb5-1.6 is in directory foo on machine bar;
* here's a tarball with krb5-1.7.1 binaries, but pathnames will have to be adjusted via config files and environment variables;
* here's how you spin up a virtual machine with a Vista client.

Interoperability could then be tested much more frequently, if not also more carefully, than happens now.  But that does lead to potentially expensive resources to be set up and managed.

It would be convenient if the tests were conditionalized so that those of us without Windows and Heimdal installations and whatnot get "untested" warnings rather than failures, but I think it's more important that MIT run such tests.
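
One way to get that behavior, sketched with a site settings file in INI form.  The section and key names (heimdal, prefix, and so on), the Untested exception, and the require and load_site helpers are all invented for this example.

```python
import configparser


class Untested(Exception):
    """Raised when a test cannot run at this site; not a failure."""


def load_site(text):
    """Parse site settings given as INI text."""
    site = configparser.ConfigParser()
    site.read_string(text)
    return site


def require(site, section, *keys):
    """Return the requested settings or raise Untested if any are missing."""
    if not site.has_section(section):
        raise Untested("no [%s] section in site settings" % section)
    missing = [k for k in keys if not site.has_option(section, k)]
    if missing:
        raise Untested("[%s] lacks %s" % (section, ", ".join(missing)))
    return [site.get(section, k) for k in keys]
```

An interop test would start with something like require(site, "heimdal", "prefix"), and the driver would catch Untested and report a warning instead of a failure.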

Random other thoughts:

It should be easy to plug in coverage testing, valgrind, purify, debugging mallocs, etc.  Ideally, for some of them, by just setting a flag or variable.  ("Use valgrind."  "Use this debugging malloc over here.")  I suspect this is where you were going with your message on #krbdev earlier.  Some of those require altering command lines, or changing environment variables for subprocesses, or both.
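
As a sketch of the "just set a variable" idea: one variable supplies a command prefix (valgrind, or "gdb --args" for that matter) and another names a library to preload.  KTEST_WRAPPER, KTEST_PRELOAD and instrument are hypothetical names, and LD_PRELOAD is what glibc-style systems use, so other platforms would need something else.

```python
import os
import shlex

# Both variable names are invented for this sketch.
WRAPPER_VAR = "KTEST_WRAPPER"   # e.g. "valgrind --leak-check=full"
PRELOAD_VAR = "KTEST_PRELOAD"   # e.g. path to a debugging malloc library


def instrument(argv, env=None, settings=None):
    """Return (argv, env) adjusted for whatever tools the settings request."""
    settings = os.environ if settings is None else settings
    # Copy so the caller's environment is never modified in place.
    env = dict(os.environ if env is None else env)
    wrapper = shlex.split(settings.get(WRAPPER_VAR, ""))
    preload = settings.get(PRELOAD_VAR)
    if preload:
        env["LD_PRELOAD"] = preload
    return wrapper + list(argv), env
```

If every subprocess the framework starts goes through one function like this, the individual test scripts don't need to know which tool is in use.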

It would be nice to be able to run tests in parallel when they don't conflict -- whatever that means.  Ideally, that wouldn't mean firing up 8 KDCs just because I ran 8 tests in parallel and they all wanted an account "testuser" with password "testuser" and so on; that would be more of an issue with the more expensive resources I discussed above.  But having most of my CPU cores, and me, sit idle while I wait for an md4 test to finish before the md5 test can start (or, if you prefer, a kinit-with-renewable-lifetime test before a kinit-with-address test) is a waste.
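
One crude definition of "conflict", as a sketch: each test may name a lock, tests naming the same lock run one after another, and everything else runs concurrently.  The lock idea and run_parallel are my own invention here, and threads are used only because the real work would happen in subprocesses.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def run_parallel(tests, workers=8):
    """tests: list of (name, lock, callable); lock may be None.

    Tests naming the same lock (say, one shared KDC's database) run
    serially in list order; tests with no lock or with different locks
    may run at the same time.
    """
    groups = defaultdict(list)
    for name, lock, func in tests:
        # A test with no lock gets a group of its own.
        key = ("lock", lock) if lock is not None else ("solo", name)
        groups[key].append((name, func))

    def run_group(group):
        return [(name, func()) for name, func in group]

    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for outcome in pool.map(run_group, list(groups.values())):
            results.update(outcome)
    return results
```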

Ken

-- 
Ken Raeburn / raeburn at mit.edu / no longer at MIT Kerberos Consortium