RFC: preauth benchmarking methodology

Fri Jun 3 16:54:44 EDT 2011

On Fri, 2011-06-03 at 16:17 -0400, Marcus Watts wrote:
> Sounds to me that initially you should expect the number of
> worker processes to equal the number of kdc machines.  So,
> with 1 machine, any degree of parallelism will give what
> you define as "non-useless" numbers.
> 
> The deviation of the # of successes -- interesting but is that the only
> deviation you want?  I suspect the distribution of times for successes
> will be interesting, not just the average or even the deviation.
> 
> Are you using a local db, or ldap?  If you have more than one
> kdc, are they serving requests randomly or is there a preferred
> order, and where is the master in that order?
> 
> Of course, in any "real" environment, kinits aren't sequentially
> issued; they're issued "at will" independently by a large number
> of independent machines.  So in terms of your test, that means
> "parallelism" is not an integer and actually a function of demand,
> execution time, % of success, and perhaps some measure of
> resource contention.  In a larger environment, or even more so in an
> environment that updates success / failure, update contention
> will also be important.  Incremental replication may further
> complicate matters.
> 
> When designing your parallel architecture, you should probably
> also consider the possibility of an intentional DOS attack.
> 
> Mostly out of curiosity: are you using VMs for this, or real machines?

1. We don't care about the ability to scale up with multiple kdcs.  The
goal is to measure the impact of allowing a single kdc process to handle
multiple requests at one time.  For this reason, I will probably run
only a single kdc process.

2. We don't (at least right now) care about ldap.  The concern is the
preauth plugins, so I am using a local db.  I do, however, assume that
at some point in the future we will want to make ldap connectivity
async as well.

3. We don't care about simulating a real environment; we want to test
the maximum request throughput of a single process.  As such,
parallelism is a good measure of stress.

4. We don't care (at least right now) about DOS attacks.  This is
something that should be addressed in the future.  However, our work
(moving to an async core) should make the kdc more resistant to DOS.
That being said, I'm of the opinion that the best way to protect against
DOS is a well-placed firewall rule.  Trying to prevent it at the
protocol level requires much more processing and is thus more
susceptible to failure, even with countermeasures in place.
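To make the stress test in point 3 concrete, here is a minimal Python sketch of a parallelism sweep.  It does not talk to a real kdc: `async_kdc` and `make_serial_kdc` are hypothetical stand-ins that just sleep for a simulated preauth delay, one letting requests overlap and one serializing them behind a lock, roughly the way a kdc that handles one request at a time would.  The function names, the timeout-based definition of "failure", and all the numbers are assumptions for illustration; a real run would swap the stand-in for kinit or the krb5 API.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def async_kdc(simulated_delay):
    """Hypothetical stand-in for a kdc that can overlap requests: each
    request waits out the simulated preauth delay independently."""
    time.sleep(simulated_delay)
    return True  # pretend the kdc replied successfully


def make_serial_kdc():
    """Hypothetical stand-in for a synchronous kdc: a single lock means
    only one request is in preauth at a time, so the others queue."""
    lock = threading.Lock()

    def serial_kdc(simulated_delay):
        with lock:
            time.sleep(simulated_delay)
        return True

    return serial_kdc


def run_level(kdc, parallelism, requests_per_worker, simulated_delay, timeout):
    """Run `parallelism` workers that each issue requests back to back.

    Returns a list of (latency_seconds, ok) tuples.  A request counts as a
    failure if the kdc reports an error or the reply arrives after
    `timeout` (an assumed stand-in for a client giving up)."""
    def worker(_):
        results = []
        for _ in range(requests_per_worker):
            start = time.monotonic()
            replied = kdc(simulated_delay)
            latency = time.monotonic() - start
            results.append((latency, replied and latency <= timeout))
        return results

    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        batches = list(pool.map(worker, range(parallelism)))
    # Flatten the per-worker lists into one list for this load level.
    return [r for batch in batches for r in batch]


if __name__ == "__main__":
    delay = 0.02  # simulated preauth delay in seconds (arbitrary)
    for parallelism in (1, 2, 4, 8):
        for name, kdc in (("async", async_kdc), ("serial", make_serial_kdc())):
            results = run_level(kdc, parallelism, 5, delay, timeout=0.1)
            ok = [lat for lat, good in results if good]
            ratio = (sum(ok) / len(ok)) / delay if ok else float("nan")
            print(f"{name:6} parallelism={parallelism}: "
                  f"{len(ok)}/{len(results)} ok, mean/delay={ratio:.2f}")
```

With these stand-ins, the overlapping model should keep mean/delay near 1.0 as parallelism grows, while the serialized model's ratio climbs with the number of workers until requests start exceeding the timeout.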

In my mind, the key to successful benchmarking is to identify what you
want to measure and measure only that thing.  Otherwise you end up in
data-overload land and your metrics are useless.  The metrics I collect
should demonstrate that there is a nearly 1:1 ratio of request time to
simulated delay.  They should also show at what level of load we begin
to see high failure rates.  Lastly, we should be able to see (via the
standard deviation of successes) the impact of request load on preauth
verification performance.
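The metrics above (request time vs. simulated delay, failure rate per load level, and the spread of successes) could be computed along these lines.  This is a sketch under assumptions: each request is recorded as a `(latency_seconds, ok)` tuple, and `summarize_level` / `success_spread` are hypothetical helper names.  It also reports p50/p90/p99 latencies, following Marcus's point that the distribution of success times may say more than the mean or deviation.

```python
import math
import statistics


def summarize_level(results, simulated_delay):
    """Summarize one load level.

    `results` is a list of (latency_seconds, ok) tuples, one per request."""
    if not results:
        raise ValueError("no requests recorded for this level")
    ok_latencies = [lat for lat, ok in results if ok]
    failures = len(results) - len(ok_latencies)
    summary = {
        "requests": len(results),
        "failure_rate": failures / len(results),
        # Ideally close to 1.0: request time should track the simulated delay.
        "delay_ratio": (statistics.mean(ok_latencies) / simulated_delay
                        if ok_latencies else math.nan),
    }
    # Percentiles show the shape of the latency distribution (e.g. a long
    # tail) that a mean and standard deviation alone can hide.
    if len(ok_latencies) >= 2:
        cuts = statistics.quantiles(ok_latencies, n=100)
        summary["p50"], summary["p90"], summary["p99"] = cuts[49], cuts[89], cuts[98]
    return summary


def success_spread(success_counts):
    """Mean and sample standard deviation of the number of successful
    requests across repeated runs at the same load level."""
    if len(success_counts) < 2:
        raise ValueError("need at least two runs to estimate a deviation")
    return statistics.mean(success_counts), statistics.stdev(success_counts)
```

For example, repeating each load level a handful of times, feeding every run's results to `summarize_level`, and passing the per-run success counts to `success_spread` gives one row per level for the 1:1-ratio, failure-rate, and deviation comparisons.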

Nathaniel