KDC performance test - lookaside cache impact, testing framework

Thu Apr 5 17:03:56 EDT 2012

Petr Spacek <pspacek at redhat.com> writes:

> Greetings,
>
> my name is Petr Spacek and I work in cooperation with Red Hat on a
> Kerberos Performance Test Suite. (It's my master's thesis project; some
> details are at the end of this e-mail, after the interesting part :-)
>
> To get familiar with the KDC and a Kerberized environment, I ran some
> simple synthetic performance tests on Fedora's MIT KDC version 1.10.1.
> I think the results may be interesting for you.

Thanks for doing these tests; the results are interesting and helpful.

> The test was simple: 100 kinits running in parallel each requested 100
> TGTs (for a single principal), i.e. 10000 TGT requests in total. The
> time needed to fulfil all requests was measured.
> These bursts were sent and measured repeatedly, one after another.

Did the KDC remain running between bursts?  If not, at what points was
it started/restarted?
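
For anyone wanting to reproduce a burst of this shape, here is a
minimal sketch in Python. It is not the script used for the
measurements above. The principal name, keytab path, and credential
cache location are hypothetical placeholders, and the helper names
run_burst and kinit_request are invented for illustration. It assumes
a keytab exists for the test principal so that kinit can run
non-interactively.

```python
import subprocess
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_burst(request, workers=100, per_worker=100):
    """Run `workers` parallel clients, each calling `request` `per_worker` times.

    Returns (elapsed_seconds, total_requests) for the whole burst.
    """
    def client(_):
        for _ in range(per_worker):
            request()

    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces completion and re-raises any worker exception
        list(pool.map(client, range(workers)))
    elapsed = time.monotonic() - start
    return elapsed, workers * per_worker


def kinit_request(principal="testuser@EXAMPLE.COM",   # hypothetical principal
                  keytab="/tmp/test.keytab"):         # hypothetical keytab path
    """Request one TGT with kinit, using a keytab instead of a password."""
    # One credential cache per thread so parallel kinits do not share a file
    ccache = f"FILE:/tmp/krb5cc_burst_{threading.get_ident()}"
    subprocess.run(
        ["kinit", "-k", "-t", keytab, "-c", ccache, principal],
        check=True,
        stdout=subprocess.DEVNULL,
    )

# Example use against a running KDC:
#   elapsed, total = run_burst(kinit_request)
#   print(f"{total} requests in {elapsed:.1f} s")
```

Calling run_burst in a loop, with a fixed sleep between calls and the
KDC left running throughout, would pin down the two things asked
about above: the interval between bursts and whether the KDC state
carried over.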

> The KDC was on one computer and the kinits were on another. Detailed HW
> configurations are not important for now, I think. The KDC's CPU load
> was ~100 %; the client's load was < 30 %.
>
> The performance impact of some KDC configuration options was tested.
> The numbers below are from configurations with
> disable_last_success = true and disable_lockout = true. I can provide
> more details if necessary.
>
>
> The results are as follows (time measured for each successive burst):
>
> KDC DB in local file:
> Without pre-authentication: 9 s, 24 s, 37 s, 48 s, 40 s, 40 s
> With pre-authentication: 26 s, 63 s, 75 s, 68 s, 72 s
>
> KDC DB in OpenLDAP (same host as KDC):
> Without pre-authentication: 14 s, 36 s, 55 s, 55 s, 50 s
> With pre-authentication: 36 s, 86 s, 83 s
>
> I was very surprised by these results. I repeated the measurements and
> the variation between attempts was < 10 %. It's not very precise, but I
> think it's enough to confirm the observed trends.

What was the interval between bursts?  Why does the number of bursts
vary in each experiment?

> After some time I found the "KDC lookaside cache" with "STALE_TIME"
> defined as 120 s. I think it explains the observed trends: the time
> required to fulfil one burst became stable after approximately 120 s.
> The reason is (I think) that the rates of adding new requests to the
> cache and removing "stale" requests from the cache are approximately
> the same. When the size of the cache stabilizes, the measured times
> also stabilize. (It's only a hypothesis.)
>
>
> Then I recompiled the KDC with the --disable-kdc-lookaside-cache switch
> and repeated the tests:
>
> KDC DB in local file:
> Without pre-authentication: 6 s, 6 s, 6 s, 6 s
> With pre-authentication: 8 s, 8 s, 8 s, 8 s, 8 s
>
> KDC DB in OpenLDAP (same host as KDC):
> Without pre-authentication: 13 s, 14 s, 14 s
> With pre-authentication: 6 s, 7 s, 6 s, 7 s

This is interesting.  I would expect the "With pre-authentication"
case to have longer times than the "Without pre-authentication" case.
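
Converted to request rates, the no-cache figures make this easier to
see. The sketch below is plain arithmetic on the burst times quoted
above, at 10000 requests per burst. Where the quoted times vary (13-14
s, 6-7 s), a single representative value was chosen here, so the
rates are approximate. The function and variable names are
illustrative only.

```python
TOTAL_REQUESTS = 100 * 100  # 100 parallel kinits x 100 TGT requests each


def requests_per_second(burst_seconds, total=TOTAL_REQUESTS):
    """Average request rate over one burst."""
    return total / burst_seconds


# Representative burst times (seconds) from the figures quoted above for
# the KDC built without the lookaside cache.
no_cache_times = {
    ("file", "no preauth"): 6,
    ("file", "preauth"): 8,
    ("ldap", "no preauth"): 14,  # quoted as 13 s, 14 s, 14 s
    ("ldap", "preauth"): 7,      # quoted as 6 s, 7 s, 6 s, 7 s
}

for (backend, mode), seconds in no_cache_times.items():
    rate = requests_per_second(seconds)
    print(f"{backend:5s} {mode:11s} {rate:7.0f} req/s")
```

With the file backend, pre-authentication lowers the rate as
expected. With the LDAP backend the ordering is reversed, which is
the part that looks odd.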

> These numbers are nicer :-D
>
> Please, let me ask some academic questions:
> Why is the lookaside cache there? I looked into RFC 4120. Section 3.1.2
> http://tools.ietf.org/html/rfc4120#section-3.1.2 says:
>
>     Because Kerberos can run over unreliable transports such as UDP, the
>     KDC MUST be prepared to retransmit responses in case they are lost.
>     If a KDC receives a request identical to one it has recently
>     processed successfully, the KDC MUST respond with a KRB_AS_REP
>     message rather than a replay error.  In order to reduce ciphertext
>     given to a potential attacker, KDCs MAY send the same response
>     generated when the request was first handled.  KDCs MUST obey this
>     replay behavior even if the actual transport in use is reliable.
>
> Ok, it's a standard and it has to be followed. Can STALE_TIME be lower?
> Is there a real problem when the lookaside cache is disabled? What are
> the implications?

I had forgotten that we relaxed the requirement on exact response
retransmission to a "MAY".  I think that we technically can disable the
lookaside cache and still conform to RFC 4120.

Various developers have suspected that the lookaside cache can be a
performance bottleneck under some circumstances.  Your tests would
seem to confirm that.  It would be useful to do experiments to
discover if there are ever any cases where the lookaside cache
actually helps performance.
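
As a way to reason about the hypothesis in the quoted message, here
is a toy model in Python. It is not the KDC's code. It assumes, for
illustration only, that the cache is a flat list searched linearly on
every request and that entries older than STALE_TIME are dropped
during the search. The class and function names are invented. The
request rate is held constant, which the real tests did not do, since
a slower KDC also slows the rate at which the clients send requests.

```python
STALE_TIME = 120  # seconds, the value quoted above


class LinearLookaside:
    """Toy lookaside cache: a flat list scanned on every lookup (assumed)."""

    def __init__(self, stale_time=STALE_TIME):
        self.stale_time = stale_time
        self.entries = []      # (timestamp, request, reply)
        self.comparisons = 0   # total request comparisons performed

    def lookup(self, request, now):
        """Return the cached reply or None, discarding stale entries."""
        fresh, hit = [], None
        for ts, req, reply in self.entries:
            if now - ts > self.stale_time:
                continue  # entry is stale, drop it
            fresh.append((ts, req, reply))
            self.comparisons += 1
            if hit is None and req == request:
                hit = reply
        self.entries = fresh
        return hit

    def insert(self, request, reply, now):
        self.entries.append((now, request, reply))


def simulate(rate=5, duration=300):
    """Feed `rate` distinct requests per second; return cache size per second."""
    cache = LinearLookaside()
    sizes, n = [], 0
    for second in range(duration):
        for _ in range(rate):
            request = f"req-{n}"  # every request is unique, so always a miss
            n += 1
            if cache.lookup(request, second) is None:
                cache.insert(request, "reply", second)
        sizes.append(len(cache.entries))
    return sizes


sizes = simulate()
print("size after 60 s:", sizes[60])
print("size after 150 s:", sizes[150])
print("size after 299 s:", sizes[299])
```

In this model the cache, and with it the per-request scan cost, grows
for the first STALE_TIME seconds and then levels off, which matches
the shape of the reported burst times. Because every request misses,
the cache only ever adds cost here. A test that helps the cache would
need a meaningful fraction of genuine retransmissions.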

It's debatable whether something like the lookaside cache will
significantly thwart cryptographic attacks that require large amounts
of ciphertext.  An attacker that can control all network
communications can also transmit a large number of slightly varying
AS-REQ messages to obtain multiple ciphertexts whose plaintexts only
differ slightly.