KDC performance test - lookaside cache impact, testing framework

Petr Spacek pspacek at redhat.com
Thu Apr 5 18:01:19 EDT 2012


On 04/05/2012 11:03 PM, Tom Yu wrote:
> Petr Spacek <pspacek at redhat.com> writes:
>
>> Greetings,
>>
>> my name is Petr Spacek and I work, in cooperation with Red Hat, on a
>> Kerberos Performance Test Suite. (It's my master's thesis project; some
>> details are at the end of the e-mail, after the interesting part :-)
>>
>> To get familiar with the KDC and a Kerberized environment, I ran some
>> simple synthetic performance tests on Fedora's MIT KDC version 1.10.1.
>> I think the results may be interesting to you.
>
> Thanks for doing these tests; the results are interesting and helpful.
>
>> The test was pretty simple: 100 kinit instances in parallel each requested
>> 100 TGTs (for a single principal), i.e. 10000 TGT requests in total. The
>> time needed to fulfil all the requests was measured.
>> These bursts were sent and measured one after another, repeatedly.
>
> Did the KDC remain running between bursts?  If not, at what points was
> it started/restarted?
The KDC wasn't restarted between bursts. It was restarted after each 
configuration/database change, and also between retests with the same 
configuration. After each restart the measured times start from the 
first value and grow again (as expected).

>> The KDC was on one computer, the kinits on another. Detailed HW
>> configurations are not important for now, I think. The KDC's CPU load was
>> ~100 %, the client's load was < 30 %.
>>
>> The performance impact of some KDC configuration options was tested. The
>> numbers below are from configurations with disable_last_success = true and
>> disable_lockout = true. I can provide more details if necessary.
>>
>>
>> The results are as follows (time measured for each successive burst):
>>
>> KDC DB in local file:
>> Without pre-authentication: 9 s, 24 s, 37 s, 48 s, 40 s, 40 s
>> With pre-authentication: 26 s, 63 s, 75 s, 68 s, 72 s
>>
>> KDC DB in OpenLDAP (same host as KDC):
>> Without pre-authentication: 14 s, 36 s, 55 s, 55 s, 50 s
>> With pre-authentication: 36 s, 86 s, 83 s
>>
>> I was very surprised by these results. I repeated the measurements and the
>> variation between attempts was < 10 %. It's not very precise, but I think
>> it's enough to confirm the observed trends.
>
> What was the interval between bursts?  Why does the number of bursts
> vary in each experiment?
The interval between bursts was < 2 s; that is the time spent in the scripts. 
It was necessary to fork/execv 100 new kinit instances, send a signal to the 
whole process group and do some measurement magic.
After fork the processes were stopped in sigsuspend() and started only after 
receiving a signal, so that the whole set of processes starts in parallel 
(a rough sketch is below).
It's far from a perfect method, but the variation between measurements was 
acceptable for me.
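To make that concrete, here is a minimal sketch of the launcher logic (not 
the actual test script; N_CLIENTS, the principal and the keytab path are 
placeholders, the measurement part is omitted, and the real test additionally 
looped 100 times per kinit client):

#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define N_CLIENTS 100

static void noop(int sig) { (void)sig; }  /* only interrupts sigsuspend() */

int main(void)
{
    sigset_t block, empty;

    sigemptyset(&empty);
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    /* Block SIGUSR1 before forking so no child can miss the wake-up. */
    sigprocmask(SIG_BLOCK, &block, NULL);
    signal(SIGUSR1, noop);
    /* Fresh process group, so the signal hits only parent + children. */
    setpgid(0, 0);

    for (int i = 0; i < N_CLIENTS; i++) {
        if (fork() == 0) {
            /* Atomically unblock SIGUSR1 and sleep until it arrives. */
            sigsuspend(&empty);
            execlp("kinit", "kinit", "-k", "-t", "/tmp/test.keytab",
                   "host/client.example.com", (char *)NULL);
            _exit(1);
        }
    }

    kill(0, SIGUSR1);           /* wake the whole process group at once */
    while (wait(NULL) > 0)      /* the burst ends when every child exits */
        ;
    return 0;
}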

 > Why does the number of bursts vary in each experiment?
No new bursts were started once the total test time exceeded ~120 s (checked 
after the end of the current burst).
These numbers are from repeated tests. By that point I already knew about the 
STALE_TIME definition, and the testing environment was configured more 
appropriately (i.e. with firewalls disabled, etc.).

The tests without the lookaside cache were stopped after several bursts, 
because there wasn't anything interesting happening. (Yes, it's really not a 
good scientific approach...)
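
Regarding the STALE_TIME hypothesis quoted below, a rough sanity check using 
the numbers above (only an estimate): at the stabilized rate of ~40 s per 
burst of 10000 requests (the local-file, no-pre-authentication case), a 
STALE_TIME of 120 s keeps roughly 120 / 40 * 10000 = 30000 entries alive in 
the cache; once insertions and expirations balance at that level, the 
per-request lookup cost stops growing.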

>> After some time I found the "KDC lookaside cache" with a defined STALE_TIME
>> of 120 s. I think it explains the observed trends: the time required to
>> fulfil one burst became stable after approximately 120 s. The reason is (I
>> think) that the rate of adding new requests to the cache and the rate of
>> removing "stale" requests from the cache become approximately the same.
>> When the size of the cache stabilizes, the measured times also stabilize.
>> (It's only a hypothesis.)
>>
>>
>> Then I recompiled the KDC with the --disable-kdc-lookaside-cache switch and
>> repeated the tests:
>>
>> KDC DB in local file:
>> Without pre-authentication: 6 s, 6 s, 6 s, 6 s
>> With pre-authentication: 8 s, 8 s, 8 s, 8 s, 8 s
>>
>> KDC DB in OpenLDAP (same host as KDC):
>> Without pre-authentication: 13 s, 14 s, 14 s
>> With pre-authentication: 6 s, 7 s, 6 s, 7 s
>
> This is interesting.  I would expect the "With pre-authentication"
> case to have longer times than the "Without pre-authentication" case.

These times are flipped, as I replied to Chris Hecker in a parallel e-mail 
thread. It should be:

Without pre-authentication: 6 s, 7 s, 6 s, 7 s
With pre-authentication: 13 s, 14 s, 14 s

It was a mistake made while writing the original e-mail, not a measurement 
problem.

>> These numbers are nicer :-D
>>
>> Please let me ask some academic questions: why is the lookaside cache
>> there at all? I looked into RFC 4120; section 3.1.2
>> http://tools.ietf.org/html/rfc4120#section-3.1.2 says:
>>
>>      Because Kerberos can run over unreliable transports such as UDP, the
>>      KDC MUST be prepared to retransmit responses in case they are lost.
>>      If a KDC receives a request identical to one it has recently
>>      processed successfully, the KDC MUST respond with a KRB_AS_REP
>>      message rather than a replay error.  In order to reduce ciphertext
>>      given to a potential attacker, KDCs MAY send the same response
>>      generated when the request was first handled.  KDCs MUST obey this
>>      replay behavior even if the actual transport in use is reliable.
>>
>> OK, it's a standard and it has to be followed. Could STALE_TIME be lower?
>> Is there a real problem when the lookaside cache is disabled? What are the
>> implications?
>
> I had forgotten that we relaxed the requirement on exact response
> retransmission to a "MAY".  I think that we technically can disable the
> lookaside cache and still conform to RFC 4120.
>
> Various developers have suspected that the lookaside cache can be a
> performance bottleneck under some circumstances.  Your tests would
> seem to confirm that.  It would be useful to do experiments to
> discover if there are ever any cases where the lookaside cache
> actually helps performance.
>
> It's debatable whether something like the lookaside cache will
> significantly thwart cryptographic attacks that require large amounts
> of ciphertext.  An attacker that can control all network
> communications can also transmit a large number of slightly varying
> AS-REQ messages to obtain multiple ciphertexts whose plaintexts only
> differ slightly.

Interesting. As soon as the testing framework is functional, I will do some 
end-to-end tests. There will probably be bottlenecks in various parts of the 
authentication path (I expect problems in the libraries, the database ...), 
so the lookaside cache impact may not be so significant.

Another story is the lookaside cache implementation: it's a simple linear 
list. In my test case the whole list has to be read and compared for each 
request (a simplified sketch is below).
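Just to illustrate the cost pattern I mean, here is a simplified sketch (not 
the actual kdc/replay.c code; the entry layout and names are made up):

#include <stdlib.h>
#include <string.h>
#include <time.h>

struct entry {
    struct entry *next;
    time_t stamp;
    size_t len;
    unsigned char *packet;      /* copy of the raw request */
};

static struct entry *head;

/* Return the cached entry for an identical request, or NULL. */
struct entry *lookup(const unsigned char *pkt, size_t len, time_t now,
                     time_t stale_time)
{
    struct entry **pp = &head, *e;

    while ((e = *pp) != NULL) {
        if (now - e->stamp > stale_time) {      /* expire old entries */
            *pp = e->next;
            free(e->packet);
            free(e);
            continue;
        }
        if (e->len == len && memcmp(e->packet, pkt, len) == 0)
            return e;                           /* replayed request */
        pp = &e->next;
    }
    return NULL;                                /* not seen recently */
}

In this form every lookup is linear in the number of requests seen during the 
last STALE_TIME seconds; keying the entries by a hash of the request packet 
would make the common "not seen before" case much cheaper.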

Back to cryptography: is "slightly varying" expected to allow only a very 
limited set of requests, so that all possible requests could fit in the 
lookaside cache? (I'm not educated in cryptography, so sorry for the trivial 
questions.)

Good night from Europe.

Petr Spacek

