KDC performance test - lookaside cache impact, testing framework
Petr Spacek
pspacek at redhat.com
Thu May 17 07:59:33 EDT 2012
On 04/06/2012 12:01 AM, Petr Spacek wrote:
> On 04/05/2012 11:03 PM, Tom Yu wrote:
>> Petr Spacek <pspacek at redhat.com> writes:
>>
>>> Greetings,
>>>
>>> my name is Petr Spacek and I work in cooperation with Red Hat on
>>> Kerberos Performance Test Suite. (It's my master thesis project, some
>>> details are at the end of e-mail, after the interesting part :-)
>>>
>>> To get familiar with KDC and Kerberized environment I did some simple
>>> synthetic performance tests on Fedora's MIT KDC version 1.10.1.
>>> I think results can be interesting for you.
>>
>> Thanks for doing these tests; the results are interesting and helpful.
>>
>>> Test was pretty simple: 100 kinits in parallel requested 100 TGTs each
>>> (for a single principal), i.e. 10000 TGT requests in total. The time
>>> needed to fulfil all requests was measured.
>>> These bursts were sent and measured one after another repeatedly.
>>
>> Did the KDC remain running between bursts? If not, at what points was
>> it started/restarted?
> The KDC wasn't restarted between bursts. It was restarted after each
> configuration/database change. The KDC was also restarted between retests
> with the same configuration. After each restart the measured times start
> from the first value and grow again (as expected).
>
>>> The KDC was on one computer, the kinits were on another. Detailed HW
>>> configurations are not important for now, I think. Load of the KDC's CPU
>>> was ~ 100 %, the client's load was < 30 %.
>>>
>>> Performance impact of some KDC configuration options was tested. Numbers
>>> below are from configurations with disable_last_success = true and
>>> disable_lockout = true. I can provide more details if necessary.
>>>
>>>
>>> The results are as follows (time measured for each successive burst):
>>>
>>> KDC DB in local file:
>>> Without pre-authentication: 9 s, 24 s, 37 s, 48 s, 40 s, 40 s
>>> With pre-authentication: 26 s, 63 s, 75 s, 68 s, 72 s
>>>
>>> KDC DB in OpenLDAP (same host as KDC):
>>> Without pre-authentication: 14 s, 36 s, 55 s, 55 s, 50 s
>>> With pre-authentication: 36 s, 86 s, 83 s
>>>
>>> I was very surprised by these results. I repeated the measurements and
>>> variation between attempts was < 10 %. It's not very precise, but I
>>> think it's enough to confirm the observed trends.
>>
>> What was the interval between bursts? Why does the number of bursts
>> vary in each experiment?
> The interval between bursts was < 2 s. That is time spent in scripts. It
> was necessary to fork/execv 100 new kinit instances, send a signal to the
> whole process group and do some measurement magic.
> After fork the processes were stopped in sigsuspend(). They were started
> after receiving the signal. (To start the whole set of processes in parallel.)
> It's far from a perfect method, but the variation between measurements was
> acceptable for me.
>
> > Why does the number of bursts vary in each experiment?
> New bursts weren't started once the total test time exceeded ~ 120 s (after
> the end of the current burst).
> These numbers are from repeated tests. At that point I already knew
> about the STALE_TIME definition, the testing environment was configured
> more appropriately (i.e. with firewalls disabled etc.) and so on.
>
> Tests without the lookaside cache were stopped after several tries, because
> there wasn't anything interesting. (Yes, it's really not a good scientific
> approach...)
>
>>> After some time I found the "KDC lookaside cache" with "STALE_TIME"
>>> defined as 120 s. I think it explains the observed trends: the time
>>> required to fulfil one burst became stable after approximately 120 s. The
>>> reason is (I think) that the rate of adding new requests to the cache and
>>> the rate of removing "stale" requests from the cache are approximately
>>> the same. When the size of the cache stabilizes, the measured times also
>>> stabilize. (It's only a hypothesis.)
>>>
>>>
>>> Then I recompiled KDC with --disable-kdc-lookaside-cache switch and
>>> repeated tests:
>>>
>>> KDC DB in local file:
>>> Without pre-authentication: 6 s, 6 s, 6 s, 6 s
>>> With pre-authentication: 8 s, 8 s, 8 s, 8 s, 8 s
>>>
>>> KDC DB in OpenLDAP (same host as KDC):
>>> Without pre-authentication: 13 s, 14 s, 14 s
>>> With pre-authentication: 6 s, 7 s, 6 s, 7 s
>>
>> This is interesting. I would expect the "With pre-authentication"
>> case to have longer times than the "Without pre-authentication" case.
>
> These times are flipped, as I replied to Chris Hecker in a parallel e-mail
> thread. It should be:
>
> Without pre-authentication: 6 s, 7 s, 6 s, 7 s
> With pre-authentication: 13 s, 14 s, 14 s
>
> It was a mistake made when writing the original e-mail, not a measurement
> problem.
>
>>> These numbers are nicer :-D
>>>
>>> Please, let me ask some academic questions:
>>> Why is the lookaside cache there? I looked into RFC 4120. Section 3.1.2
>>> http://tools.ietf.org/html/rfc4120#section-3.1.2 says:
>>>
>>> Because Kerberos can run over unreliable transports such as UDP, the
>>> KDC MUST be prepared to retransmit responses in case they are lost.
>>> If a KDC receives a request identical to one it has recently
>>> processed successfully, the KDC MUST respond with a KRB_AS_REP
>>> message rather than a replay error. In order to reduce ciphertext
>>> given to a potential attacker, KDCs MAY send the same response
>>> generated when the request was first handled. KDCs MUST obey this
>>> replay behavior even if the actual transport in use is reliable.
>>>
>>> OK, it's a standard and it has to be followed. Can STALE_TIME be lower?
>>> Is there a real problem when the lookaside cache is disabled? What are
>>> the implications?
>>
>> I had forgotten that we relaxed the requirement on exact response
>> retransmission to a "MAY". I think that we technically can disable the
>> lookaside cache and still conform to RFC 4120.
>>
>> Various developers have suspected that the lookaside cache can be a
>> performance bottleneck under some circumstances. Your tests would
>> seem to confirm that. It would be useful to do experiments to
>> discover if there are ever any cases where the lookaside cache
>> actually helps performance.
>>
>> It's debatable whether something like the lookaside cache will
>> significantly thwart cryptographic attacks that require large amounts
>> of ciphertext. An attacker that can control all network
>> communications can also transmit a large number of slightly varying
>> AS-REQ messages to obtain multiple ciphertexts whose plaintexts only
>> differ slightly.
>
> Interesting. As soon as the testing framework is functional, I will do
> some end-to-end tests. There will probably be bottlenecks in various
> parts of the authentication path (I expect problems in libraries, the
> database ...), so the lookaside cache impact will not be so significant.
>
> Another story is the lookaside cache implementation: it's a simple linear
> list. In my test case the whole list has to be read and compared for each
> request.
>
> Back to cryptography: is "slightly varying" expected to allow only a very
> limited set of requests? So all possible requests could be in the lookaside
> cache? (I'm not educated in cryptography, so sorry for trivial questions.)
>
> Good night from Europe.
Hello,
I ran further performance tests on the current KDC and saw a huge performance
drop with disable_last_success = false.
Is a 100 times slower operation expected with disable_last_success = false?
All requests were for a single principal - is it a DB locking problem?
(= A test problem? Should it matter if requests are for principals p1 ... p1000?)
The scenario is still pretty simple and synthetic:
One host talks to one KDC. The host sends AS_REQs as fast as it can. There were
3 processes calling krb5-libs in parallel. All requests were for the same
principal. The KDC was restarted between tests. The measured value is the
number of TGTs obtained per second.
Five configurations were measured (+ means "enabled", - "disabled"):
-lookaside_cache -preauth_required -last_succ (~ 2850 TGT/sec)
-lookaside_cache +preauth_required -last_succ (~ 1550 TGT/sec)
+lookaside_cache -preauth_required -last_succ (~ 255 TGT/sec settled)
+lookaside_cache +preauth_required -last_succ (~ 150 TGT/sec settled)
+lookaside_cache +preauth_required +last_succ (~ 25 TGT/sec)
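
To put the figures above in relation to each other, the slowdown factors can be computed directly. This is only a sketch of the arithmetic: the numbers are the approximate rates from the list (TGTs obtained per second), and the dictionary labels and the `slowdown` helper are names invented for this example.

```python
# Approximate throughput figures copied from the list above (TGTs per second).
# The dictionary keys are shorthand labels invented for this sketch.
throughput = {
    "-lookaside -preauth -last_succ": 2850,
    "-lookaside +preauth -last_succ": 1550,
    "+lookaside -preauth -last_succ": 255,
    "+lookaside +preauth -last_succ": 150,
    "+lookaside +preauth +last_succ": 25,
}


def slowdown(reference_rate, rate):
    """Return how many times slower `rate` is compared with `reference_rate`."""
    return reference_rate / rate


fastest = max(throughput.values())
for label, rate in throughput.items():
    factor = slowdown(fastest, rate)
    print(f"{label}: {rate:5d} TGT/s, {factor:6.1f}x slower than the fastest")
```

With these rounded inputs the slowest configuration comes out 114 times slower than the fastest one, and 6 times slower than the same configuration without last-success updates.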
Graphs with measured values are downloadable from:
http://people.redhat.com/~pspacek/a/kdc_perf/file_all.pdf
http://people.redhat.com/~pspacek/a/kdc_perf/file_wo_success_writes.pdf
(Slowest test cut off. Good only for highlighting KDC lookaside impact.)
Important point: the Y axis (number of TGTs obtained per second) is logarithmic.
The graphs nicely show the huge impact of disable_last_success = false and of
the KDC lookaside cache (STALE_TIME left at the default: 120 seconds).
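
The hypothesis quoted earlier in this thread (the cache grows until entries start to expire after STALE_TIME, then its size settles) can be illustrated with a toy model. The sketch below is not the KDC code: it assumes a constant request rate, every request unique (so nothing is ever served from the cache), and expiry exactly `stale_time` seconds after insertion. The function name and parameters are made up for the illustration.

```python
STALE_TIME = 120  # seconds; the default value mentioned in this thread


def simulate_cache_size(requests_per_second, duration, stale_time=STALE_TIME):
    """Toy model: return the number of cached entries after each second.

    Assumptions (not taken from the KDC sources): the request rate is
    constant, every request is unique, and an entry is dropped once it is
    `stale_time` seconds old.
    """
    cache = []   # insertion timestamps, oldest first
    sizes = []
    for now in range(duration):
        # Drop entries that have become stale.
        cache = [stamp for stamp in cache if now - stamp < stale_time]
        # Every request in this second is a miss and is added to the cache.
        cache.extend([now] * requests_per_second)
        sizes.append(len(cache))
    return sizes


sizes = simulate_cache_size(requests_per_second=10, duration=300)
print(sizes[0], sizes[59], sizes[119], sizes[299])
```

In this model the size grows linearly for the first `stale_time` seconds and then stays at `requests_per_second * stale_time`. With a linear list every miss has to walk all of these entries, so the per-request cost would follow the same curve. In the real measurements the request rate drops as the list grows, which this constant-rate model ignores.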
Another question: are there any public patches with proposals for a better
lookaside cache? I'm curious ... :-)
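
For illustration only, one possible direction is to index cached replies by the request bytes, so that a lookup does not have to walk a list. The sketch below is a hypothetical design, not an existing patch and not the KDC's data structure: the class name, method names and the injectable `clock` parameter are all invented here.

```python
import time


class LookasideCache:
    """Hypothetical hash-indexed reply cache with time-based expiry."""

    def __init__(self, stale_time=120.0, clock=time.monotonic):
        self._stale_time = stale_time
        self._clock = clock          # injectable so tests can fake the time
        self._entries = {}           # request bytes -> (timestamp, reply bytes)

    def lookup(self, request):
        """Return the cached reply for `request`, or None if absent or stale."""
        entry = self._entries.get(request)
        if entry is None:
            return None
        stamp, reply = entry
        if self._clock() - stamp >= self._stale_time:
            del self._entries[request]
            return None
        return reply

    def insert(self, request, reply):
        """Remember `reply` for `request`, stamped with the current time."""
        self._entries[request] = (self._clock(), reply)

    def purge(self):
        """Remove all stale entries and return how many were removed."""
        now = self._clock()
        stale = [key for key, (stamp, _) in self._entries.items()
                 if now - stamp >= self._stale_time]
        for key in stale:
            del self._entries[key]
        return len(stale)
```

A lookup is then a single dictionary access on average instead of a scan over every cached request. The `purge` pass still visits all entries, so in this sketch it would have to be run only occasionally rather than on every request.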
Petr Spacek