Performance issues with krb5-1.9.1
Jonathan Reams
jr3074 at columbia.edu
Tue Aug 9 10:13:02 EDT 2011
Chris,
We didn't see any problems either until the KDC was under heavy load. The unpatched 1.9.1 was, and still is, running on our secondary KDC without issue, and we had been using 1.9.1 in testing and development for months without issue as well. During the period when we saw the performance degradation, the primary KDC handled 467000 distinct AS/TGS requests, which works out to roughly 43 requests per second (not counting the many retransmits). That is typical of our primary production KDC's workload throughout the day, but none of our other KDCs gets that much traffic; by contrast, our secondary KDC gets a request once or twice a minute. So it seems the performance problem only really comes into play when the KDC is under heavy load.
Jonathan
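
For readers who want to check the load figures above, here is a small back-of-the-envelope sketch. The request count and the two rates come from the message. The implied duration and the primary-to-secondary ratio are inferences from those numbers, not figures that were reported.

```python
# Figures taken from the message above.
total_requests = 467_000      # distinct AS/TGS requests during the slow period
rate_per_sec = 43             # approximate request rate on the primary KDC

# Implied length of the degradation period (an inference, not a reported figure).
duration_sec = total_requests / rate_per_sec
duration_hours = duration_sec / 3600

# Secondary KDC: "once or twice a minute", converted to requests per second.
secondary_low = 1 / 60
secondary_high = 2 / 60

# How many times busier the primary is than the secondary.
ratio_low = rate_per_sec / secondary_high
ratio_high = rate_per_sec / secondary_low

print(f"implied duration: {duration_hours:.1f} hours")
print(f"primary is about {ratio_low:.0f}x to {ratio_high:.0f}x busier than the secondary")
```

The numbers imply a window of about three hours, and a primary that is three orders of magnitude busier than the secondary, which fits the problem showing up only on the primary.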
On Aug 9, 2011, at 4:23 AM, Chris Hecker wrote:
>
> Just another data point: I'm not seeing this on my locally built 1.9.1
> (built without the attached patch):
>
> real 0m41.409s
> user 0m3.358s
> sys 0m3.683s
> finished round 1
>
> real 0m35.036s
> user 0m3.441s
> sys 0m3.658s
> finished round 2
>
> real 0m44.344s
> user 0m3.363s
> sys 0m3.728s
> finished round 3
>
> real 0m40.930s
> user 0m3.465s
> sys 0m3.973s
> finished round 4
>
> I had to reduce the number of inner iterations to 300 because my machine
> is slow. The variance in the above numbers is because there's a bunch
> of stuff running on this machine.
>
> Chris
>
> On 2011/08/08 11:21, Greg Hudson wrote:
>> On Mon, 2011-08-08 at 11:22 -0400, Jonathan Reams wrote:
>>> I did some performance testing on our test KDC and was able to
>>> reproduce the performance issue with 1.9.1.
>>
>> I found a regression which would affect these tests, but I'm not sure it
>> accounts for your global performance issues.
>>
>> The KDC in krb5 1.9 isn't supposed to be using an on-disk replay cache,
>> but due to a bug, it is actually opening and reading a replay cache for
>> every TGS request, which is significantly less efficient than the 1.8
>> behavior (using a replay cache which stays open for the lifetime of the
>> KDC).
>>
>> In a test which runs in under five minutes, this regression produces
>> visible O(n^2) performance characteristics. This would not necessarily
>> account for performance degradation over hours, as the performance drag
>> of the replay cache should become stable after five minutes. It's
>> possible that the constant drag was enough to cause the KDC to fall
>> behind on the request load, but it's also possible that there's a second
>> problem which isn't so easily reproduced.
>>
>> I've attached a patch. Note that there is a second, in-memory
>> "lookaside" cache with O(n^2) performance characteristics in the short
>> term, which holds queries for up to two minutes. You may see a slight
>> degradation in performance in test cases due to this. You can
>> temporarily rebuild the kdc directory with "make clean;
>> CPPFLAGS=-DNOCACHE" if you want to remove this variable from your
>> performance tests.
>>
>>
>>
>>
>> ________________________________________________
>> Kerberos mailing list Kerberos at mit.edu
>> https://mailman.mit.edu/mailman/listinfo/kerberos
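
The quoted explanation of the replay cache regression (quadratic cost in short tests, then a stable drag after five minutes) can be illustrated with a toy model. This is a sketch under stated assumptions, not KDC code: it assumes requests arrive at a constant rate and that each request rescans every entry still inside the replay window. The function names and the cost model are hypothetical; only the five-minute window comes from the thread.

```python
def per_request_cost(i, rate_per_sec, window_sec=300):
    """Entries re-read while handling request number i (0-based).

    Assumed model: every request rescans all entries still inside the
    replay window, and requests arrive at a constant rate.
    """
    live_cap = int(rate_per_sec * window_sec)  # entries alive once the window is full
    return min(i, live_cap)


def total_cost(n_requests, rate_per_sec, window_sec=300):
    """Total entries re-read across the first n_requests requests."""
    return sum(per_request_cost(i, rate_per_sec, window_sec)
               for i in range(n_requests))


if __name__ == "__main__":
    rate = 10  # hypothetical test rate, requests per second
    # Inside the window: doubling the request count roughly quadruples the work.
    print(total_cost(1000, rate), total_cost(2000, rate))
    # Past the window: the per-request cost stops growing.
    print(per_request_cost(3000, rate), per_request_cost(10000, rate))
```

In this model a test that finishes inside the window sees total work grow with the square of the request count, while a long-running server pays a fixed per-request cost proportional to rate times window. Setting window_sec=120 gives the same shape for the two-minute lookaside cache mentioned in the quoted message.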