Performance issues with krb5-1.9.1
Jonathan Reams
jr3074 at columbia.edu
Mon Aug 8 15:49:26 EDT 2011
Hi Greg,
I applied this patch and saw a great improvement on our test KDC. I should also mention that the performance degradation appeared to be compounded by clients resubmitting their requests after they timed out, so the KDC was not only handling new requests but also trying to fulfill old requests that had been resubmitted. Below are the results of my little performance test with the patch applied to krb5-1.9.1.
[root@naiad ~]# for i in `seq 1 4`; do time for j in `seq 1 1000`; do kinit -k; kvno iprop_monitor | grep -v kvno; done; echo finished round $i; sleep 5; done
real 0m20.312s
user 0m4.513s
sys 0m13.690s
finished round 1
real 0m20.842s
user 0m4.512s
sys 0m13.896s
finished round 2
real 0m21.123s
user 0m4.605s
sys 0m13.787s
finished round 3
real 0m21.425s
user 0m4.508s
sys 0m13.642s
finished round 4
This is much better than the 56 seconds we saw without the patch. We'll roll this out to our secondary KDC and see how it goes this week. Thanks for resolving this so quickly.
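To put the numbers in perspective, here is a rough sketch of the arithmetic, assuming the 56-second figure refers to one round of the same 1000-iteration loop. The function name summarize and the variable names are made up for illustration; only the timings come from the output above.

```shell
# Wall-clock ("real") seconds for the four patched rounds shown above.
patched_rounds="20.312 20.842 21.123 21.425"
# Approximate unpatched time; assumed to be for the same 1000-iteration round.
unpatched=56

# summarize ROUNDS BASELINE ITERATIONS
# Prints the mean round time, the mean time per kinit+kvno iteration,
# and the speedup relative to the baseline.
summarize() {
    echo "$1" | awk -v base="$2" -v n="$3" '{
        for (i = 1; i <= NF; i++) sum += $i
        mean = sum / NF
        printf "mean=%.2fs per_iter=%.1fms speedup=%.2fx\n", mean, mean * 1000 / n, base / mean
    }'
}

summarize "$patched_rounds" "$unpatched" 1000
```

That works out to roughly 21 ms per kinit/kvno pair and a bit under a 2.7x improvement over the unpatched run.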
Jonathan Reams
CUIT Systems Engineering
Columbia University
On Aug 8, 2011, at 2:21 PM, Greg Hudson wrote:
>
> I found a regression which would affect these tests, but I'm not sure it
> accounts for your global performance issues.
>
> The KDC in krb5 1.9 isn't supposed to be using an on-disk replay cache,
> but due to a bug, it is actually opening and reading a replay cache for
> every TGS request, which is significantly less efficient than the 1.8
> behavior (using a replay cache which stays open for the lifetime of the
> KDC).
>
> In a test which runs in under five minutes, this regression produces
> visible O(n^2) performance characteristics. This would not necessarily
> account for performance degradation over hours, as the performance drag
> of the replay cache should become stable after five minutes. It's
> possible that the constant drag was enough to cause the KDC to fall
> behind on the request load, but it's also possible that there's a second
> problem which isn't so easily reproduced.
>
> I've attached a patch. Note that there is a second, in-memory
> "lookaside" cache with O(n^2) performance characteristics in the short
> term, which holds queries for up to two minutes. You may see a slight
> degradation in performance in test cases due to this. You can
> temporarily rebuild the kdc directory with "make clean;
> CPPFLAGS=-DNOCACHE" if you want to remove this variable from your
> performance tests.
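The quoted explanation of the O(n^2) behavior, and of why it levels off after five minutes, can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not the KDC's actual code: it assumes each request reads every entry currently in the replay cache and then adds one entry, and that the cache never holds more than a fixed number of live entries (the request rate times the five-minute window). The function total_reads and its parameters are hypothetical.

```shell
# total_reads REQUESTS MAX_LIVE_ENTRIES
# Toy model: request number k reads min(k, MAX_LIVE_ENTRIES) cache entries,
# because older entries are assumed to have expired once the cap is reached.
total_reads() {
    n=$1
    cap=$2
    total=0
    k=0
    while [ "$k" -lt "$n" ]; do
        if [ "$k" -lt "$cap" ]; then cost=$k; else cost=$cap; fi
        total=$((total + cost))
        k=$((k + 1))
    done
    echo "$total"
}

# Short test (nothing expires): doubling the requests quadruples the reads.
for n in 1000 2000 4000; do
    echo "$n requests, no expiry: $(total_reads "$n" 1000000) entry reads"
done

# Longer run with a cap of 1000 live entries: growth becomes linear.
echo "4000 requests, cap 1000: $(total_reads 4000 1000) entry reads"
```

With no expiry the totals are 499500, 1999000, and 7998000 reads, which is the quadratic growth seen in a test shorter than five minutes. With the cap in place, each additional request costs a constant 1000 reads, which matches the description of a drag that becomes stable rather than growing over hours.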