Performance and Replay Cache

Thu Dec 7 20:14:02 EST 2006

On Dec 7, 2006, at 10:21, Steve Edgar wrote:
> If the replay cache is not used, K5 latency is low; about that of K4
> latency.  A search on the krbdev mail list archives mentions some
> applications do not need the replay cache, but I think ours needs the
> replay cache.  Can someone comment on under which circumstances the
> replay cache need not be used?

Basically, the session should be cryptographically bound to the  
specific responses sent by the server in the authentication  
exchange.  We usually suggest doing this by having the server  
generate a subsession key in its reply, and protect the rest of the  
session using that key.  Anything not protected using that key is  
subject to replay, but as long as that's just the initial  
authentication phase, and no critical serve-side actions are tied to  
merely having authenticated, it should be okay, and you could set the  
replay cache type to "none:".

But if another service is using the same service key, authenticators  
from one service could be replayed to another service.  So, for  
example, if server1 receives credentials from a client connection and  
goes ahead with its exchange, and then server2 receives the same  
credentials from a client, and server2's protocol isn't protected  
using a server-generated subsession key, server2 really ought to be  
able to recognize the replay of the credentials.  Which means server1  
needs to have recorded the credentials -- whether or not the protocol  
used by server1 has built-in protection against replay attacks.

Actually, server2 would be somewhat vulnerable anyways, in this case,  
in that a man-in-the-middle attacker could prevent the credentials  
from reaching server1, or could delay them until the attack or  
server2 had been started.  However, that's more intrusive than an  
eavesdropper initiating an attack from a new IP address, and may be  
noticed by the user or administrator.

And then there's attacks on service principals shared by multiple  
machines (e.g., server clusters), which should really share a single  
replay cache, if indeed one is needed...

> Also from reading previous posts, I noticed some folks have mentioned
> the possibility of using a memory based replay cache as opposed to a
> disk based replay cache.  The risk with that being no replay cache is
> available immediately after a server restart.  Has anyone used a
> memory based replay cache, and if so did you see a big increase in
> performance?

We don't have support for a memory-based replay cache as such,  
although if I recall correctly, our current code will do most of its  
work on an in-memory copy once it's been read from the file.

One quick hack would be to mount a memory-based file system, and  
point the replay cache code at a file stored there.  Depending on the  
implementation, that may stop fsync on the file from actually causing  
any delay.  You could also modify the code not to do the fsync, as  
Rainer suggested.  In both cases, though, the occasional GC passes  
will rewrite the entire file contents.

> Is a significant increase in performance for a disk based replay
> cache possible via an alternative implementation?  That is, could
> changes or optimizations to that code improve its speed?  If so, we
> may have someone here at Cornell who could investigate and contribute.

Almost certainly.  There have been a number of ideas kicked around,  
but not really investigated, including:

* Not retaining the entire file contents in memory until we know it's  
going to help.  E.g., if you only do one rcache lookup per process,  
in the one-process-per-client model, there's no real benefit to  
retaining all of the data in memory at once.  If you do a second  
rcache lookup, well, then you're apparently not operating under that  
model, so maybe caching will help.

* Structuring the file data so that we can pick portions to read or  
write or GC, instead of the whole thing.  But while the probe needs  
to check based on some credential data, expiration of outdated  
entries would be timestamp-based, and insertion order, while roughly  
in timestamp order in many environments, won't be strictly in  
timestamp order.

* Using a directory of files, with separate files for ranges of N  
seconds.  So you can scan only the entries in the same time range  
(filename computed by arithmetic on the timestamp), and GC of old  
entries can be done with readdir() and unlink().  (I think Nico  
Williams suggested this.)  You might or might not want an auxiliary  
file with other data ("replay cache format 2", service principal name  
-- or perhaps something derived from the service key would be better  
-- hash function used, etc).

* Store a hash value over the relevant fields instead of verbatim  
copies.  While there's a chance of a false positive match, if the  
hash function is good enough (e.g., sha256 with enough bits  
retained), it's an acceptably tiny chance, and by throwing away the  
variable-length fields, parsing or seeking in the file becomes much,  
much more efficient.  You still need timestamps (or timestamp ranges)  
for expiration though.

* Structure in-memory data so that expiration of old entries is  
efficient.  Even then, as Rainer suggests, processing only a limited  
number of entries at a time keeps the response time more reasonable,  
while still guaranteeing steady progress against even a large pool of  
expired entries.

* Use IPC to a server process that manages the in-memory and on-disk  
data, similar to CCAPI on Mac and Windows.  As the only process to  
manage the on-disk file, it wouldn't have to worry about  
synchronizing with other processes.  (If only one long-running  
application server process is touching the file, you still want stat  
calls, file locking, and the ability to re-read the file data in the  
library code to be safe, unless we add ways to explicitly override  
that.  Such a server process, with exclusive access, wouldn't have  
that problem.)

* Use the already-in-tree DB2 (actually, "much-hacked Berkeley DB  
1.85" would be closer); I'm not sure how closely its capabilities  
align with our needs in this case though.

Like I said, lots of ideas kicked around, not a lot of hard  
investigation behind most of them yet.  And that's not even getting  
into questions of performance in a multithreaded application server.

If you've got someone who can investigate and implement some of these  
ideas, or maybe come up with better ones, yes, we'd like to hear from  
them.

Ken

P.S.  There are other issues with our current approach to storing  
replay-detection data.  For example, we may store the data in a file  
with a filename based on the uid of the process and the first  
component of the service, but really, some of the info needs to be  
accessible to all processes using the service key (even if the same  
key is used by multiple service principals), while not making it in  
any way easier for processes without the key to guess what the key  
might be.  And an exchange using a service's 3DES key cannot be a  
replay of an exchange using the service's AES key, so from a security  
perspective, there's no need for those two sets to be combined.  In  
the MIT code, we don't share keys between principals, and a service's  
processes are assumed to run under one uid, so it's not a big  
problem, but it's far from ideal.  For a minor reworking like file  
format changes, these can be ignored, but someday if and when we look  
at a major design change, perhaps they should be reconsidered.