rcache fsync() avoidance
nico at cryptonector.com
Tue Sep 2 12:30:24 EDT 2014
On Tue, Sep 02, 2014 at 11:56:20AM -0400, Greg Hudson wrote:
> I think at this point we are prepared to just get rid of the fsync()
> calls in the MIT krb5 rcache implementation, and call that an
> implementation limitation. My reasoning is:
> * For most situations where replay caches help, they provide limited
> protection against active attacks anyway. (Basically: if the protocol
> needs replay protection because it uses Kerberos for authentication
> only, an active attacker could modify the data stream or suppress the
> legitimate authentication to bypass the replay cache. Replay caches
> only provide complete protection when the data stream is protected by
> the Kerberos authentication context, but without an acceptor subkey,
> such that an attacker could replay a complete session to cause an action
> to be executed twice.)
I would just... declare such protocols dangerous and not supported.
No more rsh/rlogin/telnet with authentication only. Preferably no more
rsh/rlogin/telnet full stop.
Application protocols that could benefit from rcaches:
- UDP loggers (non-mutual auth AP-REQ + KRB-SAFE / MIC)
- UDP / SCTP apps generally
We could even deprecate non-mutual auth for Kerberos and use an rcache
only for PROT_READY tokens, or even document that PROT_READY token
replays are not detected until the first non-PROT_READY per-msg token is
processed by the server. (PROT_READY token semantics are close enough
to that anyways.)
Then we can get rid of the rcache completely.
It's probably a bit too soon to go that far. But we could discuss that
on the KITTEN WG list and see what happens.
> * The design you outline degrades into bad performance if either (1) the
> server has negative clock drift beyond the boot time estimate, or (2) a
> non-trivial fraction of clients have positive clock drift beyond the
> boot time estimate. It can also cause spurious authentication failures
> shortly after boot, for clients with negative clock drift.
If you're not using NTP or the like then it's fair to expect problems!
In any case, we really need a multi-round-trip extension anyways, which
should be the longer term answer to this concern.
> * The probability of bad performance behavior increases as the boot time
> estimate approaches zero. At some point in the future we might start to
> see VMs with sub-second reboot times, at which point even a 1s positive
> client clock drift would force an fsync() and even a 1s negative client
> clock drift could cause a spurious authentication failure shortly after
> a reboot.
This is quite true. If the estimate is 3s but the real time to boot is
.5s you have a window of vulnerability, but that's hardly worse than
just never doing fsync()! :)
Yes, a window of vulnerability a few seconds long would be enormous to
the right attacker, but really, these attacks never happen. That's a
good reason to stop fsync()ing altogether. But fsync()s might be more
relevant to other protocols, so at least documenting (done; this thread
can be it) fsync() avoidance might help someone else.