how to "ban" clients?

Tue Jul 26 09:14:46 EDT 2011

> I can understand the appeal of doing whatever you can because not all
> bad actors are perfect automatons with unlimited foresight, but it's
> not compelling to me in this case.

I do understand where you're coming from on the system complexity front, 
I really do.  However, the current behavior badly violates the principle 
of least surprise for new Kerberos users.

I read a lot about how Kerberos works and the philosophy and theory 
behind it before choosing it, and I was comfortable with the idea that 
already-issued tickets are trusted until they expire; that's a 
fundamental part of the model and why it scales.  That makes sense, and 
the engineering reasons behind that are sound, and I'm on board.

However, once somebody does have to go back to talk to the KDC, at that 
point it seems completely clear that -allow_tix should actually not 
allow any more tix.  I mean, really, come on.  :)

I spent a long time trying to figure out what I was doing wrong while I 
was testing this, since it was so clearly not doing the Right Thing.

It seems like there are three hesitations here:

a.  Whether it matters for security.  I would say it does matter, even 
though I understand your point about the bad guys and >0 weaknesses 
below.  However, I think it's pretty clear a little more security is 
better, as long as it doesn't cost too much complexity and performance, 
and it's well documented.  In fact, the very idea that a ticket has a 
lifetime is an argument for incremental security.  You can do damage, 
but not forever.  In this case, you can do damage, but not to any new 
principals.  They seem very similar in philosophy to me.

Also, as far as I can tell from reading the web, Windows AD stops 
issuing new tickets for resources in the account's own domain once the 
account is disabled[1].  And finally, it's 100% obviously the more 
intuitive behavior to have the KDC stop issuing tickets, and I think 
making things intuitive where possible is a big win in lots of ways, 
including security.

b.  Complexity.  I'm about to make the change, and it doesn't look like 
it's going to be more than a few lines of code.  I'm also fixing the 
krb5_db_entry pass-by-value "bug" I mentioned in the private mail 
(because passing a 68 byte structure containing pointers by value gives 
me hives :), but the code for this client check and the aprof bool fetch 
are going to be short and easy to code-review.  Incidentally, I 
implemented and tested the one-line if-statement change to fix the 
-allow_svr u2u thing, and it works.  It fails to issue the ticket if you 
kvno princ, but it will let princ be the u2u server, so that's fixed 
locally and I'll send a patch (I will keep all three patches separate 
for your sanity).
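
To give a feel for the size of the client check, here is its rough shape, 
sketched in Python rather than the KDC's C so the logic is easy to read.  
Everything in it is a stand-in made up for illustration: the Principal 
class, the DISALLOW_ALL_TIX constant, the recheck_client flag (playing 
the role of the aprof bool), and the lookup callable.  Treating a 
missing local principal as banned is also my assumption.  This is not 
the actual patch.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Stand-in for the attribute bit that -allow_tix sets on a principal
# (the value here is illustrative only).
DISALLOW_ALL_TIX = 0x40


@dataclass
class Principal:
    """Hypothetical, pared-down database entry."""
    name: str
    attributes: int = 0


def tgs_client_allowed(
    client: str,
    local_realm: str,
    lookup: Callable[[str], Optional[Principal]],
    recheck_client: bool,
) -> bool:
    """Decide whether a TGS request from `client` may proceed."""
    if not recheck_client:
        # Option off: trust the TGT and skip the database lookup entirely.
        return True
    if not client.endswith("@" + local_realm):
        # Cross-realm client: there is no local entry to consult.
        return True
    entry = lookup(client)
    if entry is None:
        # Assumption: a local principal deleted after its TGT was issued
        # is treated the same as a banned one.
        return False
    # Refuse if the admin has set -allow_tix on the client.
    return not (entry.attributes & DISALLOW_ALL_TIX)
```

With the flag off the lookup never happens, so sites that care more 
about TGS throughput keep today's behavior.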

c.  Performance.  I haven't tested this yet, but I am planning on 
writing a load tester for this stuff anyway, and so I'll be able to say 
quantitatively how much impact this has.  However, even if it's a 
significant performance hit, I would say the aprof bool is the right way 
to handle that, letting admins decide to trade off the performance for 
the additional security.  Admins already have some control over your #2 
below: they can trade scalability and convenience for shorter-lived 
tickets.

Obviously all I can do is request the feature, make the argument, and 
provide the patch.  Hopefully the patch will be simple and clear enough, 
and the performance impact low enough, that you'll be convinced!  If 
not, hey, that's the beauty of open-source, I can fix it for me, put the 
patch on my website, and it's just a bit more friction for admins if 
they want the feature.  But, hopefully it'll go into mainline, because 
patch management sucks.  :)

> (Trivia: we used to sort of address (3) by refusing to unwrap
> messages in the GSSAPI krb5 mech after a user's tickets expired.

Hah, in my next batch of spam to the list I was going to ask about 
exactly how that is handled, but I figured I should look at the source 
for mk_safe and mk_priv first to see if they check expirations.  It 
sounds like you're saying they don't.  I was thinking about this and it 
does seem complicated to handle robustly for clients if things just 
start failing and you need to reauth in the middle of communications. 
Some sample code might help people do this right.  I'll ask about the 
best way to handle that in a different thread next week maybe.
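
In the meantime, the rough shape I'm picturing on the client side is a 
check before each send that reauths proactively when the ticket is 
close to expiring, instead of waiting for something to fail 
mid-conversation.  This is only a sketch in Python: ensure_fresh, the 
reauth callable, and the 60-second margin are all hypothetical, and it 
says nothing about what mk_safe and mk_priv actually do.

```python
import time
from typing import Callable, Optional


def ensure_fresh(
    endtime: float,
    reauth: Callable[[], float],
    margin: float = 60.0,
    now: Optional[float] = None,
) -> float:
    """Return a ticket end time at least `margin` seconds in the future.

    `endtime` is the current ticket's expiry (seconds since the epoch).
    `reauth` is a hypothetical callable that obtains a fresh ticket and
    returns its end time.
    """
    current = time.time() if now is None else now
    if endtime - current > margin:
        # Still comfortably valid: keep using the existing ticket.
        return endtime
    # Too close to expiry (or already expired): get a new ticket first.
    new_endtime = reauth()
    if new_endtime - current <= margin:
        raise RuntimeError("re-authentication did not yield a usable ticket")
    return new_endtime
```

The caller would run this before each message and rebuild its security 
context whenever the returned end time differs from the one it passed in.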

Thanks,
Chris

[1]  "When the administrator disables the user account in emea, he or 
she won’t be able to get any more tickets for resources in emea. The 
user will however still be able to get new tickets for resources in the 
asiapac domain—this will be possible as long as the user’s TGT for 
asiapac remains valid. The reason for this is that the DCs in the 
asiapac domain don’t check the user’s account status when they issue 
tickets." 
http://books.google.com/books?id=05xyiZqC8ToC&lpg=PA178&ots=Bx9VV3TWvk&pg=PA178#v=onepage&q&f=false

On 2011/07/26 04:59, ghudson at MIT.EDU wrote:
> Let's try again without the analogies.
>
> It's a reasonable desire to want to flip a switch and make a
> particular user instantly unable to affect your environment.  That's
> the kind of thing you should get from a centrally managed
> authentication system, right?  Unfortunately, there are three holes
> big enough to drive a truck through:
>
>    1. The user can keep requesting service tickets until the user's TGT
>       expires.
>
>    2. The user can keep using service tickets until they expire.
>
>    3. The user can keep using active sessions until the session is
>       invalidated somehow (interrupted connection, restarted client or
>       server).
>
> You're proposing to fix (1), which is the only one of the three which
> can be addressed on the KDC.  This comes at either a performance cost
> (looking up the client every time, if it's a local principal) or a
> complexity cost (adding a configuration variable, and making the TGS
> validation code paths conditional on whether or not we've looked up
> the client).  Certainly, addressing (1) would limit the scope of the
> things a bad actor could do after account closure, but not completely,
> and only if the bad actor didn't anticipate the account closure by
> requesting a bunch of service tickets.
>
> From an implementor's perspective, it's sometimes better to have a
> simpler system with three weaknesses than a more complicated system
> with two.  An attacker doesn't care about the number of weaknesses as
> long as it's positive.  I can understand the appeal of doing whatever
> you can because not all bad actors are perfect automatons with
> unlimited foresight, but it's not compelling to me in this case.
>
> To actually get rapid-reaction account closure, you need to implement
> a temporary blacklist in the servers--and in your case, possibly also
> in the clients.  I know, that's a huge headache for an application
> developer to have to worry about.  But I don't think there's a
> comprehensive substitute at the Kerberos level.
>
> (Trivia: we used to sort of address (3) by refusing to unwrap messages
> in the GSSAPI krb5 mech after a user's tickets expired.  That at least
> puts a time bound on how long the user can affect services, for
> applications using GSSAPI wrap and unwrap.  We had to turn this off
> because it was too disruptive to existing applications, which were
> generally not written by security people.  Also, there are a lot of
> applications like ssh which don't use GSSAPI wrap and unwrap, so it
> wasn't really solving the problem.)
>