[RFC] Improvements to KCM protocol and notification mechanism

Mon Feb 15 09:13:48 EST 2021

On 2/4/21 11:25 AM, Pavel Březina wrote:
> On 2/3/21 10:21 PM, Robbie Harwood wrote:
>> Greg Hudson <ghudson at mit.edu> writes:
>>
>>> On 2/3/21 5:26 AM, Pavel Březina wrote:
>>>> I would like to propose two (backward compatible) changes to the KCM
>>>> protocol to improve performance of sssd-kcm. Please comment on the
>>>> following design and let me know if this is something MIT Kerberos would
>>>> accept. Also let me know if there is another channel where changes to KCM
>>>> protocol should be discussed.
>>>
>>> We should coordinate changes with Heimdal (where the KCM protocol
>>> originated), which in practice means making sure Nico is aware of this
>>> thread.
>>
>> CCing Nico to start with.  (I'll also forward the initial mail in a
>> moment.)
>>
>>>> Improve KCM performance when listing credentials
>>>
>>> This might be a superior approach to iteration.  For large caches, it
>>> significantly reduces IPC overhead in exchange for more memory use
>>> (and perhaps more total work if the iteration is cut short by the
>>> caller).
>>>
>>> However, if the goal is to have good performance when connecting to
>>> many hosts via ssh, this proposal is a half-measure.  We would like
>>> this operation to take linear time rather than quadratic, while the
>>> proposal merely reduces the constant factor (apparently by a lot).
>>
>> Yes, it brings it in line with what we get from KEYRING - which basically
>> works like this already.  Since that's what we were using before, that's
>> our benchmark for "good performance" :)
>>
>>> Unfortunately, solving the quadratic behavior requires more
>>> far-reaching design work.  There are two problems:
> 
> We don't need to get the best performance out of it. In the tests we ran
> (acquiring 200 service tickets via GSSAPI) we got 2 seconds with
> keyring, 2 minutes with sssd-kcm (it was 30 minutes before we fixed an
> sssd bottleneck). Seconds are fine, minutes are too much.
> 
> It would be great to get to linear performance, of course, but given that
> it is difficult, it is perhaps overkill if we find this good enough.

Proof of Concept:
https://github.com/pbrezina/sssd/tree/kcm
https://github.com/pbrezina/krb5/tree/kcm

With this proof of concept I got to 12 seconds. This is still slow in
comparison with keyring, but I think it is definitely good progress that
should be delivered to users.
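
For a rough sense of why fetching the whole list in one request helps
without removing the quadratic term, here is a back-of-the-envelope model
in C. The cost assumptions are mine, for illustration only, and are not
measurements of sssd-kcm: the per-entry case costs one request for the
listing plus one per credential, the bulk case costs a single request, and
each acquisition lists the whole cache once before storing one new ticket.
struct listing_cost and model_listing() are hypothetical names.

```c
#include <stdio.h>

struct listing_cost {
    unsigned long requests;   /* IPC round trips to the daemon */
    unsigned long entries;    /* credentials marshalled in total */
};

/*
 * Model `tickets` acquisitions against a cache that starts with `initial`
 * entries.  Each acquisition lists the whole cache once and then stores
 * one new service ticket, so the cache grows by one each time.
 *
 * Assumed costs (illustrative only):
 *   bulk == 0: one request for the listing plus one request per entry
 *   bulk != 0: one request returning every entry
 */
struct listing_cost model_listing(unsigned long initial,
                                  unsigned long tickets, int bulk)
{
    struct listing_cost cost = { 0, 0 };
    unsigned long size = initial;
    unsigned long i;

    for (i = 0; i < tickets; i++) {
        cost.requests += bulk ? 1 : 1 + size;
        cost.entries += size;   /* every entry is still transferred */
        size++;
    }
    return cost;
}

/* Print both variants side by side for a given workload. */
void print_comparison(unsigned long initial, unsigned long tickets)
{
    struct listing_cost a = model_listing(initial, tickets, 0);
    struct listing_cost b = model_listing(initial, tickets, 1);

    printf("per-entry: %lu requests, %lu entries\n", a.requests, a.entries);
    printf("bulk:      %lu requests, %lu entries\n", b.requests, b.entries);
}
```

Under these assumptions, print_comparison(1, 200) reports 20300 requests
for per-entry iteration against 200 for the bulk listing, while both
variants still marshal 20100 entries in total. So the request count
becomes linear, but the amount of data moved stays quadratic.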

> 
>>> 1. GSSAPI iterates over the cache when acquiring credentials.  Partly
>>> under the assumption that krb5_cc_retrieve_cred() is a slow operation,
>>> the krb5 gss_acquire_cred() iterates over the ccache to detect
>>> relevant config entries and to determine the expiration time.
>>>
>>> For the config entries we could make separate krb5_cc_get_config()
>>> calls.  This would unfortunately increase the constant factor for FILE
>>> ccaches.  (Nico has designs for fast retrieval from FILE ccaches
>>> involving a giant config entry containing a hash table of credentials,
>>> but I don't want to get into that territory.  I can also imagine
>>> caching config values within a FILE ccache handle, using the file size
>>> to resolve cache consistency edge cases.)
>>>
>>> We could decompose this into some krb5_cc_get_config() and
>>> krb5_cc_retrieve_cred() calls, with the unfortunate effect of
>>> increasing the constant factor with FILE ccaches.
>>>
>>> For the expiration time, in the common case we can make a
>>> krb5_cc_retrieve_cred() call for the TGT (obeying start_realm if
>>> present).  However, GSSAPI also works for a ccache containing only
>>> service creds.  We don't know at acquire_cred time what the target
>>> service is.  The current behavior is to return the expiration time of
>>> the first non-config entry in the cache, which requires at least
>>> partial iteration.
>>>
>>> 2. krb5_cc_retrieve_cred() is implemented via iteration, even in KCM.
>>>
>>> Solving this is mostly a coordination problem.  In Heimdal KCM, there is
>>> a KCM_OP_RETRIEVE, but the client does not use it, and the kcmd
>>> implementation will make a TGS request from the KCM daemon if the
>>> requested credential is not found.  This behavior might be harmless
>>> (it's what the Microsoft LSA does, after all) or it might have startling
>>> effects.
>>>
>>> In the macOS fork of Heimdal, this situation is cleaned up: the client
>>> does use KCM_OP_RETRIEVE, the daemon does not make a TGS request for
>>> KCM_OP_RETRIEVE operations, and there is a separate KCM_OP_GET_TICKET
>>> for making a TGS request from the KCM daemon (with a client-side entry
>>> point _krb5_kcm_get_ticket(), not used within the tree).  It seems
>>> facially reasonable to follow Apple's lead, although it could lead to
>>> surprising behavior in combination with Heimdal's KCM daemon.
>>
>> I'm interested in Nico's/Heimdal's thoughts on this.  Fixing (2) seems
>> easier than fixing (1).
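
To make the expiration-time rule in (1) concrete, here is a small
self-contained model in C of "return the expiration time of the first
non-config entry". It does not use libkrb5. struct entry,
is_config_entry() and first_cred_expiry() are hypothetical names, and the
config test (server realm "X-CACHECONF:" with first name component
"krb5_ccache_conf_data") reflects my understanding of how config entries
are marked, so treat it as an assumption.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Simplified stand-in for a ccache entry; not a libkrb5 type. */
struct entry {
    const char *server_realm;       /* realm of the server principal */
    const char *server_first_comp;  /* first component of the server name */
    time_t endtime;                 /* expiration time of the credential */
};

/* Assumption: config entries use this realm and first name component. */
int is_config_entry(const struct entry *e)
{
    return strcmp(e->server_realm, "X-CACHECONF:") == 0 &&
           strcmp(e->server_first_comp, "krb5_ccache_conf_data") == 0;
}

/*
 * Return the endtime of the first non-config entry, or 0 if the cache
 * holds only config entries.  *visited is set to the number of entries
 * examined, which shows why this needs at least partial iteration.
 */
time_t first_cred_expiry(const struct entry *entries, size_t n,
                         size_t *visited)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (!is_config_entry(&entries[i])) {
            *visited = i + 1;
            return entries[i].endtime;
        }
    }
    *visited = n;
    return 0;
}
```

The number of entries visited depends on how many config entries precede
the first real credential, which is why a single retrieve call cannot
replace the loop when the cache holds only service creds.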
>>
>>>> KCM/KRB5 Changes Notification Mechanism
>>>
>>> This proposal is very Linux-specific, and the reliance on
>>> XDG_RUNTIME_DIR seems like it might create problems for daemons that
>>> don't operate within a user login session.
>>>
>>> I am also not sure whether the intent is a notification mechanism for
>>> a single ccache ("the function will be called on a ccache for which we
>>> want to receive notifications") or all ccaches ("KCM will touch the
>>> file each time a ccache is created, destroyed or changed").
> 
> Right, the intention is for a single ccache. The suggestion to work on
> all ccaches is a leftover from the initial draft; we found that
> infeasible because it was too KCM-specific.
> 
>>>
>>> It seems like this problem can be addressed without help from libkrb5
>>> or the KCM protocol, with a contract between sssd and GNOME.
>>
>> Maybe it would be more clear to think of this proposal in two parts.
>>
>> First, the other (unixy....) ccache types currently all have to get
>> notifications for changes (inotify, or the notifications interface for
>> keyrings).  So we'd like for KCM to have one as well, even if only for
>> GOA.  My understanding of the proposal is that the KCM protocol would
>> gain a new operation, which would request the daemon create a
>> notification file and respond with its location.  So sssd-kcm's
>> implementation using XDG_RUNTIME_DIR is Linux-specific, but other KCMs
>> wouldn't be bound by that.
>>
>> Second, once all ccache types have this capability, krb5 could provide
>> an interface to it.  As you say, this isn't strictly needed - GOA could
>> carry this code, as it's already doing.
>>
>> I don't know whether there's interest in providing this interface.  If
>> there's not, then we'd want to redesign it - as you suggested, a
>> protocol extension might not be the best way at that point.
> 
> The goal is to provide a common interface that yields a path on which
> consumers can use inotify. Other mechanisms could obviously be used on
> other platforms, e.g. fswatch on macOS.
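
On the consumer side, watching such a path on Linux could look roughly
like the sketch below. It uses only the standard inotify and poll calls.
watch_open() and watch_wait() are hypothetical helper names, the path
would come from whatever interface ends up returning it, and the event
mask is my guess at what "touching the file" would generate.

```c
#include <poll.h>
#include <sys/inotify.h>
#include <unistd.h>

/*
 * Start watching `path` for changes.  Returns a non-blocking inotify
 * descriptor, or -1 on error.  The mask is an assumption about which
 * events a daemon touching the file would produce.
 */
int watch_open(const char *path)
{
    int fd = inotify_init1(IN_CLOEXEC | IN_NONBLOCK);

    if (fd < 0)
        return -1;
    if (inotify_add_watch(fd, path, IN_MODIFY | IN_ATTRIB |
                          IN_CLOSE_WRITE | IN_DELETE_SELF) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Wait up to `timeout_ms` milliseconds for a change.  Returns 1 if at
 * least one event arrived, 0 on timeout, -1 on error.  All queued events
 * are drained and reported as a single notification.
 */
int watch_wait(int fd, int timeout_ms)
{
    char buf[4096]
        __attribute__((aligned(__alignof__(struct inotify_event))));
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ret = poll(&pfd, 1, timeout_ms);

    if (ret <= 0)
        return ret;
    /* The descriptor is non-blocking, so read() fails once it is empty. */
    while (read(fd, buf, sizeof(buf)) > 0)
        ;
    return 1;
}
```

A consumer would call watch_open() once and then call watch_wait() from
its main loop (or hand the descriptor to its own event loop), re-reading
the ccache each time a change is reported.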
> 
>>
>> Thanks,
>> --Robbie
>>