[RFC] Improvements to KCM protocol and notification mechanism

Thu Feb 4 05:25:59 EST 2021

On 2/3/21 10:21 PM, Robbie Harwood wrote:
> Greg Hudson <ghudson at mit.edu> writes:
> 
>> On 2/3/21 5:26 AM, Pavel Březina wrote:
>>> I would like to propose two (backward compatible) changes to the KCM
>>> protocol to improve performance of sssd-kcm. Please comment on the
>>> following design and let me know if this is something MIT Kerberos would
>>> accept. Also let me know if there is another channel where changes to KCM
>>> protocol should be discussed.
>>
>> We should coordinate changes with Heimdal (where the KCM protocol
>> originated), which in practice means making sure Nico is aware of this
>> thread.
> 
> CCing Nico to start with.  (I'll also forward the initial mail in a
> moment.)
> 
>>> Improve KCM performance when listing credentials
>>
>> This might be a superior approach to iteration.  For large caches, it
>> significantly reduces IPC overhead in exchange for more memory use
>> (and perhaps more total work if the iteration is cut short by the
>> caller).
>>
>> However, if the goal is to have good performance when connecting to
>> many hosts via ssh, this proposal is a half-measure.  We would like
>> this operation to take linear time rather than quadratic, while the
>> proposal merely reduces the constant factor (apparently by a lot).
> 
> Yes, it brings it in line with what we get from KEYRING - which basically
> works like this already.  Since that's what we were using before, that's
> our benchmark for "good performance" :)
> 
>> Unfortunately, solving the quadratic behavior requires more
>> far-reaching design work.  There are two problems:

We don't need to get the best possible performance out of it. In the
tests we ran (acquiring 200 service tickets via GSSAPI) we got 2 seconds
with KEYRING and 2 minutes with sssd-kcm (it was 30 minutes before we
fixed a bottleneck in sssd). Seconds are fine, minutes are too much.

It would of course be great to get to linear performance, but given how
difficult that is, it is perhaps overkill if we find this good enough.
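
To put rough numbers on the linear versus quadratic point, here is a
minimal sketch of a cost model. It is not a measurement. It assumes the
cache starts with a single entry (the TGT), that each acquisition iterates
over the whole cache once and then stores one new service ticket, and that
per-entry iteration costs one round trip per entry (start/end operations
are ignored). The function name iteration_cost() is made up for this
example.

```python
def iteration_cost(num_tickets, batched):
    """Return (round_trips, entries_read) for acquiring num_tickets tickets.

    Model assumptions (not taken from any real KCM implementation):
    - the cache starts with one entry, the TGT;
    - every acquisition reads the whole cache once, then adds one ticket;
    - per-entry iteration needs one round trip per entry, while a batched
      listing returns all entries in a single round trip.
    """
    round_trips = 0
    entries_read = 0
    cache_size = 1  # the TGT
    for _ in range(num_tickets):
        round_trips += 1 if batched else cache_size
        entries_read += cache_size  # every entry is still transferred
        cache_size += 1  # the newly acquired service ticket is stored
    return round_trips, entries_read


if __name__ == "__main__":
    for batched in (False, True):
        trips, entries = iteration_cost(200, batched)
        print(f"batched={batched}: {trips} round trips, {entries} entries read")
```

Under these assumptions the batched listing makes the number of round
trips linear, while the number of entries transferred stays quadratic.
That matches the point that the proposal reduces the constant factor
rather than the complexity.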

>> 1. GSSAPI iterates over the cache when acquiring credentials.  Partly
>> under the assumption that krb5_cc_retrieve_cred() is a slow operation,
>> the krb5 gss_acquire_cred() iterates over the ccache to detect
>> relevant config entries and to determine the expiration time.
>>
>> For the config entries we could make separate krb5_cc_get_config()
>> calls.  This would unfortunately increase the constant factor for FILE
>> ccaches.  (Nico has designs for fast retrieval from FILE ccaches
>> involving a giant config entry containing a hash table of credentials,
>> but I don't want to get into that territory.  I can also imagine
>> caching config values within a FILE ccache handle, using the file size
>> to resolve cache consistency edge cases.)
>>
>> We could decompose this into some krb5_cc_get_config() and
>> krb5_cc_retrieve_cred() calls, with the unfortunate effect of
>> increasing the constant factor with FILE ccaches.
>>
>> For the expiration time, in the common case we can make a
>> krb5_cc_retrieve_cred() call for the TGT (obeying start_realm if
>> present).  However, GSSAPI also works for a ccache containing only
>> service creds.  We don't know at acquire_cred time what the target
>> service is.  The current behavior is to return the expiration time of
>> the first non-config entry in the cache, which requires at least
>> partial iteration.
>>
>> 2. krb5_cc_retrieve_cred() is implemented via iteration, even in KCM.
>>
>> Solving this is mostly a coordination problem.  In Heimdal KCM, there is
>> a KCM_OP_RETRIEVE, but the client does not use it, and the kcmd
>> implementation will make a TGS request from the KCM daemon if the
>> requested credential is not found.  This behavior might be harmless
>> (it's what the Microsoft LSA does, after all) or it might have startling
>> effects.
>>
>> In the macOS fork of Heimdal, this situation is cleaned up: the client
>> does use KCM_OP_RETRIEVE, the daemon does not make a TGS request for
>> KCM_OP_RETRIEVE operations, and there is a separate KCM_OP_GET_TICKET
>> for making a TGS request from the KCM daemon (with a client-side entry
>> point _krb5_kcm_get_ticket(), not used within the tree).  It seems
>> facially reasonable to follow Apple's lead, although it could lead to
>> surprising behavior in combination with Heimdal's KCM daemon.
> 
> I'm interested in Nico's/Heimdal's thoughts on this.  Fixing (2) seems
> easier than fixing (1).
> 
>>> KCM/KRB5 Changes Notification Mechanism
>>
>> This proposal is very Linux-specific, and the reliance on
>> XDG_RUNTIME_DIR seems like it might create problems for daemons that
>> don't operate within a user login session.
>>
>> I am also not sure whether the intent is a notification mechanism for
>> a single ccache ("the function will be called on a ccache for which we
>> want to receive notifications") or all ccaches ("KCM will touch the
>> file each time a ccache is created, destroyed or changed").

Right, the intention is a notification mechanism for a single ccache.
The suggestion to cover all ccaches is a leftover from the initial draft;
we found it not feasible because it was too KCM-specific.

>>
>> It seems like this problem can be addressed without help from libkrb5
>> or the KCM protocol, with a contract between sssd and GNOME.
> 
> Maybe it would be more clear to think of this proposal in two parts.
> 
> First, the other (unixy....) ccache types currently all have to get
> notifications for changes (inotify, or the notifications interface for
> keyrings).  So we'd like for KCM to have one as well, even if only for
> GOA.  My understanding of the proposal is that the KCM protocol would
> gain a new operation, which would request the daemon create a
> notification file and respond with its location.  So sssd-kcm's
> implementation using XDG_RUNTIME_DIR is Linux-specific, but other KCMs
> wouldn't be bound by that.
> 
> Second, once all ccache types have this capability, krb5 could provide
> an interface to it.  As you say, this isn't strictly needed - GOA could
> carry this code, as it's already doing.
> 
> I don't know whether there's interest in providing this interface.  If
> there's not, then we'd want to redesign it - as you suggested, a
> protocol extension might not be the best way at that point.

The goal is to provide a common interface that yields a path on which
consumers can use inotify. Other mechanisms could obviously be used on
other platforms, e.g. fswatch on macOS.
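
As a rough illustration of the consumer side on Linux, the sketch below
watches a path with inotify through ctypes. The helper names watch() and
pending_events() are made up for this example, and the path is assumed to
be whatever the proposed interface would return. Nothing here is an
existing libkrb5 or sssd-kcm API.

```python
import ctypes
import ctypes.util
import os
import struct

# Event masks from <sys/inotify.h>.
IN_MODIFY = 0x00000002
IN_ATTRIB = 0x00000004       # metadata changed, e.g. timestamps ("touch")
IN_CLOSE_WRITE = 0x00000008

_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
_libc.inotify_init1.argtypes = [ctypes.c_int]
_libc.inotify_add_watch.argtypes = [ctypes.c_int, ctypes.c_char_p,
                                    ctypes.c_uint32]


def watch(path):
    """Start watching path (hypothetical notification file); return the fd."""
    fd = _libc.inotify_init1(os.O_NONBLOCK)
    if fd < 0:
        raise OSError(ctypes.get_errno(), "inotify_init1 failed")
    mask = IN_MODIFY | IN_ATTRIB | IN_CLOSE_WRITE
    if _libc.inotify_add_watch(fd, os.fsencode(path), mask) < 0:
        err = ctypes.get_errno()
        os.close(fd)
        raise OSError(err, "inotify_add_watch failed")
    return fd


def pending_events(fd):
    """Return the masks of all queued events without blocking."""
    try:
        buf = os.read(fd, 4096)
    except BlockingIOError:
        return []  # nothing has happened since the last read
    masks = []
    offset = 0
    while offset < len(buf):
        # struct inotify_event: int wd; uint32 mask, cookie, len; char name[]
        _wd, mask, _cookie, name_len = struct.unpack_from("iIII", buf, offset)
        masks.append(mask)
        offset += 16 + name_len
    return masks
```

A consumer would typically add the returned descriptor to its event loop
and re-read the ccache whenever pending_events() returns something.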

> 
> Thanks,
> --Robbie
>