[RFC] Improvements to KCM protocol and notification mechanism
ghudson at mit.edu
Wed Feb 3 12:09:37 EST 2021
On 2/3/21 5:26 AM, Pavel Březina wrote:
> I would like to propose two (backward compatible) changes to the KCM
> protocol to improve performance of sssd-kcm. Please comment on the
> following design and let me know if this is something MIT Kerberos would
> accept. Also let me know if there is another channel where changes to KCM
> protocol should be discussed.
We should coordinate changes with Heimdal (where the KCM protocol
originated), which in practice means making sure Nico is aware of this
> Improve KCM performance when listing credentials
This might be a superior approach to iteration. For large caches, it
significantly reduces IPC overhead in exchange for more memory use (and
perhaps more total work if the iteration is cut short by the caller).
However, if the goal is to have good performance when connecting to many
hosts via ssh, this proposal is a half-measure. We would like this
operation to take linear time rather than quadratic, while the proposal
merely reduces the constant factor (apparently by a lot).
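To make the linear-versus-quadratic point concrete, here is a toy cost model in Python. It is only a sketch under stated assumptions: each connection performs one full iteration over the cache and then stores one new service ticket, and per-credential iteration costs one IPC round trip per credential. The function name simulate() and its parameters are made up for illustration and do not correspond to anything in libkrb5 or sssd-kcm.

```python
def simulate(num_hosts, bulk_listing):
    """Count work for num_hosts sequential connections sharing one ccache.

    Toy model, not a measurement: assumes one full cache iteration per
    connection and one new service ticket stored per connection.
    """
    cache_size = 1      # start with only the TGT
    round_trips = 0     # IPC requests sent to the KCM daemon
    creds_scanned = 0   # credentials transferred to and examined by the client
    for _ in range(num_hosts):
        # The client walks the whole cache once per connection.
        creds_scanned += cache_size
        # Bulk listing fetches everything in one request; per-credential
        # iteration is modeled as one request per credential.
        round_trips += 1 if bulk_listing else cache_size
        # The new service ticket is cached, so the next scan is longer.
        cache_size += 1
    return round_trips, creds_scanned


if __name__ == "__main__":
    for bulk in (False, True):
        trips, scanned = simulate(1000, bulk)
        print(f"bulk_listing={bulk}: round_trips={trips}, creds_scanned={scanned}")
```

In this model, 1000 hosts cost 500500 round trips with per-credential iteration and 1000 with bulk listing, but 500500 credentials are transferred and examined either way. The round-trip count becomes linear while the total work stays quadratic.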
Unfortunately, solving the quadratic behavior requires more far-reaching
design work. There are two problems:
1. GSSAPI iterates over the cache when acquiring credentials. Partly
under the assumption that krb5_cc_retrieve_cred() is a slow operation,
the krb5 gss_acquire_cred() iterates over the ccache to detect relevant
config entries and to determine the expiration time.
For the config entries we could make separate krb5_cc_get_config()
calls. This would unfortunately increase the constant factor for FILE
ccaches. (Nico has designs for fast retrieval from FILE ccaches
involving a giant config entry containing a hash table of credentials,
but I don't want to get into that territory. I can also imagine caching
config values within a FILE ccache handle, using the file size to
resolve cache consistency edge cases.)
We could decompose this into some krb5_cc_get_config() and
krb5_cc_retrieve_cred() calls, with the unfortunate effect of increasing
the constant factor with FILE ccaches.
For the expiration time, in the common case we can make a
krb5_cc_retrieve_cred() call for the TGT (obeying start_realm if
present). However, GSSAPI also works for a ccache containing only
service creds. We don't know at acquire_cred time what the target
service is. The current behavior is to return the expiration time of
the first non-config entry in the cache, which requires at least partial
iteration.
2. krb5_cc_retrieve_cred() is implemented via iteration, even in KCM.
Solving this is mostly a coordination problem. In Heimdal KCM, there is
a KCM_OP_RETRIEVE, but the client does not use it, and the kcmd
implementation will make a TGS request from the KCM daemon if the
requested credential is not found. This behavior might be harmless
(it's what the Microsoft LSA does, after all) or it might have startling
In the macOS fork of Heimdal, this situation is cleaned up: the client
does use KCM_OP_RETRIEVE, the daemon does not make a TGS request for
KCM_OP_RETRIEVE operations, and there is a separate KCM_OP_GET_TICKET
for making a TGS request from the KCM daemon (with a client-side entry
point _krb5_kcm_get_ticket(), not used within the tree). It seems
facially reasonable to follow Apple's lead, although it could lead to
surprising behavior in combination with Heimdal's KCM daemon.
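The split described for the macOS fork can be sketched with a toy in-memory daemon in Python. Everything here is hypothetical (the class ToyKcmDaemon, its method names, and the placeholder ticket strings); it is not the wire protocol or any real implementation. It only illustrates the contract: a retrieve operation that is a pure keyed lookup, and a separate operation that is the only path on which the daemon contacts the KDC.

```python
class ToyKcmDaemon:
    """Hypothetical in-memory stand-in for a KCM daemon (illustration only)."""

    def __init__(self):
        self._creds = {}        # server principal name -> ticket
        self.tgs_requests = 0   # how many times the "daemon" contacted the KDC

    def store(self, server, ticket):
        """Cache a ticket obtained by the client."""
        self._creds[server] = ticket

    def retrieve(self, server):
        """Models a retrieve op with no side effects: lookup only.

        Returns None on a miss instead of making a TGS request.
        """
        return self._creds.get(server)

    def get_ticket(self, server):
        """Models a separate op where the daemon itself makes a TGS request."""
        self.tgs_requests += 1
        ticket = f"ticket-for-{server}"   # placeholder for a real TGS exchange
        self._creds[server] = ticket
        return ticket
```

With this separation, a miss from retrieve() is just a miss, and the client decides whether to make its own TGS request. The keyed lookup also avoids the client-side iteration discussed in item 2.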
I'll note that there are some operator-level workarounds for the
quadratic cache behavior, though of course they should not be necessary.
Caching service tickets is causing more harm than good in this
scenario, so you want to suppress it. We don't have a mechanism for
directly suppressing service ticket caching, so the operator has to do
it by manipulating the ccache. For FILE ccaches, the operator can make
a copy of the cache file before each ssh operation and point ssh at it.
kvno --out-cache (from krb5 1.19) or Heimdal's kgetcred can be used to
make a copy of other ccache types.
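For the FILE ccache case, the copy-and-point workaround might look like the following Python wrapper. This is a minimal sketch: the helper name run_with_private_ccache() is made up, and the cache path and host in the usage note are placeholders. It relies only on the KRB5CCNAME environment variable to select the ccache.

```python
import os
import shutil
import subprocess
import tempfile


def run_with_private_ccache(ccache_path, command):
    """Run command against a throwaway copy of a FILE ccache.

    Service tickets acquired by the command land in the copy, which is
    deleted afterwards, so the original cache never grows.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        copy_path = os.path.join(tmpdir, "ccache_copy")
        shutil.copyfile(ccache_path, copy_path)
        os.chmod(copy_path, 0o600)   # keep the copied credentials private
        # Point the child process at the copy instead of the original cache.
        env = dict(os.environ, KRB5CCNAME="FILE:" + copy_path)
        return subprocess.run(command, env=env).returncode
```

A call would look like run_with_private_ccache("/tmp/krb5cc_1000", ["ssh", "host.example.com"]), with both arguments adjusted to the local setup.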
> KCM/KRB5 Changes Notification Mechanism
This proposal is very Linux-specific, and the reliance on
XDG_RUNTIME_DIR seems like it might create problems for daemons that
don't operate within a user login session.
I am also not sure whether the intent is a notification mechanism for a
single ccache ("the function will be called on a ccache for which we
want to receive notifications") or all ccaches ("KCM will touch the file
each time a ccache is created, destroyed or changed").
It seems like this problem can be addressed without help from libkrb5 or
the KCM protocol, with a contract between sssd and GNOME.