mechglue registration of gss_buffer_t pointers

Sun Nov 4 18:53:19 EST 2007

On Fri, Nov 02, 2007 at 08:13:22PM -0400, Tom Yu wrote:
> nico> I suspect that cache and lock contention could be significant problems.
> nico> Don't forget that I'm still interested in a GSS pseudo-mech that
> nico> negotiates channel binding and which has non-crypto per-msg tokens.  For
> nico> such a mechanism the cost of non-existent crypto will not dominate the
> nico> cost of buffer registration.
> 
> How fast do you expect the per-msg GSS calls to be in such a case?

The MIC tokens in such a case would be a constant octet string, so the
cost of making those MIC tokens would be just the cost of allocating and
copying that string; verifying them would cost even less.  Wrap tokens
would be a small header followed by a copy of the cleartext octet string...
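To make the cost argument concrete, here is a minimal sketch of such non-crypto per-message tokens. The token contents, the `buf_desc` stand-in for `gss_buffer_desc`, and the `null_*` function names are all hypothetical. The point is only that getting a MIC is one allocation plus a copy, verifying is a compare, and wrapping is a header plus a copy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for gss_buffer_desc (hypothetical; not the real header). */
typedef struct {
    size_t length;
    void *value;
} buf_desc;

/* Hypothetical constant MIC token and wrap header for a non-crypto mech. */
static const unsigned char NULL_MIC[] = { 'N', 'U', 'L', 'L', 'M', 'I', 'C' };
static const unsigned char WRAP_HDR[] = { 'N', 'W', 0x00, 0x00 };

/* "Making" a MIC is only an allocation and a copy of a constant string. */
static int null_get_mic(buf_desc *mic)
{
    mic->value = malloc(sizeof(NULL_MIC));
    if (mic->value == NULL)
        return -1;
    memcpy(mic->value, NULL_MIC, sizeof(NULL_MIC));
    mic->length = sizeof(NULL_MIC);
    return 0;
}

/* Verifying is a length check plus a compare: no allocation at all. */
static int null_verify_mic(const buf_desc *mic)
{
    return mic->length == sizeof(NULL_MIC) &&
           memcmp(mic->value, NULL_MIC, sizeof(NULL_MIC)) == 0;
}

/* Wrapping is the fixed header followed by a copy of the cleartext. */
static int null_wrap(const buf_desc *in, buf_desc *out)
{
    assert(in->length == 0 || in->value != NULL);
    out->length = sizeof(WRAP_HDR) + in->length;
    out->value = malloc(out->length);
    if (out->value == NULL)
        return -1;
    memcpy(out->value, WRAP_HDR, sizeof(WRAP_HDR));
    if (in->length > 0)
        memcpy((unsigned char *)out->value + sizeof(WRAP_HDR),
               in->value, in->length);
    return 0;
}
```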

> nico> Using various atomic operations you might (should) be able to have a
> nico> mostly-lockless implementation.  That might not (likely would not) be as
> nico> portable, so you might have multiple implementations of buffer
> nico> registration, or perhaps vendors will substitute their own as needed.
> 
> Are you sure that the added complexity and portability difficulties of
> a lock-free implementation are worth the potential performance gains?

We can't know for sure until we have data.
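For a sense of what a mostly-lockless registration could look like, here is a sketch using C11 `<stdatomic.h>` compare-and-swap on a fixed-size, open-addressed table. Everything here (`lf_register()`, `lf_deregister()`, the table size, the pointer hash) is hypothetical, and the dependence on C11 atomics is exactly the kind of portability cost being weighed in this exchange.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lock-free registration table.  A slot holds either NULL
 * (free) or a registered pointer; claiming and clearing a slot are single
 * compare-and-swap operations, so no mutex is involved. */
#define LF_SLOTS 256
static_assert((LF_SLOTS & (LF_SLOTS - 1)) == 0, "LF_SLOTS must be a power of two");

/* Static atomics are zero-initialized, i.e. every slot starts out NULL. */
static _Atomic(void *) lf_table[LF_SLOTS];

/* Start probing at a slot derived from the pointer, skipping low bits
 * that alignment usually leaves zero. */
static size_t lf_hash(const void *ptr)
{
    return ((uintptr_t)ptr >> 4) & (LF_SLOTS - 1);
}

static int lf_register(void *ptr)
{
    if (ptr == NULL)
        return -1;
    size_t start = lf_hash(ptr);
    for (size_t i = 0; i < LF_SLOTS; i++) {
        size_t idx = (start + i) & (LF_SLOTS - 1);
        void *expected = NULL;
        if (atomic_compare_exchange_strong(&lf_table[idx], &expected, ptr))
            return 0;           /* claimed an empty slot */
    }
    return -1;                  /* table full */
}

/* Clears one slot holding ptr; returns 1 if found, 0 otherwise.  Scans
 * every slot rather than stopping at the first empty one, because earlier
 * deregistrations leave holes in probe sequences. */
static int lf_deregister(void *ptr)
{
    if (ptr == NULL)
        return 0;
    size_t start = lf_hash(ptr);
    for (size_t i = 0; i < LF_SLOTS; i++) {
        size_t idx = (start + i) & (LF_SLOTS - 1);
        void *expected = ptr;
        if (atomic_compare_exchange_strong(&lf_table[idx], &expected, NULL))
            return 1;
    }
    return 0;
}
```

The full scan on deregistration keeps the sketch correct but makes misses O(table size); a real implementation would want tombstones or a bounded probe length.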

> Certainly we could modularize the buffer registration interface to
> allow vendor-specific optimizations.

Yes, that's certainly a possibility.  I could live with that.
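A modular registration interface could be as simple as a table of function pointers that the mechglue calls through. This is a sketch under assumptions: `struct buffer_reg_ops`, `glue_release()`, and the fixed-size default table are hypothetical names, and the default implementation is deliberately not thread-safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef void (*release_fn)(void *);

/* Hypothetical pluggable interface.  The mechglue only calls through this
 * table, so a vendor could substitute a lock-free or per-thread version
 * without touching any callers. */
struct buffer_reg_ops {
    int (*add)(void *ptr, release_fn release);
    /* Removes one entry for ptr; returns its release function or NULL. */
    release_fn (*remove)(void *ptr);
};

/* Default implementation: small fixed table, NOT thread-safe. */
#define TABLE_SIZE 64
static struct { void *ptr; release_fn release; } table[TABLE_SIZE];

static int table_add(void *ptr, release_fn release)
{
    assert(release != NULL);
    if (ptr == NULL)
        return -1;
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        if (table[i].ptr == NULL) {
            table[i].ptr = ptr;
            table[i].release = release;
            return 0;
        }
    }
    return -1;                  /* table full */
}

static release_fn table_remove(void *ptr)
{
    if (ptr == NULL)
        return NULL;
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        if (table[i].ptr == ptr) {
            table[i].ptr = NULL;
            return table[i].release;
        }
    }
    return NULL;
}

static const struct buffer_reg_ops default_ops = { table_add, table_remove };

/* The implementation in use; a vendor build could point this elsewhere. */
static const struct buffer_reg_ops *reg_ops = &default_ops;

/* Glue-level release: find the owner's deallocator, then call it. */
static int glue_release(void *ptr)
{
    release_fn release = reg_ops->remove(ptr);
    if (release == NULL)
        return -1;              /* not registered: caller error */
    release(ptr);
    return 0;
}
```

Recording a release function per entry also lets each mechanism free buffers with its own allocator.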

> To put things in perspective, we already use a suboptimal registration
> strategy today.  Our krb5 mechanism uses a singly-linked list
> registration facility for many allocated GSS-API objects (context
> handles, cred handles, etc.) and keeps a single lock held while
> traversing the list for lookups and deletions.

I know.

>                                                 Do you find that this
> bottleneck creates problems for your use cases in Solaris NFS?

I don't know, but I do know that the big lock we have and gssd being
single-threaded are problems, and we definitely need to work on
improving those.  Part of the problem is that few customers use secure
NFS in environments where performance matters, though we expect this to
change down the road.  The point being: let's not make the problems we
know we have, or will have, any worse.
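For reference, a single-lock, singly-linked-list registry of the kind Tom describes behaves roughly like the sketch below (hypothetical names, POSIX threads assumed; this is not the actual krb5 code). Inserts are O(1), but every lookup or delete walks the list with the one global lock held, so all threads serialize there.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct reg_node {
    void *ptr;
    struct reg_node *next;
};

static struct reg_node *reg_head = NULL;
static pthread_mutex_t reg_lock = PTHREAD_MUTEX_INITIALIZER;

static int reg_add(void *ptr)
{
    assert(ptr != NULL);
    struct reg_node *n = malloc(sizeof(*n));
    if (n == NULL)
        return -1;
    n->ptr = ptr;
    pthread_mutex_lock(&reg_lock);
    n->next = reg_head;         /* O(1) insert at the head */
    reg_head = n;
    pthread_mutex_unlock(&reg_lock);
    return 0;
}

/* Returns 1 if ptr was registered (and removes it), else 0. */
static int reg_remove(void *ptr)
{
    struct reg_node **pp, *victim = NULL;

    pthread_mutex_lock(&reg_lock);
    /* O(n) walk with the single global lock held the whole time. */
    for (pp = &reg_head; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->ptr == ptr) {
            victim = *pp;
            *pp = victim->next;
            break;
        }
    }
    pthread_mutex_unlock(&reg_lock);

    int found = (victim != NULL);
    free(victim);               /* free(NULL) is a no-op */
    return found;
}
```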

> nico> That seems like a recipe for lock contention.
> 
> Is that such a problem if the whole-table lock isn't held across calls
> to the specific mechanism's release_buffer()?

MT-hot allocators (allocators being comparable to buffer registration)
don't work this way, right?
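One common way to avoid a whole-table lock in a shared table is lock striping: hash the pointer to one of several buckets, each with its own lock, so threads touching different buffers usually take different locks. The sketch assumes POSIX threads; the bucket count and the `striped_*` names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

#define NBUCKETS 64
static_assert((NBUCKETS & (NBUCKETS - 1)) == 0, "NBUCKETS must be a power of two");

struct entry {
    void *ptr;
    struct entry *next;
};

/* One lock per bucket instead of one lock for the whole table. */
static struct bucket {
    pthread_mutex_t lock;
    struct entry *head;
} buckets[NBUCKETS];

static pthread_once_t buckets_once = PTHREAD_ONCE_INIT;

static void buckets_init(void)
{
    for (size_t i = 0; i < NBUCKETS; i++) {
        pthread_mutex_init(&buckets[i].lock, NULL);
        buckets[i].head = NULL;
    }
}

/* Skip low bits that alignment usually leaves zero, then pick a bucket. */
static struct bucket *bucket_for(const void *ptr)
{
    uintptr_t h = (uintptr_t)ptr >> 4;
    return &buckets[h & (NBUCKETS - 1)];
}

static int striped_add(void *ptr)
{
    struct entry *e = malloc(sizeof(*e));
    if (e == NULL)
        return -1;
    pthread_once(&buckets_once, buckets_init);
    struct bucket *b = bucket_for(ptr);
    e->ptr = ptr;
    pthread_mutex_lock(&b->lock);
    e->next = b->head;
    b->head = e;
    pthread_mutex_unlock(&b->lock);
    return 0;
}

/* Returns 1 if ptr was registered (and removes it), else 0. */
static int striped_remove(void *ptr)
{
    pthread_once(&buckets_once, buckets_init);
    struct bucket *b = bucket_for(ptr);
    struct entry **pp, *victim = NULL;

    pthread_mutex_lock(&b->lock);
    for (pp = &b->head; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->ptr == ptr) {
            victim = *pp;
            *pp = victim->next;
            break;
        }
    }
    pthread_mutex_unlock(&b->lock);

    int found = (victim != NULL);
    free(victim);
    return found;
}
```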

> nico> A per-thread table would avoid this, assuming that the same threads that
> nico> create a token will release it, with a penalty for releasing buffers in
> nico> threads other than the ones where they were allocated.
> 
> I think a per-thread table isn't suitable because if there are
> pseudo-mechanisms that can return a buffer previously allocated by
> another specific mechanism, the registration code needs to avoid
> making duplicate entries.

In the case of pseudo-mechs that pass another mech's tokens through,
arguably the right thing to do is to create dup entries; then, no matter
what funny thread games are being played, there will be a matching number
of gss_release_buffer() calls for any given buffer.  (I'm assuming that
pseudo-mechs re-enter the mechglue when calling other mechs.)
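Here is a single-threaded sketch of the dup-entries idea, with hypothetical names throughout. The inner mech's token and the pseudo-mech's pass-through of it each get their own entry; the glue removes the most recent entry on release, and the pseudo-mech's release re-enters the glue to drop the inner mech's entry. One application-level release thus accounts for both registrations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef void (*release_fn)(void *);

/* Registry that deliberately allows duplicate entries for one pointer.
 * Fixed-size and single-threaded, for illustration only. */
#define MAX_ENTRIES 32
static struct { void *ptr; release_fn release; } entries[MAX_ENTRIES];
static size_t n_entries;

static int dup_register(void *ptr, release_fn release)
{
    assert(release != NULL);
    if (n_entries == MAX_ENTRIES)
        return -1;
    entries[n_entries].ptr = ptr;
    entries[n_entries].release = release;
    n_entries++;
    return 0;
}

/* Remove the most recent entry for ptr, keeping the rest in order, then
 * call its owner's release.  Returns -1 if ptr has no entry left. */
static int dup_release(void *ptr)
{
    for (size_t i = n_entries; i-- > 0; ) {
        if (entries[i].ptr == ptr) {
            release_fn release = entries[i].release;
            for (size_t j = i; j + 1 < n_entries; j++)
                entries[j] = entries[j + 1];
            n_entries--;
            release(ptr);       /* may re-enter dup_release() */
            return 0;
        }
    }
    return -1;
}

/* Hypothetical inner mech: actually frees the token. */
static int inner_frees;
static void inner_release(void *ptr) { inner_frees++; free(ptr); }

/* Hypothetical pass-through pseudo-mech: re-enters the glue, which then
 * removes the inner mech's (older) entry for the same pointer. */
static void pseudo_release(void *ptr) { dup_release(ptr); }
```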

> I think we should not preclude buffers being released in a different
> thread than the one from which they were allocated.

Certainly not.  But if the common case has allocs and releases in the
same thread then you can optimize for that.
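A sketch of per-thread registration with a cross-thread fallback, assuming POSIX threads and C11 `_Thread_local`; all names and the table size are hypothetical. The owning thread's fast path only touches its own, normally uncontended, lock, while a release from another thread pays for a scan of every thread's table.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Simplification: tables are never freed when threads exit. */
#define PER_TABLE 32

struct thread_table {
    pthread_mutex_t lock;       /* owner takes it (fast), others rarely */
    void *ptrs[PER_TABLE];
    struct thread_table *next;  /* link in the global list of tables */
};

static _Thread_local struct thread_table *my_table;
static struct thread_table *all_tables;
static pthread_mutex_t all_lock = PTHREAD_MUTEX_INITIALIZER;

static struct thread_table *get_my_table(void)
{
    if (my_table == NULL) {
        struct thread_table *t = calloc(1, sizeof(*t));
        if (t == NULL)
            return NULL;
        pthread_mutex_init(&t->lock, NULL);
        pthread_mutex_lock(&all_lock);
        t->next = all_tables;   /* publish: prepend to the global list */
        all_tables = t;
        pthread_mutex_unlock(&all_lock);
        my_table = t;
    }
    return my_table;
}

/* Remove ptr from table t if present; returns 1 if found. */
static int table_take(struct thread_table *t, void *ptr)
{
    int found = 0;
    pthread_mutex_lock(&t->lock);
    for (size_t i = 0; i < PER_TABLE; i++) {
        if (t->ptrs[i] == ptr) {
            t->ptrs[i] = NULL;
            found = 1;
            break;
        }
    }
    pthread_mutex_unlock(&t->lock);
    return found;
}

static int pt_register(void *ptr)
{
    if (ptr == NULL)
        return -1;
    struct thread_table *t = get_my_table();
    if (t == NULL)
        return -1;
    int rc = -1;                /* stays -1 if this thread's table is full */
    pthread_mutex_lock(&t->lock);
    for (size_t i = 0; i < PER_TABLE; i++) {
        if (t->ptrs[i] == NULL) {
            t->ptrs[i] = ptr;
            rc = 0;
            break;
        }
    }
    pthread_mutex_unlock(&t->lock);
    return rc;
}

/* Returns 1 if ptr was registered (and is now deregistered), else 0. */
static int pt_deregister(void *ptr)
{
    if (ptr == NULL)
        return 0;
    struct thread_table *mine = my_table;

    /* Fast path: the buffer was registered by this same thread. */
    if (mine != NULL && table_take(mine, ptr))
        return 1;

    /* Slow path: scan the other threads' tables.  Walking the list after
     * dropping all_lock is safe because tables are only ever prepended,
     * never unlinked, and each next pointer is set before publication. */
    pthread_mutex_lock(&all_lock);
    struct thread_table *t = all_tables;
    pthread_mutex_unlock(&all_lock);
    for (; t != NULL; t = t->next)
        if (t != mine && table_take(t, ptr))
            return 1;
    return 0;
}
```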

> nico> I think the call to the mech-specific gss_release_buffer() should be the
> nico> last step, that the buffer should already be de-registered and the table
> nico> unlocked.
> 
> I guess this would be safe if we know that there won't be another
> thread attempting to register a pointer between the time that the
> pointer is unregistered by the mechglue release_buffer() and when the
> specific mechanism release_buffer() gets called.  Of course, that
> would probably only happen if there were an application programming
> error, so we may not want to worry about that.

It's safe, period (in fact, you'd have to call the mech-specific
gss_release_buffer() with no locks held because of pseudo-mech
reentrance issues); apps should not be releasing the same buffer twice.
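A sketch of that ordering, assuming POSIX threads and hypothetical names: the glue unlinks the entry under the lock, drops the lock, and only then calls the mech-specific release. The `pseudo_release()` helper re-enters `glue_release()`; if the lock were still held at that point, relocking the non-recursive mutex would deadlock (or worse).

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

typedef void (*release_fn)(void *);

struct node {
    void *ptr;
    release_fn release;
    struct node *next;
};

static struct node *reg_head;
static pthread_mutex_t reg_lock = PTHREAD_MUTEX_INITIALIZER;

static int glue_register(void *ptr, release_fn release)
{
    assert(release != NULL);
    struct node *n = malloc(sizeof(*n));
    if (n == NULL)
        return -1;
    n->ptr = ptr;
    n->release = release;
    pthread_mutex_lock(&reg_lock);
    n->next = reg_head;
    reg_head = n;
    pthread_mutex_unlock(&reg_lock);
    return 0;
}

/* Deregister under the lock, unlock, and only then call the mech-specific
 * release, so a re-entrant call from a pseudo-mech finds the lock free. */
static int glue_release(void *ptr)
{
    struct node **pp, *victim = NULL;

    pthread_mutex_lock(&reg_lock);
    for (pp = &reg_head; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->ptr == ptr) {
            victim = *pp;
            *pp = victim->next;
            break;
        }
    }
    pthread_mutex_unlock(&reg_lock);

    if (victim == NULL)
        return -1;              /* unknown or already released */
    release_fn release = victim->release;
    free(victim);
    release(ptr);               /* last step, with no locks held */
    return 0;
}

/* Hypothetical mech-specific releases used to exercise the ordering. */
static int inner_calls;
static void inner_release(void *ptr) { (void)ptr; inner_calls++; }
static void pseudo_release(void *ptr) { glue_release(ptr); } /* re-enters */
```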

> I think we will find that there is a more general requirement for
> allowing a specific mechanism to use its own allocator.  I understand
> that every Win32 DLL has its own memory allocation arena and that
> mixing these up can cause crashes.

But does Win32 have anything like object groups?

> You might want to compile the kernel GSS code so that mechglue does no
> buffer registration at all in that case and just calls out to the
> generic deallocator.  I wasn't planning to implement in a way that
> would preclude that possibility.

Definitely.  As long as we can have build-time support for
vendor-specific differences, we'll be OK; preferably, the vendor-specific
code could live in the MIT krb5 tree.
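Build-time support could be as plain as a preprocessor switch. `GLUE_NO_BUFFER_REGISTRATION` and the `glue_*` functions are hypothetical; with the macro defined, as might suit a kernel GSS build where every mechanism shares one allocator, registration compiles away and release goes straight to the generic deallocator.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*release_fn)(void *);

#ifdef GLUE_NO_BUFFER_REGISTRATION

/* Single shared allocator: nothing to record. */
static int glue_register(void *ptr, release_fn release)
{
    (void)ptr;
    (void)release;
    return 0;
}

static int glue_release(void *ptr)
{
    free(ptr);                  /* the one generic deallocator */
    return 0;
}

#else /* registration enabled */

#define SLOTS 16
static struct { void *ptr; release_fn release; } slots[SLOTS];

static int glue_register(void *ptr, release_fn release)
{
    assert(release != NULL);
    if (ptr == NULL)
        return -1;
    for (size_t i = 0; i < SLOTS; i++) {
        if (slots[i].ptr == NULL) {
            slots[i].ptr = ptr;
            slots[i].release = release;
            return 0;
        }
    }
    return -1;                  /* table full */
}

static int glue_release(void *ptr)
{
    if (ptr == NULL)
        return -1;
    for (size_t i = 0; i < SLOTS; i++) {
        if (slots[i].ptr == ptr) {
            release_fn release = slots[i].release;
            slots[i].ptr = NULL;
            release(ptr);       /* the owning mech's deallocator */
            return 0;
        }
    }
    return -1;
}

#endif
```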

Nico
--