Regarding Issues with Memory Credential Cache
Ezra Peisach
epeisach at MIT.EDU
Tue Aug 19 21:49:51 EDT 2008
Datar, Ashutosh Anil wrote:
> Hi,
>
> I was testing the Apache Web Server (which uses mod_auth_kerb) with the Kerberos client 1.6.2 and found some issues with the memory cache handling.
>
> According to my understanding, the memory cache is arranged as a linked list, each node of which represents a cache and provides a pointer to another linked list (a list of credentials), each node of which represents a credential from that particular cache.
> (In the source code, these nodes are named krb5_mcc_list_node and krb5_mcc_cursor respectively.)
>
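For reference, the layout being described looks roughly like this - a
simplified sketch of the structures in cc_memory.c, with field names
approximated and the internal lock type replaced by a plain pthread mutex
rather than copied verbatim from the source:

    /* One credential node; the cursor type is a pointer into this list. */
    typedef struct _krb5_mcc_link {
        struct _krb5_mcc_link *next;
        krb5_creds *creds;
    } krb5_mcc_link, *krb5_mcc_cursor;

    /* Per-cache data: name, client principal, and the credential list. */
    typedef struct _krb5_mcc_data {
        char *name;
        pthread_mutex_t lock;     /* sketch; the real code uses the k5 mutex wrappers */
        krb5_principal prin;
        krb5_mcc_cursor link;     /* head of the credential list */
    } krb5_mcc_data;

    /* Global list of memory caches, one node per cache. */
    typedef struct krb5_mcc_list_node {
        struct krb5_mcc_list_node *next;
        krb5_mcc_data *cache;
    } krb5_mcc_list_node;
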
> Now, as an already filed ticket (#5895: http://krbdev.mit.edu/rt/Ticket/Display.html?user=guest&pass=guest&id=5895) indicates, we need to place locks in krb5_mcc_initialize () and krb5_mcc_destroy () so that krb5_mcc_free doesn't get called unprotected, causing an application to receive SIGSEGV.
>
>
I will look at this. I have been playing with the ccache code recently...
> But, as mentioned in the ticket itself, this alone will not ensure safe access, as a thread can still free a krb5_mcc_list_node while another is still accessing it. Thus some kind of reference-count mechanism is required to ensure that a krb5_mcc_list_node is freed only when its refcount is zero (no one else is accessing it). This is already implemented in the file cache handling.
>
>
Should be easy to implement.
> While testing, I could still observe the application receiving SIGSEGV in a few other places; after analyzing this, the following are my observations:
>
>
> 1. The reference-count mechanism, once implemented, only ensures safe access to krb5_mcc_list_nodes; when two threads use the same cache (e.g. two threads of the same application), they are simultaneously accessing krb5_mcc_cursors, whose traversal is not completely protected by locks.
> 2. Though krb5_mcc_remove_cred () is not supported, a thread can still call krb5_mcc_initialize () while another thread is still using those credentials in krb5_mcc_next_cred (), which does not operate under any kind of lock.
> 3. krb5_mcc_get_principal () also tries to access krb5_mcc_list_node data without acquiring any lock on it.
>
>
I think point (2) is an interesting issue... First of all, I think
initializing a cache out from underneath a different thread is not really
sociable - but consider the file cache. There, the fcc_initialize function
will unlink the old cache file and open a new one. A cache already open in,
say, another thread is then screwed - unless the OPENCLOSE flag is not set,
in which case the O/S will still give access to the open (now unlinked)
file - but that is not supported. Even worse, if the first thread
initializes the cache and writes a bunch of data to it before the other
thread iterates to the next entry, the position in the file may no longer
be valid.
At best you are facing an undefined state.
So, to be consistent, I would say that if you initialize a cache while
another thread is iterating through it, the other thread is going to be
screwed - but it should fail in a reliable way, without crashing the
application.
An examination of the Heimdal code shows that initialize will probably
screw you over as well. I think you might still have access to the old
entries, as they are not cleared. They do implement a remove function.
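To make that scenario concrete, here is a rough sketch of the race using
only the public API (error handling trimmed; the cache name and principal
are placeholders, and in a real reproduction the cache would already hold
a number of credentials). Two threads resolve the same MEMORY: cache; one
iterates while the other re-initializes it, so the iterator's cursor can
end up pointing at freed credential nodes:

    #include <krb5.h>
    #include <pthread.h>
    #include <stddef.h>

    static const char *cache_name = "MEMORY:shared_example";  /* placeholder */

    /* Thread A: walk the credential list of the shared memory cache. */
    static void *iterate(void *arg)
    {
        krb5_context ctx;
        krb5_ccache cc;
        krb5_cc_cursor cur;
        krb5_creds cred;

        (void)arg;
        if (krb5_init_context(&ctx) != 0)
            return NULL;
        if (krb5_cc_resolve(ctx, cache_name, &cc) == 0) {
            if (krb5_cc_start_seq_get(ctx, cc, &cur) == 0) {
                /* Each next_cred call follows the internal cursor; nothing
                 * stops another thread from freeing the list underneath us. */
                while (krb5_cc_next_cred(ctx, cc, &cur, &cred) == 0)
                    krb5_free_cred_contents(ctx, &cred);
                krb5_cc_end_seq_get(ctx, cc, &cur);
            }
            krb5_cc_close(ctx, cc);
        }
        krb5_free_context(ctx);
        return NULL;
    }

    /* Thread B: re-initialize the same cache, which frees its credential
     * list out from under any iterator. */
    static void *reinitialize(void *arg)
    {
        krb5_context ctx;
        krb5_ccache cc;
        krb5_principal princ;

        (void)arg;
        if (krb5_init_context(&ctx) != 0)
            return NULL;
        if (krb5_cc_resolve(ctx, cache_name, &cc) == 0) {
            if (krb5_parse_name(ctx, "user@EXAMPLE.COM", &princ) == 0) {
                krb5_cc_initialize(ctx, cc, princ);
                krb5_free_principal(ctx, princ);
            }
            krb5_cc_close(ctx, cc);
        }
        krb5_free_context(ctx);
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;

        pthread_create(&a, NULL, iterate, NULL);
        pthread_create(&b, NULL, reinitialize, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
    }
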
> As for point number 2 above, the following are the possible approaches that I see as solutions:
>
> 1. When a thread enters krb5_mcc_start_seq_get () to start traversing the krb5_mcc_cursor list, it acquires a lock, which is relinquished in krb5_mcc_end_seq_get (), so as to ensure that krb5_mcc_initialize () doesn't free the list while it is being traversed. But this approach requires locking/unlocking across different functions, and I am not sure whether it is the right way.
> 2. Make sure that krb5_mcc_initialize () is called only once for a particular krb5_mcc_list_node. But there can be situations where an application explicitly wants to refresh (flush) the cache, and this restricts it from doing so.
>
> Can someone let me know which is the better option, or whether something else can be implemented that removes both of the problems mentioned?
>
>
My solution for you would be to implement the remove function. An
alternative would be to copy the cache contents into a new cache, skipping
the entry you want to delete, then tell all threads to start using the new
cache, and then destroy the old one.
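As a rough sketch of that copy-and-skip approach, using only the public
API (the new cache name and the match-by-server-principal rule are my own
illustrative choices, not anything the library mandates):

    #include <krb5.h>

    /* Build a new memory cache holding every credential from old_cc except
     * those whose server principal matches skip_server.  Callers would then
     * switch to *new_cc and destroy old_cc.  On failure the caller should
     * destroy *new_cc as well. */
    static krb5_error_code
    copy_cache_skipping(krb5_context ctx, krb5_ccache old_cc,
                        krb5_principal skip_server, krb5_ccache *new_cc)
    {
        krb5_error_code ret;
        krb5_principal client = NULL;
        krb5_cc_cursor cur;
        krb5_creds cred;

        ret = krb5_cc_resolve(ctx, "MEMORY:pruned", new_cc); /* illustrative name */
        if (ret)
            return ret;
        ret = krb5_cc_get_principal(ctx, old_cc, &client);
        if (ret == 0)
            ret = krb5_cc_initialize(ctx, *new_cc, client);
        if (ret)
            goto cleanup;

        ret = krb5_cc_start_seq_get(ctx, old_cc, &cur);
        if (ret)
            goto cleanup;
        while (krb5_cc_next_cred(ctx, old_cc, &cur, &cred) == 0) {
            /* Copy everything except the entry being "removed". */
            if (!krb5_principal_compare(ctx, cred.server, skip_server))
                ret = krb5_cc_store_cred(ctx, *new_cc, &cred);
            krb5_free_cred_contents(ctx, &cred);
            if (ret)
                break;
        }
        krb5_cc_end_seq_get(ctx, old_cc, &cur);

    cleanup:
        krb5_free_principal(ctx, client);
        return ret;
    }

Note that every thread still has to switch over to the new handle and stop
using the old one before it is destroyed, which is part of why a remove
function implemented inside the library, under the per-cache lock, would be
the cleaner fix.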
> Also, regarding the 3rd point in the list of observations, a lock needs to be acquired in krb5_mcc_get_principal () before accessing a node, so as to avoid accessing an already-freed location.
>
>
I will take a look at this...
> Thanks and Regards,
> Ashutosh
>
>
>
There is another screw case to consider which the file cache does not deal
with either... If you destroy a cache in one thread while accessing it in
another thread, bad things could happen. Heimdal handles the destroy case
in an interesting way (for a memory cache): it flags a cache that is in use
as dead and removes it from the global list. When the last thread holding
it open closes the cache, the refcount drops to zero and the cache is then
deleted.
Therefore two threads can use a shared cache, one thread can destroy it,
and the other thread is OK...
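In outline, that scheme looks something like the following. This is only a
sketch of the refcount-plus-dead-flag idea with invented names; it is not
the actual Heimdal (or MIT) code:

    #include <pthread.h>
    #include <stdlib.h>

    struct cred_node { struct cred_node *next; /* ... credential data ... */ };

    struct mem_cache {
        struct mem_cache *next;     /* global list linkage */
        pthread_mutex_t lock;
        int refcount;               /* one reference per open handle */
        int dead;                   /* set by destroy; read operations would
                                     * check this and fail cleanly */
        struct cred_node *creds;
    };

    static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;
    static struct mem_cache *global_list;

    static void free_creds(struct cred_node *n)
    {
        while (n != NULL) { struct cred_node *t = n->next; free(n); n = t; }
    }

    static void unlink_from_global(struct mem_cache *c)
    {
        pthread_mutex_lock(&global_lock);
        for (struct mem_cache **p = &global_list; *p != NULL; p = &(*p)->next) {
            if (*p == c) { *p = c->next; break; }
        }
        pthread_mutex_unlock(&global_lock);
    }

    /* destroy: mark dead and drop this handle's reference; the struct itself
     * is only freed once the last open handle goes away. */
    static void cache_destroy(struct mem_cache *c)
    {
        int last;

        unlink_from_global(c);      /* no new opens can resolve it */
        pthread_mutex_lock(&c->lock);
        free_creds(c->creds);       /* credentials go away immediately */
        c->creds = NULL;
        c->dead = 1;                /* surviving iterators see "dead", not garbage */
        last = (--c->refcount == 0);
        pthread_mutex_unlock(&c->lock);
        if (last)
            free(c);
    }

    /* close: just drop one reference. */
    static void cache_close(struct mem_cache *c)
    {
        int last;

        pthread_mutex_lock(&c->lock);
        last = (--c->refcount == 0);
        pthread_mutex_unlock(&c->lock);
        if (last)
            free(c);
    }
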
Anyway, I will look into the locking problems you mentioned and see about
implementing the remove function, which I believe will solve your
problems....
Ezra