Decrypt integrity check failed after sending several correct messages

Thu Feb 14 20:53:44 EST 2008

On Feb 14, 2008, at 10:04, Jose Miguel Such wrote:
>> failed.  Aside from random hardware issues, I have no guess as to
>> what the problem would be here.  I'd probably start with adding lots
>> of debugging code to the libraries to log lots of info if the check
>> fails, and then log some of the same info when you retry again.
>> Sorry, I know it's not very helpful if you're not comfortable diving
>> into the Kerberos code....
> I haven't got into the Kerberos code before, but if it is the only
> way ...

I can't think of anything better.  I'd probably start with  
instrumenting every place KRB5KRB_AP_ERR_BAD_INTEGRITY can be  
generated (there aren't many) to print a message, and once the  
correct one is identified, when the error comes up, make it save away  
the data it's checksumming and the checksum it's verifying, and set  
some flag so that in the next pass (after you've waited your 20ms  
and tried again), if the checksum verifies, compare the data and  
checksum against the previous versions.  If they all match the ones  
from the first attempt, our checksum verification may be subtly  
broken; if they don't match, there may be some subtle data race in  
your code or the OS.
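
To make that concrete, here is a rough sketch of the save-and-compare step. None of this is existing krb5 code: the struct and the two functions (integ_record_failure, integ_compare_on_success) are hypothetical names made up for illustration, and where exactly to call them depends on which KRB5KRB_AP_ERR_BAD_INTEGRITY site turns out to be the one firing. It assumes the caller already holds your per-process GSS mutex, so the single static snapshot is not itself protected.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical debugging aid; not part of the krb5 or GSSAPI libraries. */
struct integ_snapshot {
    unsigned char *data;
    size_t data_len;
    unsigned char *cksum;
    size_t cksum_len;
    int armed;              /* nonzero after a failed verification was saved */
};

/* One snapshot per process; assumes callers are serialized by a mutex. */
static struct integ_snapshot last_failure;

static unsigned char *dup_bytes(const unsigned char *p, size_t n)
{
    unsigned char *copy = malloc(n ? n : 1);
    if (copy != NULL && n > 0)
        memcpy(copy, p, n);
    return copy;
}

/* Call at the point where the integrity error is about to be returned. */
void integ_record_failure(const unsigned char *data, size_t data_len,
                          const unsigned char *cksum, size_t cksum_len)
{
    assert(data != NULL && cksum != NULL);
    free(last_failure.data);
    free(last_failure.cksum);
    last_failure.data = dup_bytes(data, data_len);
    last_failure.cksum = dup_bytes(cksum, cksum_len);
    last_failure.data_len = data_len;
    last_failure.cksum_len = cksum_len;
    last_failure.armed = (last_failure.data != NULL &&
                          last_failure.cksum != NULL);
    fprintf(stderr, "integrity check failed: %lu data bytes, %lu checksum bytes saved\n",
            (unsigned long)data_len, (unsigned long)cksum_len);
}

/*
 * Call when a later verification succeeds.
 * Returns -1 if no failure was pending, otherwise a bitmask:
 * bit 0 set if the data differs, bit 1 set if the checksum differs.
 * A result of 0 means the retry saw exactly the same inputs.
 */
int integ_compare_on_success(const unsigned char *data, size_t data_len,
                             const unsigned char *cksum, size_t cksum_len)
{
    int diff = 0;

    assert(data != NULL && cksum != NULL);
    if (!last_failure.armed)
        return -1;
    if (data_len != last_failure.data_len ||
        memcmp(data, last_failure.data, data_len) != 0)
        diff |= 1;
    if (cksum_len != last_failure.cksum_len ||
        memcmp(cksum, last_failure.cksum, cksum_len) != 0)
        diff |= 2;
    last_failure.armed = 0;
    fprintf(stderr, "retry verified: data %s, checksum %s\n",
            (diff & 1) ? "DIFFERS" : "same",
            (diff & 2) ? "DIFFERS" : "same");
    return diff;
}
```

A return of 0 on the retry would point at the verification code; a nonzero return would point at the inputs changing between attempts.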

And if we're going to get into this level of detail, perhaps we  
should take it off-list...

>> Hmm... you said this is a large number of processes.  I assume they
>> are all single-threaded?  Are you using shared memory or memory-
>> mapped files for interprocess communication?
> I use sockets for interprocess communication, because our processes
> are distributed among different hosts, so it allows us to communicate
> between processes on both the same and different machines in the same
> way.

Okay, I was wondering about possible synchronization issues between  
processes there...

> Processes are not single-threaded, but there is a mutex inside each
> process preventing more than one thread from accessing gss functions
> at a time.

That *should* be enough... the OS mutex code should ensure that all  
writes from the processor have become visible to other processors  
(even if whatever architecture you're using allows reordering, which  
some do, though some don't), and then there are also mutexes used  
within the krb5 and gssapi libraries to protect some data....  Are  
these actually multiprocessor systems, or just multiple threads on  
single processors?
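
For reference, the arrangement I'm assuming you have looks roughly like the sketch below: one process-wide pthread mutex taken around every call into the library. The names here are made up, and stand_in_gss_call is a placeholder for a real GSSAPI call (it just does an unprotected read-modify-write on shared state, which would lose updates without the lock). run_workers is only a small harness to exercise it from several threads; build with your compiler's pthread option.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Single process-wide lock guarding all calls into the library. */
static pthread_mutex_t gss_lock = PTHREAD_MUTEX_INITIALIZER;

/* Shared state that the stand-in call modifies non-atomically. */
static long shared_state;

/* Placeholder for a real GSSAPI call; hypothetical, for illustration only. */
static void stand_in_gss_call(void)
{
    long v = shared_state;
    shared_state = v + 1;
}

/* Every thread goes through this wrapper, never the call itself. */
void serialized_gss_call(void)
{
    pthread_mutex_lock(&gss_lock);
    stand_in_gss_call();
    pthread_mutex_unlock(&gss_lock);
}

static void *worker(void *arg)
{
    long iters = *(long *)arg;
    long i;

    for (i = 0; i < iters; i++)
        serialized_gss_call();
    return NULL;
}

/* Runs nthreads workers (at most 16) and returns the final shared state. */
long run_workers(int nthreads, long iters)
{
    pthread_t tids[16];
    int i;

    assert(nthreads > 0 && nthreads <= 16);
    shared_state = 0;
    for (i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[i], NULL, worker, &iters) != 0)
            abort();
    }
    for (i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return shared_state;
}
```

If every entry into the gss functions goes through a wrapper like this, the lock and unlock also act as the memory barriers I mentioned, so a later thread should see everything the earlier one wrote.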

Ken