Proposal for using NAPTR/URI records

Thu Feb 26 13:07:17 EST 2015

On Thu, 2015-02-26 at 12:41 -0500, Greg Hudson wrote:
> On 02/26/2015 11:53 AM, Petr Spacek wrote:
> > > 1. Additional latency for a protocol which nobody is (yet) using.
> > Oh no! This reminds me of the DNS millisecond wars we have with 
> > web browsers (DANE/TLSA), which then spend seconds executing Flash 
> > or JavaScript ...
> 
> I'll speak directly so Nathaniel doesn't have to interpret my 
> statements.
> 
> I would expect trouble if 1.14 came out and, by default, every 
> Kerberos realm discovery started doing a failed URI lookup and a 
> successful pair of SRV lookups, when previously only the pair of 
> successful SRV lookups was required.  That trouble might take the 
> form of slightly increased latency multiplied over many realm 
> discoveries, or greatly increased latency because the URI lookup 
> times out.

My main reason for not finding this compelling is that if the realm 
has SRV records, the realm owners have (at least some) influence over 
DNS. Given that, if the failed URI lookup is problematic, they can 
just replace the SRV records with a URI record.

This is overall a win for them anyway, since they have replaced two 
SRV record queries with one URI record query. So if performance is so 
critical to them that adding one failed lookup is a problem, they 
should be happy to make a small change and cut their number of queries 
in half (2 SRV => 1 URI).

My preference at this point is to just use one URI record named 
_kerberos.$REALM and make it the default (deprecating SRV).
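
To make the query counts concrete, here is a rough Python sketch of a 
URI-first discovery order with SRV fallback. It is not the libkrb5 
implementation. The in-memory "zone" dictionary, the function name 
discover_kdcs, the _udp/_tcp SRV owner names, and the record strings 
are all assumptions chosen for illustration; no real DNS queries are 
made.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for DNS: maps (owner name, record type) to records.
Zone = Dict[Tuple[str, str], List[str]]


def discover_kdcs(realm: str, zone: Zone) -> Tuple[List[str], int]:
    """Return (records, query_count) for a URI-first lookup order."""
    queries = 0

    def lookup(name: str, rrtype: str) -> List[str]:
        nonlocal queries
        queries += 1  # every lookup costs one query, whether it succeeds or fails
        return zone.get((name, rrtype), [])

    # Step 1: a single URI query at _kerberos.$REALM.
    uri = lookup(f"_kerberos.{realm}", "URI")
    if uri:
        return uri, queries

    # Step 2: fall back to the existing pair of SRV lookups
    # (assumed here to be one for UDP and one for TCP).
    srv = lookup(f"_kerberos._udp.{realm}", "SRV")
    srv += lookup(f"_kerberos._tcp.{realm}", "SRV")
    return srv, queries


# Illustrative zones; the record contents are placeholders, not a real format.
srv_only_zone: Zone = {
    ("_kerberos._udp.EXAMPLE.COM", "SRV"): ["kdc1.example.com:88/udp"],
    ("_kerberos._tcp.EXAMPLE.COM", "SRV"): ["kdc1.example.com:88/tcp"],
}
uri_zone: Zone = {
    ("_kerberos.EXAMPLE.COM", "URI"): ["placeholder-uri-for-kdc1"],
}

srv_records, srv_queries = discover_kdcs("EXAMPLE.COM", srv_only_zone)
uri_records, uri_queries = discover_kdcs("EXAMPLE.COM", uri_zone)
```

Under these assumptions, a realm that only publishes SRV records costs 
three queries (one failed URI plus two SRV), while a realm that 
publishes the URI record costs one.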

> I am less concerned about adding an additional lookup to the case 
> where a realm doesn't exist.  That is, where we are currently doing 
> two failing SRV lookups and giving up, adding a third failed URI 
> lookup doesn't seem likely to cause trouble.
> 
> For unfortunate reasons, Kerberos realm KDC discovery can be 
> repeated multiple times during a single user operation, due to 
> fallbacks or other concerns.  In the past I investigated a 
> performance issue where ssh was taking three seconds to perform 96 
> DNS requests and 12 KDC requests to decide that a server didn't 
> support Kerberos.  (There were no timeouts, and each request was 
> satisfied pretty quickly.)  That led me to implement deferred 
> hostname lookups and remove some unnecessary realm traversal logic 
> in the TGS code.
> 
> Of course we could implement internal DNS caching, but that would 
> add a lot of complexity.  And of course the right answer is a local 
> caching DNS resolver with proper negative caching, but that doesn't 
> seem terribly common yet.
> 
> Nico's type=ANY approach is interesting, but concerning if it could 
> result in failures (especially timeouts) due to servers disabling 
> type=ANY queries.