[mosh-devel] Mosh connections didn't come back after ca. 18000 sec over 2x NAT

Keith Winstein keithw at MIT.EDU
Sun Dec 30 12:31:46 EST 2012


Hello Axel,

Thanks for the detailed report. "Last reply" means that the _server_ is 
not getting (or at least not acknowledging) packets from the _client_. (If 
the client were not getting packets at all, it would say "Last contact.")

So the client-side tcpdump is somewhat as expected. Are you able to send a 
similar tcpdump from the server side? I hope that might help resolve the 
mystery.

I do see from this tcpdump that the mosh-client is doing its port-hopping 
and the proximal (client-side) NAT correctly responded to at least one 
port hop (from 55013 to 53665). I'm curious what it looks like from the 
other end of the "distal" NAT.
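
To make the port-hop reading of the dump easier to repeat on a server-side capture, here is a rough sketch that scans tcpdump text output and reports each time the client-side UDP port changes, per direction. It is not part of mosh: the names `LINE_RE`, `port_changes` and `SAMPLE` are made up for illustration, the regex only assumes the default one-line tcpdump format seen in the capture quoted below, and the server port is assumed to be 60001 as in that capture.

```python
import re

# Matches default tcpdump text lines such as:
# 18:08:39.253217 IP c-crosser.local.53665 > xenlink.noone.org.60001: UDP, length 82
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d+) IP "
    r"\S+\.(?P<sport>\d+) > \S+\.(?P<dport>\d+): UDP"
)


def port_changes(lines, server_port=60001):
    """Return (timestamp, direction, new_client_port) for every change of the
    client-side port, tracked separately for each direction.

    The first port seen in a direction is treated as the baseline and is
    not reported.
    """
    last = {"client->server": None, "server->client": None}
    changes = []
    for line in lines:
        # Tolerate mail quoting ("> ") in front of pasted capture lines.
        m = LINE_RE.match(line.strip().lstrip("> "))
        if not m:
            continue
        sport, dport = int(m.group("sport")), int(m.group("dport"))
        if sport == server_port:
            direction, client_port = "server->client", dport
        elif dport == server_port:
            direction, client_port = "client->server", sport
        else:
            continue
        if last[direction] is not None and last[direction] != client_port:
            changes.append((m.group("ts"), direction, client_port))
        last[direction] = client_port
    return changes


# A few lines copied from the client-side capture in this thread.
SAMPLE = """\
18:08:38.183852 IP xenlink.noone.org.60001 > c-crosser.local.55013: UDP, length 116
18:08:38.295591 IP c-crosser.local.55013 > xenlink.noone.org.60001: UDP, length 71
18:08:39.143872 IP xenlink.noone.org.60001 > c-crosser.local.55013: UDP, length 133
18:08:39.253217 IP c-crosser.local.53665 > xenlink.noone.org.60001: UDP, length 82
18:08:40.865203 IP xenlink.noone.org.60001 > c-crosser.local.55013: UDP, length 133
18:08:41.785033 IP xenlink.noone.org.60001 > c-crosser.local.53665: UDP, length 124
""".splitlines()

if __name__ == "__main__":
    for ts, direction, port in port_changes(SAMPLE):
        print(ts, direction, port)
```

On these sample lines the client starts sending from 53665 at 18:08:39 and the server's packets follow to 53665 at 18:08:41. Note that on the server side the ports will be the NAT-translated ones, not the client's local ports.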

Cheers,
Keith

On Sun, 30 Dec 2012, Axel Beckert wrote:

> Hi Quentin,
>
> On Sun, Dec 30, 2012 at 11:22:42AM -0500, Quentin Smith wrote:
>> On Sun, 30 Dec 2012, Axel Beckert wrote:
>>> Any idea what could have caused such a bad lockup in a mosh connection?
>>> IIRC as of now, mosh does DNS lookups only once at start, so it
>>> couldn't be a cached bad DNS reply or such.
>>
>> Just to check the obvious first - to your knowledge, the server
>> resolved to the same IP address before and after you restarted the
>> AP? (That is, the server didn't appear to move for any reason?)
>
> That would have meant a compromise of DNS servers in at least two
> domains. :-)
>
> Nevertheless, I checked all five machines where this happened and for
> two of them I know the IP addresses by heart and they didn't change.
> And the other three look at least familiar.
>
>> What version of mosh are you using? Mosh 1.2.3 adds a new behavior
>> where it will try opening a new connection on a new port if it
>> hasn't heard from the server in a while (I think 10 seconds?). This
>> is to work around some braindead NAT devices that have behavior
>> similar to what you're describing.
>
> It was always 1.2.3 (from the official Debian Wheezy package) on the
> client side and 2x 1.2.3 (same package) on the server side and 3x
> 1.2.2 from Debian Backports on the server side.
>
>>> I still kept one of the non-responding mosh sessions open (now at
>>> 21521 seconds), so I can possibly debug that session.
>>
>> If you capture a tcpdump on both the client and server for ~30
>> seconds or so, that should show conclusively if your network is
>> eating the packets, or the client or server is confused.
>
> Looks like the latter. I captured this on the client side:
>
> # tcpdump -i wlan0 host 78.46.73.201
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on wlan0, link-type EN10MB (Ethernet), capture size 65535 bytes
> 18:08:38.183852 IP xenlink.noone.org.60001 > c-crosser.local.55013: UDP, length 116
> 18:08:38.295591 IP c-crosser.local.55013 > xenlink.noone.org.60001: UDP, length 71
> 18:08:39.143872 IP xenlink.noone.org.60001 > c-crosser.local.55013: UDP, length 133
> 18:08:39.253217 IP c-crosser.local.53665 > xenlink.noone.org.60001: UDP, length 82
> 18:08:40.064163 IP xenlink.noone.org.60001 > c-crosser.local.55013: UDP, length 132
> 18:08:40.172792 IP c-crosser.local.53665 > xenlink.noone.org.60001: UDP, length 78
> 18:08:40.865203 IP xenlink.noone.org.60001 > c-crosser.local.55013: UDP, length 133
> 18:08:40.973333 IP c-crosser.local.53665 > xenlink.noone.org.60001: UDP, length 77
> 18:08:41.785033 IP xenlink.noone.org.60001 > c-crosser.local.53665: UDP, length 124
> 18:08:41.894189 IP c-crosser.local.53665 > xenlink.noone.org.60001: UDP, length 69
> 18:08:41.963862 IP xenlink.noone.org.60001 > c-crosser.local.53665: UDP, length 114
> 18:08:42.074045 IP c-crosser.local.53665 > xenlink.noone.org.60001: UDP, length 81
> 18:08:42.725320 IP xenlink.noone.org.60001 > c-crosser.local.53665: UDP, length 121
> 18:08:42.837176 IP c-crosser.local.53665 > xenlink.noone.org.60001: UDP, length 79
> 18:08:44.046276 IP xenlink.noone.org.60001 > c-crosser.local.53665: UDP, length 124
> 18:08:44.162170 IP c-crosser.local.53665 > xenlink.noone.org.60001: UDP, length 81
> 18:08:44.665467 IP xenlink.noone.org.60001 > c-crosser.local.53665: UDP, length 114
> 18:08:44.774038 IP c-crosser.local.53665 > xenlink.noone.org.60001: UDP, length 83
> 18:08:45.983868 IP xenlink.noone.org.60001 > c-crosser.local.53665: UDP, length 126
> 18:08:46.094821 IP c-crosser.local.53665 > xenlink.noone.org.60001: UDP, length 81
> 18:08:46.602694 IP xenlink.noone.org.60001 > c-crosser.local.53665: UDP, length 127
> 18:08:46.716342 IP c-crosser.local.53665 > xenlink.noone.org.60001: UDP, length 75
> 18:08:47.544721 IP xenlink.noone.org.60001 > c-crosser.local.53665: UDP, length 130
> 18:08:47.651795 IP c-crosser.local.53665 > xenlink.noone.org.60001: UDP, length 76
> 18:08:49.003503 IP xenlink.noone.org.60001 > c-crosser.local.53665: UDP, length 120
> 18:08:49.110928 IP c-crosser.local.53665 > xenlink.noone.org.60001: UDP, length 78
> 18:08:49.746999 IP xenlink.noone.org.60001 > c-crosser.local.53665: UDP, length 129
> 18:08:49.855992 IP c-crosser.local.45626 > xenlink.noone.org.60001: UDP, length 69
>
> Nevertheless, the top line of that mosh session (and I only have one
> to that host) still says "mosh: Last reply 26184 seconds ago. [To
> quit: Ctrl-^ .]"
>
> 		Kind regards, Axel
> -- 
> /~\  Plain Text Ribbon Campaign                   | Axel Beckert
> \ /  Say No to HTML in E-Mail and News            | abe at deuxchevaux.org  (Mail)
> X   See http://www.asciiribbon.org/              | abe at noone.org (Mail+Jabber)
> / \  I love long mails: http://email.is-not-s.ms/ | http://noone.org/abe/ (Web)
> _______________________________________________
> mosh-devel mailing list
> mosh-devel at mit.edu
> http://mailman.mit.edu/mailman/listinfo/mosh-devel
>


