[mosh-devel] Impressed

Fri Apr 13 08:47:34 EDT 2012

On Fri, Apr 13, 2012 at 5:09 AM, Hari Balakrishnan <hari at csail.mit.edu> wrote:
> Mosh gathers RTT samples and calculates a smoothed estimate using the same method as TCP.

What I wanted to do was abuse my network using the netperf tool
mentioned earlier, flooding the 3 (or all 4) hardware wifi queues,
switch between pfifo_fast and the code I'm working on (sfqred and qfq),
and observe the interaction of the queues in the context of mosh's
RTTs.

So I need some sort of way to simulate traffic and log RTTs on both
sides of a mosh connection over the period of a test (along the lines
of your paper). I can probably just do it with iptables...

I do most of my latency-under-load testing with netperf's TCP_RR
test, which is rather unrealistic: the characteristics of interactive
non-TCP traffic are different. Still, I find comfort in CDF plots
showing wifi jitter staying within ever lower bounds...

http://www.teklibre.com/~d/bloat/qfq_vs_pfifo_fast_wireless_iwl_card_vs_cerowrt.pdf
(that's a ping vs iperf vs two qdiscs on wifi)
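
For anyone wanting to produce the same kind of plot from their own RTT
logs, here is a minimal sketch of an empirical CDF. It is not the
tooling behind the PDF above; the function names are illustrative, and
the percentile uses the simple nearest-rank definition.

```python
import math
from typing import List, Sequence, Tuple


def empirical_cdf(samples: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (value, fraction of samples <= value) pairs, sorted by value."""
    ordered = sorted(samples)
    n = len(ordered)
    return [(value, (i + 1) / n) for i, value in enumerate(ordered)]


def percentile(samples: Sequence[float], fraction: float) -> float:
    """Nearest-rank percentile, with fraction in (0, 1]."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(samples)
    rank = max(1, math.ceil(fraction * len(ordered)))
    return ordered[rank - 1]
```

Plotting the pairs from `empirical_cdf` for each qdisc on the same axes
gives the comparison; the tail (say `percentile(rtts, 0.99)`) is usually
where the queues differ most.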

I hope one day I'll get around to accumulating enough repeatable data to
publish something worthwhile, but trends are encouraging....

I'd like to find a way of measuring 'the jitter that annoys' that was
nice, scientific, and repeatable.

> In your experience how commonly deployed is DiffServ at the edge hops?

Depends on the gear. For example, Cisco (not their purchases, like
Linksys) is big on it. Elsewhere, it sort of depends on the era of the
rollout.

Home gateways, nada.

I note that in this past year, we've helped push out CS6 marking in
quagga, dnsmasq, ahcp, babel, RA, etc, so more people will start
seeing it by default in ANT packets. Doing some classification of DNS
makes sense too, but as noted earlier I'm of two minds; I tend to
think that for that stuff, saner packet aggregation makes more sense.
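
As a rough illustration of what CS6 marking means at the socket level
(not the actual patches to the daemons named above), an application can
set the DSCP through the IP_TOS socket option. CS6 is DSCP 48, which
occupies the upper six bits of the old TOS byte and so becomes 0xC0.
The helper names are hypothetical, and this covers IPv4 only.

```python
import socket

CS6 = 48  # DSCP class selector 6 (binary 110000), per RFC 2474


def dscp_to_tos(dscp: int) -> int:
    """Shift a 6-bit DSCP into the upper bits of the TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp << 2


def open_marked_udp_socket(dscp: int = CS6) -> socket.socket:
    """Create an IPv4 UDP socket whose outgoing packets carry the given DSCP.

    Whether any hop along the path honors the mark is a separate question.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock
```

Capturing the traffic on the far side of a gateway and checking whether
the TOS byte still reads 0xc0 is a cheap way to see if the mark survives.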

> Do you know whether 3G/LTE networks honor these codepoints?

I wish I did. I don't even have a grip on whether the majority problem
is going up from the handsets or down from the cells - if going up, we
can improve the handsets, if going down, the operators have to do
extensive reclassification on entry to their networks.

Figuring that out is on my very overlong, overburdened todo list.
Getting to where we had test tools and an OS that we could trust was
at the top. That's taken all year, and we're not done yet.... With
test tools in hand and widely distributed, my hope is that more
researchers can take a stab at portions of the problem(s) that
interest them.

My principal observation of 3G/LTE is that it's similar to wifi, but
can get 10-100x worse, and exhibits enormous retry problems. See
slide 3...

http://www.uknof.org.uk/uknof21/Taht-Bufferbloat.pdf

Classification isn't going to help fix that. The biggest parts of that
problem are bad queue management and 'every packet is sacred', with
TCP layered over top.

One of the things on that todo list is to find someone to do a 'bloat'
app, measuring the critical underlying problems, but it's hard to get
at that level of the OS in iOS/Android's default stack. Just a basic
latency under load test would be good, at a high level.

Some of the critical values for wifi are simply not exposed by most
chipsets (and 3G/LTE is even less transparent) even in the Linux
mainline.

root@OpenWrt:/sys/kernel/debug/ieee80211/phy1/ath9k# cat xmit
Num-Tx-Queues: 10  tx-queues-setup: 0x10f poll-work-seen: 3823
                            BE         BK        VI        VO

MPDUs Queued:             7328          0        82       463
MPDUs Completed:          7328          0        82       463
MPDUs XRetried:              0          0         0         0
Aggregates:                 16          0        19         0
AMPDUs Queued HW:         1682          0       725         0
AMPDUs Queued SW:           33          0        62         0
AMPDUs Completed:         1638          0       713         0
AMPDUs Retried:           2660          0      2160         0
AMPDUs XRetried:            77          0        74         0
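
To turn a dump like the one above into numbers that can be tracked over
a test run, a small parser is enough. This is a sketch that assumes the
file keeps exactly the layout shown (a label, a colon, then four integer
columns in BE/BK/VI/VO order); the layout may differ between driver
versions, and the function names are made up.

```python
from typing import Dict, Optional

QUEUES = ("BE", "BK", "VI", "VO")  # column order as shown in the dump


def parse_xmit(text: str) -> Dict[str, Dict[str, int]]:
    """Map each counter label to its per-queue values.

    Lines that are not 'label: n n n n' (the summary line, the column
    header, blank lines) are skipped.
    """
    counters: Dict[str, Dict[str, int]] = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or len(fields) != len(QUEUES):
            continue
        if not all(field.isdigit() for field in fields):
            continue
        counters[label.strip()] = dict(zip(QUEUES, (int(f) for f in fields)))
    return counters


def retry_ratio(counters: Dict[str, Dict[str, int]], queue: str) -> Optional[float]:
    """AMPDUs retried per AMPDU completed, or None if nothing completed."""
    completed = counters["AMPDUs Completed"][queue]
    if completed == 0:
        return None
    return counters["AMPDUs Retried"][queue] / completed
```

Feeding it the contents of the xmit file before and after a test, and
differencing the counters, gives per-queue retry rates for that test. On
the numbers above the ratio is already well over one retry per completed
aggregate on both BE and VI.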

While I'm very interested in hearing about measurements and (massive)
improvements to handling delays over LTE/3G, ENOTIME. We're trying to
get to an AQM that works over ethernet and can maybe, possibly,
actually work over wildly variable bandwidths, but even with that
theoretical breakthrough...

the underlying wireless layers are very, very, very, very problematic.

-- 
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
http://www.bufferbloat.net