Shaping performance

I thought of this too, but there are a few issues. First off, who do we ping? Second, how do we collect this data and synchronize the clock? I think a second process that runs a ping script and collects the ping output to a separate file would work. No need for a crazy high ping rate; one every 500ms is probably fine.

Ultimately I agree with you, we need a more nuanced model. I don't think we need more than one set of measurements per contributor though. Multiple contributors with different available bandwidth will naturally give us data sufficient for a good model.

I think the full ping binary is probably best if we want to go that route. The command:

echo "" > /tmp/pings.txt
for i in 1 2 3 4 5 6 7 8 9 10 11 12; do
   echo DATE=$(date +%s%N) >> /tmp/pings.txt
   ping -i 0.5 -c 10 IP-ADDR-HERE >> /tmp/pings.txt 
done

This would probably work fine. I don't know how reliable the 500ms timer is here, which is why I let it ping 10 times and then timestamp. If we run this command in the background while we collect the stats, it should help us understand the latency. As I said, the big issue is which IP address to ping. I'm inclined to use 8.8.8.8, since Google's DNS responds to ping and is typically "close" to everyone.

Smart. I would have used sleep to manually send each ping with a fine-grained timestamp, something along the lines of:

	echo "${i_sweep}. repetition of ping size ${i_size}"
	ping -c 1 -s $16 ${TARGET} >> ${LOG} &
	# we need a sleep binary that allows non integer times (GNU sleep is fine as is sleep of macosx 10.8.4)
	sleep ${PINGPERIOD}

Yes, one ping instance per probe is somewhat costly, but this way we have a precise timestamp for each ping. I believe there is some disagreement between different ping binaries about what happens if the "-i 0.5" interval expires before the currently in-flight probe has returned or timed out, which the one-instance-per-probe approach nicely avoids.
But I also like your idea of doing this with a lower frequency than per-ping :wink:
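For reference, a minimal self-contained version of that per-probe loop might look like the following; TARGET, LOG, PINGPERIOD and the sweep count are placeholders, and it assumes a sleep binary that accepts fractional seconds:

TARGET=8.8.8.8
LOG=/tmp/pings.txt
PINGPERIOD=0.5   # requires a sleep that accepts fractional seconds
i_size=16

for i_sweep in $(seq 1 60); do
    # log a timestamp, then one backgrounded ping process per probe, so the
    # timestamp is taken at (roughly) the moment the probe is sent
    echo "${i_sweep}. repetition of ping size ${i_size}, $(date +%s.%N)" >> "${LOG}"
    ping -c 1 -s "${i_size}" "${TARGET}" >> "${LOG}" &
    sleep "${PINGPERIOD}"
done
wait   # let the last in-flight probes finish before the log is read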

Playing with a recent iputils ping I noticed that the -D switch is documented as:
" -D Print timestamp (unix time + microseconds as in gettimeofday) before each line."
e.g.:

user@work-horse:~$ ping -c 1 -s 16 8.8.8.8 -D
PING 8.8.8.8 (8.8.8.8) 16(44) bytes of data.
[1526402175.825987] 24 bytes from 8.8.8.8: icmp_seq=1 ttl=61 time=23.9 ms

--- 8.8.8.8 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 23.954/23.954/23.954/0.000 ms

This might actually be sufficient to just use:
PING_HZ=5
TESTDURATION_SECS=30
ping -i 0.2 -c 100000000 -s 16 8.8.8.8 -D >> /tmp/pinglog.txt &
ping_pid=$!

followed by
kill -9 $ping_pid
after the measurements are done. This should actually work reasonably well, assuming the file does not get too large for /tmp.
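Putting the pieces together, a rough sketch of the whole collection window could look like this (it assumes the iputils ping, since that is what we install on OpenWrt below for -D and fractional -i):

TESTDURATION_SECS=30
PINGLOG=/tmp/pinglog.txt

# start the timestamped ping stream in the background; the huge -c count
# just keeps it running until we kill it
ping -i 0.2 -c 100000000 -s 16 -D 8.8.8.8 >> "${PINGLOG}" &
ping_pid=$!

# ... run the actual throughput test / stats collection here ...
sleep "${TESTDURATION_SECS}"

# stop the background pinger once the measurement window is over
kill -9 "${ping_pid}" 2>/dev/null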

I agree that using Google's CDN infrastructure is a sane idea, so 8.8.8.8 or gstatic.com should both work well enough (even though neither is guaranteed to be a reliable ICMP reflector).

I guess the ping script should live in its own file; let me see whether I find time to look at it later this week (@richb-hanover's betterspeedtest.sh has a few nice ways to run ping in the background that I would like to "steal").

I like the microsecond timestamp idea with -D. I think going to a higher resolution than 0.2 seconds is a mistake; we don't want too much CPU going to writing pings to a file... but 0.2 is OK to start.

we can just do:

ping -i 0.2 -c 300 -s 16 -D 8.8.8.8 > /tmp/pinglog.txt &

Also, since we've committed to m4 as a postprocessor, I'm going to rewrite the data collection to use macros directly in the file... like for example for timestamps:

date +"nstimestamp(%s,%N)"

then we can avoid maintaining a sed script to massage the data into an m4-processable file, and the m4 file can be simpler.

Also, I'm going to make the first thing in the file something like:

fileformat(1)
nstimestamp0(123,123)

So we can keep track if we change file formats later, and also so we can find the initial seconds and better deal with subtracting the start time to make the times fit in a float. It also makes the m4 macros easier to understand.
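To make that concrete, here is a hypothetical sketch of the post-processing side; the macro definitions below are only illustrative, the real ones would live in the project's m4 file:

# hypothetical m4 definitions matching the macros sketched above:
# fileformat(v) records the format version, nstimestamp0(s,ns) remembers the
# start second, and nstimestamp(s,ns) emits seconds relative to that start
cat > /tmp/defs.m4 <<'EOF'
define(`fileformat', `format_version=$1')dnl
define(`nstimestamp0', `define(`T0', `$1')start=$1.$2')dnl
define(`nstimestamp', `t=eval($1 - T0).$2')dnl
EOF

# expand the collected file (which begins with fileformat(...) and
# nstimestamp0(...)) into something a plotting/json script can parse
m4 /tmp/defs.m4 /tmp/stats.txt > /tmp/stats.parsed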

After "opgk update ; opkg install iputils-ping"
root@router:~# ping -D -c 2 -i 0.2 8.8.8.8 | grep -e "["
[1526407061.913834] 64 bytes from 8.8.8.8: icmp_req=1 ttl=62 time=24.3 ms
[1526407062.114492] 64 bytes from 8.8.8.8: icmp_req=2 ttl=62 time=24.1 ms

on OpenWrt SNAPSHOT, r6835-e495a05069, so the -D switch seems to be a real option

OK, I made various changes related to what we discussed above and pushed new versions to GitHub. It now collects the stats file and the pings file, as well as various CPU info. The jsonify scripts now work with the updated format. So far there is no attempt to munge and parse the pings file or the router info file; we are just collecting them.
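For anyone wondering what the CPU info can look like, a minimal sketch (not necessarily what the script actually does) is to sample the aggregate /proc/stat line once per second and difference the counters afterwards:

# each /proc/stat "cpu" line holds cumulative jiffies:
# user nice system idle iowait irq softirq ...
# per-interval load is obtained by differencing consecutive samples
CPULOG=/tmp/cpustats.txt
for i in $(seq 1 30); do
    echo "$(date +%s.%N) $(grep '^cpu ' /proc/stat)" >> "${CPULOG}"
    sleep 1
done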

I did find out that my router goes completely CPU bound at around 800Mbps upload, but not download... since I am running squid on the router... it's hard to know exactly why, and I'm not doing SQM, just custom scripts... but I think it's a proof of concept that you can detect useful stuff with this method :wink:

Indeed, squid is using up a lot of "usr" CPU time pushing packets. Why it takes more CPU to push up than down I don't know; possibly it's just that my download is currently slightly slower than my upload. Or it might be that on the downlink my router isn't the bottleneck, whereas on the uplink it is, so more time needs to be spent actually shaping things, and with squid also doing its thing, it runs out of CPU.

Sorry if this has been mentioned before, but I didn't read the entire thread x)

Did anyone try to make use of ifb's numtxqs option, and does it make any difference?

//edit
I think this only works with multi-queue-aware qdiscs.


That should help for multi core systems, maybe. Definitely worth trying but I don't think anyone has yet.

I changed my squid to use only 3 workers (on a 4-core router), leaving 1 core for processing softirq, and slightly lowered my upload setting to 780Mbps, and now I'm getting better ping stability and overall pretty good results. Top says about 8-13% idle during a speedtest. I'll run the data collection script and see how that affected things tomorrow.

Anyone had a chance to try data collection and analysis? I'm pretty happy with the data that's collected but I think it needs some examples to figure out how best to extract an estimated max speed.

I'm going to try it out once I'm at home (in a good 11 hours or so :smiley:)

Oops,

the (long) weekend was spent mostly off-line, but I will try to run the newest version of the collector script soon, both on the wndr3700v2 as well as on the Turris Omnia (which is getting a bit long in the tooth, but being mvebu it should pack more of a punch than the ancient wndr3700v2).

Done! How can I send it to you?

Add it to my google drive folder: https://drive.google.com/drive/folders/1v_S3oFhLEIq49ShKMxjZkgvBQK8IP9ko?usp=sharing

I'll leave it writable for the moment.

@dlakelan, I created a pull request against your GitHub repository that basically re-shuffles a few deck-chairs (sorry, could not help it) and also introduces checks for our required binary versions, which should give instructions on how to install things on OpenWrt to help casual users.

I would still like to consolidate all output files into a uniquely named folder so that users can collect multiple runs, and I want to collect the tc qdisc statistics at the start and the end of the script to get better insight into the actually instantiated shaper. And maybe create a compressed archive of the output folder to a) save space and b) make it more convenient to move things off the router (at least my wndr3700v2 is so tight on space that I would not like to collect multiple uncompressed instances in /tmp).
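As a rough illustration of what I have in mind (paths and file names are just placeholders):

# one uniquely named folder per run, archived afterwards
RUN_DIR="/tmp/shaper_stats.$(date +%Y%m%d_%H%M%S)"
mkdir -p "${RUN_DIR}"

tc -s qdisc show > "${RUN_DIR}/qdisc_before.txt"
# ... run the collection, writing the pings/stats/CPU files into ${RUN_DIR} ...
tc -s qdisc show > "${RUN_DIR}/qdisc_after.txt"

# compress the run so it stays small in /tmp and is easy to copy off the router
tar -C /tmp -czf "${RUN_DIR}.tar.gz" "$(basename "${RUN_DIR}")"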

Let me know what you think...

Now you're making it look like a real shell script :wink:

I'd like to make it work on systems other than OpenWRT/LEDE and SQM, so I think it's fine to add specific checks and helpful stuff for OpenWRT/LEDE, but, as you already did, let's continue to make sure OS-specific stuff is not strictly required.

I agree that dumping the qdisc info before and after could be helpful. But this also potentially leaks information, specifically if someone has a custom QoS script... Perhaps we can ask the user for permission early on in the script.

I also agree that putting the files into a folder makes good sense, and then tar.gz after collection.

Rather than having a user edit the file to set the duration, I like making it an argument; specifically, I propose:

data_collect.sh [DURATION] [WANIF] [LANIF]

with some sensible defaults for those.
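Something like this is what I have in mind (the default values here are just placeholders):

# data_collect.sh [DURATION] [WANIF] [LANIF] -- positional, with defaults
DURATION="${1:-60}"      # seconds of data to collect (placeholder default)
WANIF="${2:-eth0}"       # WAN-facing interface (placeholder default)
LANIF="${3:-br-lan}"     # LAN-facing interface (placeholder default)

echo "collecting for ${DURATION}s on WAN=${WANIF} LAN=${LANIF}"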

I have to do some other things, but will look into merging your stuff in the next few days and/or try to add these ideas.

That is probably just my lack of "taste" showing through :wink:

100% agree the script should also be usable on non-OpenWrt Linux distributions (I believe /proc/stat is Linux-specific, so this will not work well on macOS). Hence the explicit test for the required binaries and the conditional information about how to install those for OpenWrt/LEDE.

Good point; while I fail to see what could be leaked here I believe asking for permission and instructing the user to read/check before uploading is the right thing to do.

Great, I might have a go at that, time permitting.

100% agreed, at least for the duration. The interfaces are probably interesting for later analysis but should not really affect the actual data collection (I also believe that the sqm/qos configs and the tc -s output should reveal enough information to deduce the LAN and WAN interfaces; though I had not considered that simply asking might be much easier, at least as long as the user knows).

Sure, take your time :wink:

So I added this, but instead of asking for permission, the script now recommends screening the collected data files before uploading them anywhere.

I implemented the folder idea, but did not do the compression, as that would make the "screen output for sensitive data" step harder.

I went for a slightly different user interface:
USAGE_STRING="Usage: sh data_collect.sh -W WAN_interface_name -L LAN_interface_name [-4 -6] [ -d duration ] [ -p host-to-ping ]"
as I intensely dislike purely positional parameters (as a carefree user I easily get these wrong and debugging gets iffy).
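For reference, the option handling behind that usage string boils down to a plain getopts loop, roughly like this (illustrative only; the variable names are not necessarily what the script uses):

USAGE_STRING="Usage: sh data_collect.sh -W WAN_interface_name -L LAN_interface_name [-4 -6] [ -d duration ] [ -p host-to-ping ]"
USE_IPV6=0
DURATION=60
PING_HOST=8.8.8.8

while getopts "W:L:46d:p:" opt; do
    case "${opt}" in
        W) WANIF="${OPTARG}" ;;
        L) LANIF="${OPTARG}" ;;
        4) USE_IPV6=0 ;;
        6) USE_IPV6=1 ;;
        d) DURATION="${OPTARG}" ;;
        p) PING_HOST="${OPTARG}" ;;
        *) echo "${USAGE_STRING}" >&2; exit 1 ;;
    esac
done

# the WAN and LAN interfaces are mandatory
if [ -z "${WANIF}" ] || [ -z "${LANIF}" ]; then
    echo "${USAGE_STRING}" >&2
    exit 1
fi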

Sure, I take no pride in the implementation; I really just wanted to offer something more productive than a purely verbal feature request, so I humbly offer a prototype that seems to work and was mildly debugged.

Ah, finally, I implemented using IPv6 for the ICMP/ping data collection, as I assume that on native IPv6/DS-Lite links this should be better suited to our intentions than going through tunneled IPv4...

Cool, those are all great improvements. I'm usually lazy about implementing the non-positional stuff in a language as ugly as shell... I mean, what kind of language doesn't have hash tables and linked lists :wink:

Anyway, we've got a long weekend, and a soccer tournament for the kids so I will probably first look at this tuesday.

In the mean time, maybe I will post some thoughts on modeling / predicting the performance in between soccer games :wink:

And yes, I prefer football, but Americans have this other weird game called that.