Optimized build for the D-Link DIR-860L

Perfect, thank you very much for the clarification. So the kmod-nft-offload module isn't needed? What is that one used for? Just curious :)

kmod-nft-offload is for nftables

Aha, perfect. Thank you very much for the explanation. The build is running now. I'm really curious how much CPU usage will change on my 500/500 Mbit connection. Thank you very much for your splendid work :)

Compilation was successful. However, I have never used iptables directly, so I have no idea how to add the flow offload rule. I could experiment, but I'd rather not make a big mistake and open up the firewall by accident. Does anyone know how to safely add this rule to the FORWARD chain?

You can run this command:
iptables -I FORWARD 1 -m conntrack --ctstate RELATED,ESTABLISHED -j FLOWOFFLOAD
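If you want to avoid inserting the rule twice (for example when re-running it after a firewall reload), a minimal sketch is to check for it first with `iptables -C`, which succeeds only if an identical rule already exists. The function name `offload_enable` and the `IPT` override variable are hypothetical, added only for illustration; the rule itself is the one above. Note that the rule only matches packets of connections conntrack already tracks as RELATED or ESTABLISHED, never new connections.

```shell
#!/bin/sh
# Sketch: insert the FLOWOFFLOAD rule only if it is not already present.
# Assumes a build whose iptables provides the FLOWOFFLOAD target.

# Rule specification copied from the command above (word-split on purpose).
RULE="-m conntrack --ctstate RELATED,ESTABLISHED -j FLOWOFFLOAD"

# Hypothetical override so the function can be dry-run; defaults to iptables.
IPT="${IPT:-iptables}"

offload_enable() {
    # iptables -C exits 0 if an identical rule already exists in FORWARD.
    if $IPT -C FORWARD $RULE 2>/dev/null; then
        echo "already enabled"
    else
        # Insert at position 1 so it is evaluated before the other FORWARD rules.
        $IPT -I FORWARD 1 $RULE && echo "enabled"
    fi
}

# Example: offload_enable
```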

Is there something I can do to verify it is working? I am not seeing any difference in speed: I'm still stuck at ~500/380 Mbit/s on a 500/500 connection. CPU usage might be slightly lower: 45% idle at full download speed and 35% idle during the bottlenecked upload test.
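One way to check, sketched under assumptions: if the kernel exposes `/proc/net/nf_conntrack` and tags offloaded connections with `[OFFLOAD]` (as mainline kernels with the netfilter flow table do), counting those entries during a speed test shows whether flows are actually being offloaded. A non-zero packet counter on the rule in `iptables -v -n -L FORWARD 1` also shows the rule is being matched. The function name `count_offloaded` is hypothetical.

```shell
#!/bin/sh
# Sketch: count conntrack entries that the kernel has flagged as offloaded.
# Assumption: /proc/net/nf_conntrack exists and marks offloaded flows "[OFFLOAD]".
count_offloaded() {
    ct_file="${1:-/proc/net/nf_conntrack}"
    # grep -c prints the number of matching lines (0 if none).
    grep -c '\[OFFLOAD\]' "$ct_file"
}

# Example, run during a speed test:
#   count_offloaded
#   iptables -v -n -L FORWARD 1   # non-zero pkts counter => the rule is matching
```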

Could I simply be bottlenecked by PPPoE? htop shows my first thread pegged at 100% CPU usage, while the other 3 threads have processing power to spare. I'm not sure how well PPPoE is multithreaded.

How are you measuring your speed?

www.dslreports.com

With the ISP provided router I am consistently getting 500/500 as expected.

You could delete the rule again with this line:
iptables -D FORWARD -m conntrack --ctstate RELATED,ESTABLISHED -j FLOWOFFLOAD
Try switching between offloaded and non-offloaded a few times and see whether the CPU load or the measured throughput changes.
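To flip between the two states repeatedly without retyping both commands, a minimal toggle sketch could use the same `iptables -C` existence check before deciding whether to delete or insert. `offload_toggle` and the `IPT` override are hypothetical names; the rule is the one from the commands in this thread.

```shell
#!/bin/sh
# Sketch: toggle the FLOWOFFLOAD rule on/off for A/B comparisons.
RULE="-m conntrack --ctstate RELATED,ESTABLISHED -j FLOWOFFLOAD"
IPT="${IPT:-iptables}"   # hypothetical override for dry runs

offload_toggle() {
    if $IPT -C FORWARD $RULE 2>/dev/null; then
        # Rule present: delete it and report the new state.
        $IPT -D FORWARD $RULE && echo "offload off"
    else
        # Rule absent: insert it at the top of FORWARD.
        $IPT -I FORWARD 1 $RULE && echo "offload on"
    fi
}

# Example: offload_toggle; run a speed test; offload_toggle; run it again.
```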

There's definitely higher CPU usage with the rule deleted than with it enabled: 20-25% CPU idle during both download and upload without the rule, versus 45% idle during download and 35% during upload with it. But speeds are identical either way: 500/380 in both cases.

I really suspect PPPoE is the culprit here, since even before offloading the DIR-860L managed gigabit speeds WAN <-> LAN with masquerading and the firewall enabled when I tested it on a local network with 2 computers connected directly over Ethernet. Is PPPoE encapsulation single-threaded by any chance? Or maybe it only spawns 2 threads (since the DIR-860L has 2 physical cores) when it should be spawning 4 (since they are hyperthreaded)? Is there any way I can view the CPU usage of PPPoE encapsulation and decapsulation, or is that impossible since it runs in kernel space?
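Kernel-side packet processing, including PPPoE encapsulation on the forwarding path, mostly runs in softirq context, so it does not appear as a separate process; the time does show up per CPU as softirq ("si" in top/htop). A sketch, assuming the standard `/proc/stat` layout (the 7th value on each `cpuN` line is softirq time), that compares two snapshots and prints each CPU's softirq share; `softirq_share` is a hypothetical name. This measures all softirq work, not PPPoE alone, but one CPU near 100% while the others stay low would suggest packet processing is concentrated on a single CPU.

```shell
#!/bin/sh
# Sketch: per-CPU share of time spent in softirq between two /proc/stat snapshots.
# Assumes the usual layout: cpuN user nice system idle iowait irq softirq steal ...
softirq_share() {
    awk '
        # Sum user..steal (fields 2-9); guest columns are already included in user.
        function total(   i, t) { t = 0; for (i = 2; i <= NF && i <= 9; i++) t += $i; return t }
        # First file (before): remember totals and softirq ($8) per CPU.
        FNR == NR { if ($1 ~ /^cpu[0-9]/) { t0[$1] = total(); s0[$1] = $8 }; next }
        # Second file (after): print the softirq percentage of the elapsed time.
        $1 ~ /^cpu[0-9]/ {
            dt = total() - t0[$1]; ds = $8 - s0[$1]
            printf "%s %.1f%%\n", $1, (dt > 0 ? 100 * ds / dt : 0)
        }
    ' "$1" "$2"
}

# Example: cat /proc/stat > /tmp/a; sleep 10; cat /proc/stat > /tmp/b
#          softirq_share /tmp/a /tmp/b   (run the speed test during the sleep)
```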

If you don't mind breaking your internet connection for a little while, you could plug the WAN port into a server running a DHCP server on a different subnet, then run iperf to get a measurement without PPPoE in the path for comparison.
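A sketch of that comparison with iperf3, assuming iperf3 is installed on both the WAN-side server and a LAN client; 192.0.2.10 is only a placeholder for the server's address on the test subnet, and `wan_iperf_test` is a hypothetical helper. The `-R` run reverses the direction so that both upload and download pass through the router's NAT.

```shell
#!/bin/sh
# Sketch: measure routing/NAT throughput without PPPoE.
# On the WAN-side server, start the listener first:  iperf3 -s
IPERF="${IPERF:-iperf3}"   # hypothetical override for dry runs

wan_iperf_test() {
    server="$1"              # placeholder address of the WAN-side server
    secs="${2:-30}"          # test length in seconds
    # LAN -> WAN (client sends): the "upload" direction.
    $IPERF -c "$server" -t "$secs"
    # WAN -> LAN (-R makes the server send): the "download" direction.
    $IPERF -c "$server" -t "$secs" -R
}

# Example, from a LAN client: wan_iperf_test 192.0.2.10 30
```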

This is what I did when I first got the DIR-860L, and it managed gigabit speeds at around 35-40% idle CPU. Mind you, this was months ago, well before this offload work even existed.

So PPPoE appears to be the major bottleneck, although I'm really unsure why, given how much idle CPU there is to spare. It doesn't seem to be properly spread across the 4 threads the DIR-860L has. Or maybe the PPPoE connection uses compression and/or encryption that slows it down. I'll see if I can get some debug output to check whether any encryption or compression is in use.

Is it possible to leave a comment on specific commits? While playing around with the new flow offload work, I noticed severely degraded Wi-Fi performance on my DIR-860L. I first tried a stock master branch build, thinking kernel 4.14 might be the cause, but the issue was still there on 4.9. There was one other commit that seemed likely to affect Wi-Fi performance:

https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=c8d07575e5bca43966651bd5b9b3f8c22bbe43ca

For comparison's sake, I also compiled an image with HEAD at the commit just before it:

https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=981cca12b6ce59781d59bda2b42b4ed36f4e37eb

The results are quite staggering. Before the Minstrel commit:

And after the Minstrel commit:

These 2 screenshots are just single snapshots from each situation, but I made sure the results were reproducible by running the speed test 10 times on each commit. Tests were done on a laptop very close to the AP on 2.4 GHz Wi-Fi. I think commenting on that specific commit would probably be the easiest way to bring this to @nbd's attention.

Edit: Found it. I will leave a comment on the GitHub page for that specific commit :)

Thanks for tracking this down. Could you please try deleting the minstrel patches one by one (in reverse order) to figure out which one breaks it?

How do I do that exactly? I am familiar with checking out commits in Git, but I am unsure how to go through individual patches one by one. Am I correct in assuming:

  1. I checkout the Minstrel commit again
  2. I delete patch #329
  3. I delete patch #328
  4. I delete patch #327
  5. I delete patch #326
  6. After each of steps 2) through 5), I compile and test to see whether performance is restored

Is there anything else I need to take into account? Can I simply run make again, or will I have to do a make clean before each compilation?

That's correct. You can simply run make again; there's no need for make clean.
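The removal loop described above can be sketched as follows, assuming the minstrel patches live under `package/kernel/mac80211/patches/` with their number as the file-name prefix (check the actual path in the commit); `drop_patch` and `PATCH_DIR` are hypothetical names. Since the patches are tracked by git, `git checkout` on that directory restores them afterwards.

```shell
#!/bin/sh
# Sketch: remove one numbered patch at a time before rebuilding.
# Assumption: patches are named like NNN-description.patch in this directory.
PATCH_DIR="${PATCH_DIR:-package/kernel/mac80211/patches}"

drop_patch() {
    found=0
    for p in "$PATCH_DIR/$1"-*.patch; do
        [ -e "$p" ] || continue      # glob did not match anything
        rm -f "$p" && echo "removed ${p##*/}"
        found=1
    done
    [ "$found" = 1 ] || { echo "no patch $1 in $PATCH_DIR" >&2; return 1; }
}

# Example (newest first, rebuilding and testing after each step):
#   drop_patch 329 && make
#   drop_patch 328 && make
# Restore everything afterwards: git checkout -- "$PATCH_DIR"
```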

Removing patch 327 fixed the issue for me. To reiterate: performance was only restored to its old level after removing patches 329, 328 and 327. I assume it isn't possible to remove only patch 327 to verify that it is the sole culprit, since 328 and 329 probably depend on the earlier patches being present?

This is a bit unexpected. Did you switch back and forth a few times between a build with 327 and one without, just to make sure it's not something else acting up?

No, I did not, but I did test each build multiple times before concluding anything. I flashed each build via a sysupgrade image while keeping the settings.

Thanks. I will let you know when I have something new for you to test.