EDIT:
I see that at line 194 there is dst_xfrm
I'll try your git commit "kernel: avoid flow offload for connections with xfrm on the dst entry" (should fix IPsec)
EDIT2: I didn't pay attention; the problem is actually at line 197
EDIT3: mwan3 was using the Local source interface br-lan, which made the networking really unstable. After applying the patch and changing the Local source interface to none, no kernel oops so far
You should probably mail your patch to the mailing list, or create a pull request, for better visibility. Or perhaps @nbd might be willing to have a look here. Thank you very much for the patch, by the way!
Hi, we have a problem with flow offload enabled together with WireGuard on mt7621. I hope you can solve this issue and that it is not a bug in WireGuard.
To reproduce the bug, enable flow offload and then try to transfer data through WireGuard; the router will reboot instantly. I couldn't get a log from this, but somebody did some debugging and shared a call trace here
However, when I restart my device, the rule no longer appears and I have to re-add it. I am new to iptables so I may be doing something wrong, but has anyone else seen/solved this?
I could add it to my startup scripts, but I would think saving it should keep it across reboots.
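For what it's worth, manually added iptables rules are not persisted on OpenWrt; the usual place for them is /etc/firewall.user, which fw3 runs again on every firewall restart and after each reboot. A minimal sketch (the rule below is only a placeholder, substitute the rule you are actually adding):

```shell
# /etc/firewall.user -- re-run by fw3 on every firewall restart/reboot.
# Placeholder rule for illustration only; replace with your own rule.
iptables -t mangle -A FORWARD -m comment --comment "custom rule" -j RETURN
```

After editing the file, `/etc/init.d/firewall restart` should pick it up, and the rule should then survive reboots as well.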
Device: WRT3200ACM (mvebu cortex a9)
Snapshot: Around 4/26 (not near my router right now so I can't pull the exact commit/build), from trunk
It is working great though; CPU is around 10% when maxing out my connection at 130 mbit down.
So I have been thinking. When benchmarking my Dir-860l by connecting a PC to the WAN port and LAN port and running an iperf3 test, I can see the following results:
-WAN <-> LAN: 940 mbit (same speed as 2 devices on the same switch, hence running into gigabit ethernet limitations)
-WAN <-> LAN with SQM: 650-700 mbit
However, with my real connection I am using a PPPoE connection on the WAN side instead of IPoE. This is giving me the following speed:
-WAN <-> LAN: 500 mbit down (my connection speed), 400 mbit up (100 mbit below my connection speed) and very low CPU idle %, showing I am CPU limited.
-WAN <-> LAN: 350-400 mbit in either direction with SQM enabled.
However, when enabling hw flow offload I am seeing:
-WAN <-> LAN: 500 mbit in either direction (99% CPU idle)
-WAN <-> LAN: N/A. Not possible to use SQM with hw flow offloading
Conclusions:
My dir-860l is able to shape 650-700 mbit
PPPoE is a severe bottleneck, unless hw flow offload is enabled
But hw flow offload doesn't work with SQM
Would it be possible to:
1. Apply hw flow offload on WAN <-> dummy interface. This makes sure the CPU-intensive PPPoE handling is fully offloaded.
2. Process dummy interface <-> LAN without hw flow offload, but keep software flow offload enabled. This should be fine to do in software, given my synthetic benchmarks and given that step 1 hardly costs any CPU cycles.
3. Apply SQM on the dummy interface.
So basically, I have 3 questions:
1. Would this be possible?
2. If so, how do I apply hw flow offloading to some parts but not others, and software flow offloading to the rest?
3. How do I make the traffic flow WAN <-> dummy interface <-> LAN?
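I'm not aware of anything that does this out of the box, but as a sketch of what the "dummy interface" could look like, a veth pair might work (veth-sqm0/veth-sqm1 are made-up names, and whether hw flow offload can actually be split around such a pair is exactly my question 1):

```shell
# Hypothetical sketch: a veth pair acting as the intermediate "dummy" interface.
ip link add veth-sqm0 type veth peer name veth-sqm1
ip link set veth-sqm0 up
ip link set veth-sqm1 up
# SQM would then be attached to veth-sqm0 instead of pppoe-wan, and policy
# routing would still be needed to steer WAN <-> LAN traffic through the pair.
```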
I seem to be running into another bug with hw flow offload. Could anyone please confirm or deny whether they are seeing the same thing?
Usually, when nothing much is happening on my home network, there are around 100-200 active connections, as can be seen on the LuCI overview page. However, with hw flow offload enabled I started seeing thousands of active connections. Diving into the Connections tab in LuCI's Realtime Graphs, I could see hundreds of connections made by a computer that was shut down over 12 hours ago. For some reason, inactive connections are not timing out and are left in the conntrack table. Disabling hw flow offload fixes the issue.
Does this sound familiar to anyone else? @nbd, is there any additional information that I can provide to help debug the issue?
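In case it helps, here are a few commands I could run on the router to gather data (the IP address is just a placeholder for the machine that was shut down):

```shell
# Current number of tracked connections:
cat /proc/sys/net/netfilter/nf_conntrack_count
# Established-TCP timeout currently in effect:
sysctl net.netfilter.nf_conntrack_tcp_timeout_established
# Sample of stale entries for the powered-off host (replace the IP):
conntrack -L 2>/dev/null | grep 192.168.1.100 | head
```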
Edit: Running the 18.06 branch from a few days ago: OpenWrt 18.06-SNAPSHOT r6917-8948a78