Qualcomm Fast Path For LEDE

Why do you think you need this for a WRT1200AC? It should be powerful enough to do NAT at gigabit speed without FastPath. Start by measuring your router's capabilities; the relevant links are posted at the very beginning of this thread.

I'm seeing some strange behaviour; maybe someone else has experienced it too.

I've applied 1269.patch to the stable LEDE 17.01 branch and built an image for a TL-WDR4300.
My connection is 500 Mbit/s download over PPPoE.
I'm doing large rsync transfers (hundreds of GB) from a remote Linux server on the WAN side to a local Linux server on the LAN, over IPv4. One rsync transfer usually takes hours to complete.

Without the fastpath patch, the router handles around 250 Mbit/s and the CPU is maxed out at 100%.

With the patch, I reach 500 Mbit/s with the router CPU at only around 80%, but this lasts only about 5 minutes; then the speed drops back to 250 Mbit/s and the CPU goes up to 100%, as if fastpath had stopped working.

If I stop the rsync process and immediately restart it, the speed is 500 Mbit/s at first, but again only for a couple of minutes.

Does anyone have an explanation for this, or, preferably, a fix?
Is it worth trying quarky's modified patch?

Thank you!

@duvi, are you using fast-classifier or sfe-cm? You should not use both together, though.

If you are using fast-classifier, you may want to make it kick in sooner. The default configuration starts accelerating each established connection from the 129th packet. I overrode the default so acceleration starts from the 5th packet onwards, using the following command after SSHing into the router:

echo 4 > /sys/fast_classifier/offload_at_packet

The path may not be correct as I’m typing from memory.

See if that helps.
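A slightly more defensive version of that one-liner, as a minimal sketch: it assumes the sysfs path quoted above (which the poster wasn't sure of), so it checks that the file is writable before touching it and then reads the value back. The function name `set_offload_at` and the `FC_OFFLOAD_AT` override variable are hypothetical, not part of fast-classifier.

```sh
#!/bin/sh
# Sketch: set fast-classifier's offload threshold and read it back.
# The default path is the one quoted from memory above; it is an assumption,
# so it can be overridden via FC_OFFLOAD_AT and is checked before writing.
FC_OFFLOAD_AT="${FC_OFFLOAD_AT:-/sys/fast_classifier/offload_at_packet}"

set_offload_at() {
    n="$1"
    # Accept only a non-empty string of digits.
    case "$n" in
        ''|*[!0-9]*) echo "usage: set_offload_at <packet-count>" >&2; return 2 ;;
    esac
    if [ ! -w "$FC_OFFLOAD_AT" ]; then
        echo "cannot write $FC_OFFLOAD_AT (is fast-classifier loaded?)" >&2
        return 1
    fi
    echo "$n" > "$FC_OFFLOAD_AT"
    # Print what the file now contains, to confirm the write took effect.
    cat "$FC_OFFLOAD_AT"
}
```

Running `set_offload_at 4` on the router should be equivalent to the `echo 4` above (acceleration from the 5th packet), provided the path is right on your build.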

I'm using fast-classifier with SFE; I didn't build sfe-cm.
I tried your suggestion, but it didn't really change anything.

Is there any way to turn fast-classifier on and off without rebooting the router? Is "rmmod fast-classifier / insmod fast-classifier" enough?

It is enough.
To load fast-classifier again, type:

modprobe fast-classifier

Is there some kind of counter or buffer in fastpath?
While the rsync process is running, I can see exactly when the speed drops from 500 to around 250 Mbit/s. When this happens, I just run "rmmod fast-classifier && sleep 5 && modprobe fast-classifier" on the router, and the speed is back to 500 again, but again only for a couple of minutes. I'm not touching or restarting the rsync process; it keeps running on the server.
Maybe I should write a cron job that does the rmmod / modprobe procedure on the router every 5 minutes...
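For anyone who wants to try that stopgap, a minimal sketch of the crontab entry for LEDE's busybox crond might look like the following. It assumes the stock layout (root's crontab at /etc/crontabs/root, rmmod and modprobe in /sbin) and only hides the symptom; during the 5-second gap every connection falls back to the normal netfilter path.

```
# /etc/crontabs/root: reload fast-classifier every 5 minutes (workaround, not a fix)
*/5 * * * * /sbin/rmmod fast-classifier && sleep 5 && /sbin/modprobe fast-classifier
```

After editing the file, restart cron with `/etc/init.d/cron restart`.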

From my limited understanding of the SFE code, no buffers are used for acceleration. I don't suppose you are using jumbo frames between your Linux server and your router?

It's interesting that SFE worked for minutes before stopping. Is your rsync running over SSH between your local and remote servers? If so, SFE should be accelerating a TCP connection. From the code, there are about 12 scenarios that can cause an established, accelerated connection to be removed from SFE. I guess your rsync connection hit one of those 12 scenarios and fell back to the netfilter stack. Unfortunately I have no idea which one, since it worked for minutes before failing, which doesn't make sense to me at the moment.

How's the memory usage of your router when the issue seems to occur?

OK, I got it up and running. First, I have to say "wow": the performance increase is really drastic. But one thing I noticed is that this module has problems during performance tests (speedtest): of 6 tests, two failed (0 kbit down, 0 kbit up, 0 ms ping), and I had to restart the tests twice. Has anyone else noticed this problem? Is there a way to configure this module?

Device: Lantiq FB7360SL (VR9 - mips 34kc - VDSL2)
...
[ 13.611812] PPTP driver version 0.8.5
[ 13.700801] xt_time: kernel timezone is -0000
[ 13.754532] fast-classifier: starting up
[ 13.757541] fast-classifier: registered
[ 13.916612] PCI: Enabling device 0000:01:00.0 (0140 -> 0142)
[ 13.935018] ath: EEPROM regdomain: 0x8114
...

I'd also like to ask: are there any downsides? Normally performance comes at the cost of stability. And is there any documentation?

Update: during the week I tested this from another site, not from within my home network.
Result: it does NOT work. I can connect to the NAS in my home network, but when I try to transfer a large file, the download starts at normal speed (about 2 MB/s), but after a few seconds it becomes very slow: only several kB/s. So it seems shortcut-fe stops working once it starts offloading.
When I disable fast-classifier, the download succeeds at normal speed.

I also have another patch to sfe_ipv4.c that bypasses acceleration for IPsec traffic; I'd forgotten about it. Try patching that file and see if it helps.

Update: wait, I see that this function is already in the patch I've used. So, your bypass is not working?

Where can I find that patch?

sfe_ipv4.c is part of the SFE package. I made a patch to it to stop accelerating IPsec UDP traffic. Try comparing your file against mine:

The changes are in lines 2580-2594 of the file at the link above.

Hi @quarky 🙂
I was looking at your patch, and I don't think it's enough.
The problem probably isn't the offloading of the IPsec packets themselves (UDP 500 and 4500, and ESP, i.e. IP protocol 50, which your patch doesn't cover, by the way). In my opinion the problem is the offloading of packets that go through the IPsec tunnel (inside the tunnel).

What's worse, I suspect your patch only disables offloading in the scenario where a client inside the local network establishes an IPsec connection to a remote VPN gateway. That is a different case from ours, where the tunnels terminate on the router.

In previous posts I showed that connections established through the IPsec tunnel appear in /sys/fast_classifier/debug_info. Example:

o=1, p=6 [b8:af:67:70:86:7f]:192.168.1.30:59609 192.168.0.3:80:[84:3d:c6:73:d4:58] m=00000000 h=128

This is a connection inside the IPsec tunnel, which shouldn't be offloaded, but it is.

If the problem were the offloading of the IPsec packets themselves (UDP 500 and 4500, and ESP), it would be visible in the strongSwan logs and the tunnel would keep disconnecting. But that didn't happen.
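One way to check that from the router is to pull out only the IKE/NAT-T entries and look at their offload flag. The sketch below assumes the debug_info line format shown in this thread (`o=<0|1>, p=<proto> [mac]:src_ip:src_port dst_ip:dst_port:[mac] ...`) and only matches UDP ports 500 and 4500; ESP (IP protocol 50) is not matched, because no ESP entry format appears here. The function name `fc_ipsec_entries` is hypothetical.

```sh
#!/bin/sh
# Sketch: print fast-classifier entries that look like IKE / NAT-T traffic
# (UDP, source or destination port 500 or 4500), with their o= flag intact.
# Assumes the debug_info line format shown in this thread.
fc_ipsec_entries() {
    file="${1:-/sys/fast_classifier/debug_info}"
    awk '
        /^o=[01],/ {
            proto = $2; sub(/^p=/, "", proto)          # "p=17" -> "17"
            n = split($3, src, ":"); sport = src[n]    # [mac]:ip:port -> port
            split($4, dst, ":"); dport = dst[2]        # ip:port:[mac] -> port
            if (proto + 0 != 17) next
            if (sport + 0 == 500 || sport + 0 == 4500 || dport + 0 == 500 || dport + 0 == 4500) print
        }
    ' "$file"
}
```

Entries printed with `o=1` are IKE/NAT-T flows that SFE is currently accelerating.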

I also have a TL-WDR4300 and tested your scenario by running an iperf test through an SSH connection to my remote server.
After 5 minutes I did not notice any degradation in speed, and CPU usage stayed the same, at a low level. Please check your connection in /sys/fast_classifier/debug_info. Mine looks like this:

root@OpenWrt:~# cat /sys/fast_classifier/debug_info | grep ":22"
o=1, p=6 [dc:a9:04:88:84:52]:192.168.0.64:49569 <my_remote_server_IP>:22:[b8:af:67:70:86:7f] m=00000000 h=128

o=1 means the connection is offloaded. You can check this state before and after performance degrades.
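To compare the state before and after the slowdown without eyeballing every line, a small awk wrapper can count entries by their `o=` flag. This is a sketch assuming the line format above; `fc_offload_summary` is a hypothetical helper name, and the optional second argument is a plain substring filter (e.g. `:22:` for SSH, like the grep above).

```sh
#!/bin/sh
# Sketch: count fast-classifier entries by offload state (o=1 offloaded,
# o=0 not offloaded), optionally only lines containing a given substring.
# Assumes the debug_info line format shown in this thread.
fc_offload_summary() {
    file="${1:-/sys/fast_classifier/debug_info}"
    pattern="${2:-}"
    awk -v pat="$pattern" '
        /^o=[01],/ {
            if (pat != "" && index($0, pat) == 0) next   # plain substring filter
            if (substr($1, 3, 1) == "1") on++; else off++   # "o=1," -> "1"
        }
        END { printf "offloaded=%d not_offloaded=%d\n", on, off }
    ' "$file"
}
```

For example, run `fc_offload_summary /sys/fast_classifier/debug_info ':22:'` once while rsync runs at full speed and again after it drops; if the SSH entry moves from offloaded to not offloaded, SFE has handed the connection back to netfilter.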

Hi @jtaczanowski,

SFE is designed to bypass netfilter stack processing for a connection once it has been established (i.e. after netfilter has determined that the connection is allowed). This is similar in design to the first rule you see in the INPUT netfilter rules:

-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT

The 'acceleration' is achieved by not running the same netfilter checks on every incoming/outgoing packet, which is redundant once the connection has been established as safe.

So it doesn't really matter whether the traffic goes through your router's WAN link or through a tunnel to another location; SFE is supposed to 'accelerate' the connection.

I patched the fast-classifier code because the original code does not work properly when used with a non-physical network device and a non-default routing table (in my case, an OpenVPN tunnel with policy-based routing). The original code fails to find the correct output interface, so when SFE kicks in, the packets are sent to the wrong network interface and discarded.

I also ran into issues with IPsec traffic passing through my router, whether directly via the router's WAN interface or via the OpenVPN tunnel, so I had to patch the IPv4 SFE code to bypass IPsec traffic. AH traffic appears unaffected as far as I can tell. I guess that is because AH is done periodically, while IPsec traffic is more frequent, and the SFE code does not track IPsec connections properly. I didn't spend much time in this area, though. Planning to do something about it in the future 😛

My router (a Broadcom device running my custom DD-WRT build with my patched SFE code) accelerates traffic through OpenVPN tunnels just fine. Below are two sample connections from my router, one going to the WAN port and one going through the tunnel:

o=1, p=17 [40:cb:c0:xx:xx:xx]:192.168.28.134:37905 x.x.x.x:37905:[00:00:0c:xx:xx:xx] m=00000000 h=751
o=1, p=17 [00:00:00:00:00:00]:192.168.27.154:37905 x.x.x.x:37905:[00:00:00:00:00:00] m=00000000 h=8088

Both connections are IPsec traffic connections. The first line is from the local router (192.168.28.0/24) to the WAN, while the second is from the remote site (192.168.27.0/24) via an OpenVPN tunnel to the WAN. What I noticed is that when traffic goes through a tunnel, the MAC address is always 00:00:00:00:00:00, which makes sense as a virtual interface does not have a MAC address.
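The all-zero MAC observation can be turned into a quick check. This sketch, with the hypothetical name `fc_iface_kind`, labels each entry as `virtual` if it contains `[00:00:00:00:00:00]` and `physical` otherwise; that is only the heuristic described above, not something SFE reports directly.

```sh
#!/bin/sh
# Sketch: tag each fast-classifier entry by interface kind, using the
# heuristic that tunnel (virtual) interfaces show an all-zero MAC address.
fc_iface_kind() {
    file="${1:-/sys/fast_classifier/debug_info}"
    awk '/^o=[01],/ {
        kind = (index($0, "[00:00:00:00:00:00]") > 0) ? "virtual" : "physical"
        print kind ": " $0
    }' "$file"
}
```

If a connection you know goes through a tunnel shows up as `physical`, that matches the misrouting symptom described in the next paragraph.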

In your case, it looks like the SFE code is still routing traffic through a physical interface, which is incorrect, so when SFE kicks in the packets get discarded. I'm not too familiar with IPsec traffic routing in the Linux kernel, so I'm not sure I can help with your problem.

Maybe someone in the forum can shed some light on how IPsec traffic is routed, and we can collectively figure out how to solve this problem.

An upstream version of offloading is ready in @nbd's staging tree. More details can be found in his comment: https://github.com/lede-project/source/pull/1269#issuecomment-367056477

This is the commit:
https://git.openwrt.org/?p=openwrt/staging/nbd.git;a=commit;h=8d0c933b19dfa1f1fc38f685ca5925e0de7f83ce

That is one of the commits needed. AFAIK quite a few earlier commits are also needed for this to work. The simplest approach is to clone his staging tree and compile from there.
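For reference, in later OpenWrt releases (18.06 onward) this upstream offloading is exposed through the firewall's `flow_offloading` option. Assuming such a build (not the 17.01 + SFE setup discussed earlier in this thread), a minimal way to switch on the software variant is:

```sh
# Enable software flow offloading in the firewall defaults section (OpenWrt 18.06+)
uci set firewall.@defaults[0].flow_offloading='1'
uci commit firewall
/etc/init.d/firewall restart
```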

I've installed the new image (Feb 2018), but SQM doesn't seem to work (again).
Speedtest and DSLReports results show full speed even though I throttle, and bufferbloat is awful.
The regular build works just fine.
Am I missing something?
(PPPoE/VDSL)

Is it confirmed that mwan3 will not work with fast-path acceleration?
I didn't find more info in this thread.