Qualcomm Fast Path For LEDE

If there is a difference, it's most likely caused by:

I've also added the missing function for fast-classifier to update UDP statistics, plus some more small fixes.

The UDP statistics change seems an unlikely cause, since most of the traffic load in your case came from torrent TCP traffic. It must be one of the extra fixes.
I compared the main implementation files:

sfe_cm.c: https://gist.github.com/MartB/214613e499a9c2364ee761ec4d67cbbb
fast-classifier.c: https://gist.github.com/MartB/fb2ec15a253f8460809973f381c0ff00

It does not look like the increase in IRQs is due to the patching; it might be some kernel patch.
I will dig into it.

Edit:
I checked most of the important parts, and they do not seem to differ in most places (besides gwlim removing code where dissent uses #ifdef).
Are you sure you had his code compiled properly?

As per your instruction, only sfe and f-c were included and not sfe-cm.

Also note that gwlim's version requires all 3 modules, not just fast-classifier; otherwise the htop % value goes up high, which means no acceleration.

I can't; I haven't built your version with sfe-cm.

[quote="dissent1, post:480, topic:4582, full:true"]And btw, running SQM within the tunnel (PPPoE) is a bit of a different story.
[/quote]
Well, I don't know anything about this; unfortunately, this is what I have :slight_smile:

And as a last note: thanks all of you for your work!

If you are talking to me then: yes, I'm sure they are all proper. :slight_smile:
The main difference between the two (though I didn't go through the code changes) is whether the shortcut-fe-cm module is used.
And note: the test didn't run through a VPN but over a normal connection.

Yeah, well, you are only supposed to use either shortcut-fe-cm or fast-classifier anyway, so that's not an issue then.
See my edit above for what I found while briefly checking the patches (not much that would warrant such a performance decrease).

Not with gwlim's version; you need all 3:

@chros
You might be onto something.


Also inserts all modules.

Edit: As dissent pointed out in his commit message, shortcut-fe-cm always comes first if both are selected, so maybe there's an issue with fast-classifier only?
Can you test dissent's version with only shortcut-fe-cm enabled?

I'm "glad" to hear that. Thanks for reviewing!

With gwlim's version, if I rmmod fast-classifier (only shortcut_fe_cm remains), htop reports double the load. If I insmod it back, everything goes back to normal. Maybe that helps.
If I have time today, I'll test all combinations with dissent1's version.

Yes, as I’ve pointed out, sfe-cm is a bit faster because it lacks some checks and has simpler logic for creating offload rules; that may be the reason. Other than that, f-c supports bridge offloading.

Well, you can’t rmmod sfe-cm completely: it leaves behind rules that it created in the main SFE module. Instead of rmmoding it, you should not load it at all. The same goes for f-c. You can check the module startup scripts in /etc/modules.d.

Oh, well, anyway. :slight_smile:
I compiled shortcut-fe-cm along with f-c (with your SFE build); here are the results.
During the 20-minute test (down 8900 KB/s, up 1300 KB/s, ~70 connections), htop values:

rmmod fast-classifier (only shortcut-fe-cm loaded): ~92% !!!!!!!
then rmmod shortcut-fe-cm as well: ~92%
insmod shortcut-fe-cm back: ~92% !!!!!!!
insmod fast-classifier back: ~55%
rmmod shortcut-fe-cm (only fast-classifier loaded): ~92% !!!! :smiley:
rmmod fast-classifier, then insmod fast-classifier back: ~55%

Hope it helps!

And I'd like to see your results as well, along with hardware, connection, etc. Even better if your router is supported by gwlim's aug builds, because then you can easily compare the two (MIPS optimization doesn't really matter here).

The good thing about SQM is that it puts a clearly visible load on our routers.
First, find the initial SQM settings that get you to around 90% sirq usage.
Then fire up a torrent client with similar settings/usage between tests and watch the CPU % value in htop.
Note that the number of connections also matters, not just the up/down speed (more connections put more load on the router at the same speed).
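If you want a number to log between tests instead of eyeballing htop, the sirq share can be computed from two samples of the softirq column in /proc/stat. This is a minimal sketch, not part of either patchset; it assumes a Python 3 interpreter on the machine being measured (not part of a default LEDE image), and the function names are hypothetical.

```python
import time

# Column order of the "cpu" lines in /proc/stat (see proc(5)):
# user nice system idle iowait irq softirq steal [guest guest_nice]
SOFTIRQ = 6  # index of the softirq column once the "cpu" label is dropped


def parse_cpu_line(line):
    """Return the first eight jiffy counters (user..steal) of a 'cpu' line."""
    fields = line.split()
    if not fields or not fields[0].startswith("cpu"):
        raise ValueError("not a 'cpu' line from /proc/stat")
    # guest and guest_nice are already counted in user/nice, so they are skipped
    return [int(v) for v in fields[1:9]]


def softirq_percent(before, after):
    """Percentage of CPU time spent in softirq between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    return 100.0 * deltas[SOFTIRQ] / total if total else 0.0


def read_cpu_counters(path="/proc/stat"):
    """Read the aggregate 'cpu' line (all cores combined), which comes first."""
    with open(path) as f:
        return parse_cpu_line(f.readline())


def sample_sirq(interval=5.0):
    """Sample twice, `interval` seconds apart, while the torrent test is running."""
    first = read_cpu_counters()
    time.sleep(interval)
    return softirq_percent(first, read_cpu_counters())
```

Calling sample_sirq() a few times during each run gives comparable figures across module combinations. On multi-core routers the aggregate line averages all cores; the per-core cpuN lines can be fed to parse_cpu_line in the same way.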

As I’ve said, you can’t rmmod the modules, as that leads to broken behavior.

When you are running my patchset, SFE doesn’t accelerate ingress flows flagged by QoS. When you are running gwlim’s patchset, SQM ingress is not working, because packets bypass the QoS stack: SFE grabs them, so the qdisc rules are never applied. That could also be the reason.
Another reason: if you are torrenting, there are a lot of small flows appearing and disappearing, but f-c offloads a flow only after 128 packets have been received (you can adjust this by echoing the desired value into offload_at_packets).
When using gwlim's patchset, basically only sfe-cm is really working and f-c is just hanging around (in the bar, probably :slight_smile:).
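To see why a 128-packet threshold hurts torrent traffic, here is a deliberately simplified model (an assumption, not f-c's actual code path): a flow stays on the slow path until it has seen `threshold` packets, and only the packets after that go through SFE. The flow sizes are made up for illustration.

```python
# Simplified model of fast-classifier's offload threshold. Assumption: a flow is
# offloaded once it has seen `threshold` packets, and only later packets hit SFE.
OFFLOAD_AT_PACKETS = 128  # default quoted in the thread


def offload_stats(flow_sizes, threshold=OFFLOAD_AT_PACKETS):
    """Return (number of flows offloaded, fraction of packets on the fast path)."""
    offloaded_flows = 0
    fast_packets = 0
    total_packets = sum(flow_sizes)
    for size in flow_sizes:
        if size >= threshold:
            offloaded_flows += 1
            fast_packets += size - threshold  # packets after the threshold
    fraction = fast_packets / total_packets if total_packets else 0.0
    return offloaded_flows, fraction


# Hypothetical torrent-like mix: many short flows, a few long ones.
flows = [20] * 90 + [1000] * 10
print(offload_stats(flows))                # default threshold of 128
print(offload_stats(flows, threshold=16))  # lowered threshold
```

With the default of 128, only the 10 long flows are offloaded and the 90 short ones never leave the slow path (about 74% of packets fast-pathed); at 16, all 100 flows are offloaded (about 86%). Real flows also differ in packet size and lifetime, so treat this only as an illustration of the trend.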


Can someone quickly help me understand the concept here?

If I have a wireless access point that uses a bridge interface between its Wi-Fi interfaces and the LAN, will this traffic be accelerated by SFE?
For now it just displays:
size=0 offload=0 offload_no_match=0 offloaded=0 done=0 offl_dbg_msg_fail=0 done_dbg_msg_fail=0
and
NO_IIF = 2891
CT_DESTROY_MISS = 559
Does that mean it doesn't find the proper device?


1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP qlen 1000
    link/ether 80:2a:a8:c0:24:78 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::822a:a8ff:fec0:2478/64 scope link
       valid_lft forever preferred_lft forever
3: ifb0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN qlen 32
    link/ether f6:c1:51:b9:9f:e1 brd ff:ff:ff:ff:ff:ff
4: ifb1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN qlen 32
    link/ether 0e:f8:51:22:50:e1 brd ff:ff:ff:ff:ff:ff
7: br-lan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether 80:2a:a8:c0:24:78 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.5/24 brd 10.0.0.255 scope global br-lan
       valid_lft forever preferred_lft forever
8: eth0.1@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-lan state UP qlen 1000
    link/ether 80:2a:a8:c0:24:78 brd ff:ff:ff:ff:ff:ff
9: eth0.2@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether 80:2a:a8:c0:24:78 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::822a:a8ff:fec0:2478/64 scope link
       valid_lft forever preferred_lft forever
10: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master br-lan state UP qlen 1000
    link/ether 80:2a:a8:c2:24:78 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::822a:a8ff:fec2:2478/64 scope link
       valid_lft forever preferred_lft forever
11: wlan1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-lan state UP qlen 1000
    link/ether 80:2a:a8:c1:24:78 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::822a:a8ff:fec1:2478/64 scope link
       valid_lft forever preferred_lft forever
12: wlan1-1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-lan state UP qlen 1000
    link/ether 82:2a:a8:c1:24:78 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::802a:a8ff:fec1:2478/64 scope link
       valid_lft forever preferred_lft forever

Also, I'm getting the following output on my USG (UniFi Security Gateway):
size=70 offload=0 offload_no_match=0 offloaded=2636 done=2488 offl_dbg_msg_fail=2636 done_dbg_msg_fail=2488

The two identical counts (2636) worry me a little bit.

CT_NO_CONFIRM = 54749
TCP_NOT_ASSURED = 49271
TCP_NOT_ESTABLISHED = 926
UNKNOW_PROTOCOL = 5
NO_SRC_DEV = 1800
NO_DEST_DEV = 1173803
WAIT_FOR_ACCELERATION = 996227
UPDATE_PROTOCOL_FAIL = 926
CT_DESTROY_MISS = 118532

You need to echo 1 > /sys/fast_classifier/skip_to_bridge_ingress

That’s OK: the second value counts the debug messages (announcing that a connection has been offloaded) that failed to be sent to a debug daemon.
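Given that explanation, the matching counts can be checked mechanically. A small sketch with hypothetical helper names; its input is just the statistics line as pasted above (the thread does not say which file it was read from). It parses the key=value pairs and reports whether the *_dbg_msg_fail counters simply mirror offloaded and done, which per the reply above means the debug messages went undelivered, not that offloading itself failed.

```python
def parse_debug_line(line):
    """Parse a 'key=value key=value ...' statistics line into a dict of ints."""
    stats = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:  # ignore any token without '='
            stats[key] = int(value)
    return stats


def every_debug_message_failed(stats):
    """True when each offload/done event also produced a failed debug message,
    i.e. the *_dbg_msg_fail counters just mirror offloaded and done."""
    return (stats["offl_dbg_msg_fail"] == stats["offloaded"]
            and stats["done_dbg_msg_fail"] == stats["done"])


# The USG line from the post above.
usg = parse_debug_line(
    "size=70 offload=0 offload_no_match=0 offloaded=2636 done=2488 "
    "offl_dbg_msg_fail=2636 done_dbg_msg_fail=2488"
)
print(usg["offloaded"], every_debug_message_failed(usg))
```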

Hey, thanks for that. So it should start working if I set that, or won't it work at all?

Yes, that setting enables incoming traffic on bridge interfaces to be offloaded.

I keep getting NO_IIF errors and it's still at 0; I must be doing something wrong.

Could that be related to the interface being a switch with VLANs for both ports?

NO_IIF errors also include locally generated packets and bridge egress, so it’s not a problem.
Bridge connections are not tracked, so they will not appear in those debug statistics. You can try measuring sirq instead. As far as I understand it, bridging is already fast, so there won’t be much of a noticeable gain.

A successor of that WRT54G v2.0, the WRT54GL v1.1, running Tomato RAF is able to NAT ~60 Mbit/s! :smiley:
I don't know whether RAF firmware is available for your WRT.

I am using gwlim's 32MB RAM build for the TP-Link 1043NDv1, and I have noticed an interesting Wi-Fi behaviour when setting up several (in my scenario, 4) virtual access point SSIDs.
Wi-Fi speed differs between being connected to the main AP SSID (the one that is first on the list and has more configuration options available) and the other ones. The first is up to 2x faster than the other virtual APs.
Is it possible that the patch is not working on virtual access points but only on the main one?
Has anyone else tested similar scenarios with virtual APs on the same radio and encountered Wi-Fi speed differences?

Thanks in advance for your help.