Here are some test results on the grandpa of them all... Linksys WRT54G v2.0, alive and kicking since year 2003, lucky one with 32MB RAM from factory
NAT speed, iperf with gigabit wired workstations on both sides:
OpenWRT 10.03.1 pushes ~27 Mbit/s. That's the last usable OpenWRT release on these beasts; things went down the drain after that
LEDE stable 17.01.2 branch, ~18 Mbit/s. I guess we can "thank" the kernel developers for removing the routing cache and adding general bloat every year
LEDE stable + @dissent1 PR applied makes it to ~32 Mbit/s, using fast_classifier/shortcut_fe combo. Looks promising, may give a second life to this box!
Btw, tested shortcut-fe-cm, no effect at all.
Also, is it necessary to hardcode the IPv6 dependency for building these modules? It wastes space on the precious 4MB flash
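For anyone who wants the three iperf figures above as relative numbers, here is a quick sketch. The throughput values are the ones quoted in this post; the helper name `percent_change` is just something made up for the illustration.

```python
# Throughput figures quoted above, in Mbit/s.
results = {
    "OpenWRT 10.03.1": 27.0,
    "LEDE 17.01.2": 18.0,
    "LEDE 17.01.2 + fastpath PR": 32.0,
}


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


baseline = results["OpenWRT 10.03.1"]
for name, mbit in results.items():
    change = percent_change(baseline, mbit)
    print(f"{name}: {mbit:.0f} Mbit/s ({change:+.1f}% vs 10.03.1)")
```

So stock 17.01.2 is roughly a third slower than 10.03.1 on this box, and the patched build ends up a bit ahead of the old release.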
Well, quoting David Miller:
...the performance of the routing cache is a product of the traffic patterns seen by a system rather than being a product of the contents of the routing tables...
...Google sees hit rates on the order of only 10 percent...
...On simpler systems, cache is effective....
And guess what, in 99% of cases OpenWRT/LEDE is being run on small, resource-constrained devices with a very SIMPLE end-user traffic pattern.
So obviously, while this particular change simplified the kernel code, increased security and resistance to DoS attacks, and generally benefited enterprise usage scenarios, at the same time it catastrophically reduced routing performance where it matters a lot for the majority of users: simple and predictable NAT'ed traffic flows on underpowered routers.
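To make the "hit rate depends on the traffic pattern" point concrete, here is a toy simulation. It is only a sketch under loud assumptions: a plain LRU cache keyed by destination, uniformly random destinations, and made-up sizes (50 vs 100000 destinations, 256 cache entries). It does not model the real routing cache that was removed from the kernel.

```python
import random
from collections import OrderedDict


def hit_rate(num_destinations, cache_size, lookups, seed=0):
    """Fraction of lookups served from an LRU cache of recent destinations."""
    rng = random.Random(seed)      # fixed seed so the run is repeatable
    cache = OrderedDict()          # keys are destinations, oldest entry first
    hits = 0
    for _ in range(lookups):
        dst = rng.randrange(num_destinations)
        if dst in cache:
            hits += 1
            cache.move_to_end(dst)         # mark as most recently used
        else:
            cache[dst] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / lookups


# Hypothetical "home router": a handful of destinations, all fit in the cache.
home = hit_rate(num_destinations=50, cache_size=256, lookups=10000)
# Hypothetical "busy server": far more destinations than cache entries.
busy = hit_rate(num_destinations=100000, cache_size=256, lookups=10000)

print(f"home-like pattern: {home:.1%} hits")
print(f"busy pattern:      {busy:.1%} hits")
```

With few destinations nearly every lookup after the first is a hit; with far more destinations than cache slots the cache is mostly useless, which is the same shape of argument as in the quotes above.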
And I am certain the typical home user, like my aunt, will prefer being more easily DoS'd than sacrificing 100-100*18/27 = 33.33 % of usable bandwidth... (To resolve the tension, my aunt does not care about either, so I believe outside of the enthusiast space people will simply shrug and potentially buy a more modern router (assuming one does not already come from the ISP))
Ah, come on leave the hyperbole for the campaign trail...
Care to share where your numbers were taken from (assuming it is SFW)?
How about getting an adequately powered router instead?
But... I think I understand your complaint, I just thought your phrasing was a bit on the sarcastic side, that's all, no harm intended...
Well, perhaps my wording was too strong, but a 1/3 performance drop is still a rather major regression. Time goes on, software gets more features, and no one expects developers to hand-write assembly code and optimize every single line for speed and memory, but come on, you can't cut that harshly.
I assume household edge devices constitute a clear majority of routers on the Internet; that seems obvious to me. Improving software on these devices without throwing them out like your average smartphone every 2 years would benefit everyone. And as we see, there IS room for improvement: latency (SQM/bufferbloat efforts), throughput (fastpath, enabling hardware accelerators) and security (frequent firmware updates with a mainline kernel instead of vendors' prehistoric out-of-tree blobs).
Anyway, no offence intended and none taken, I definitely respect kernel developers for the work they do.
A few years ago, I talked with an engineer who boasted that their team had redeveloped part of the kernel to handle traffic using multi-processing and that it was way faster. He was very vague, but I guess he was speaking about fastpath or similar projects. So IMHO any large company has its own forked project. Just my 2 cents: fastpath is very promising, but in the end hardware acceleration always wins.
I made a test build of master with Qualcomm Fastpath using @dissent1 patch.
On that build, nlbwmon (new netlink based per-host traffic stats app from @jow ) did not report ipv6 stats but seemed to report ipv4 normally.
New build from the same commit without fastpath, and nlbwmon again reports ipv6 stats.
Looks like fastpath may cause peculiar problems for netlink-related stuff.
(I had earlier noticed similar missing ipv6 data in nlbwmon with 17.01, so this might not be exactly about fastpath, but in general about netlink stats in some conditions.)
Yeah, I noticed the same thing - tbf, I'm not that bothered about nlbwmon, so I removed it from my build based on your patches (I'm on a totally unmetered connection).
Fast Path is platform agnostic, it simply offloads processing of traffic with no complex rules out of the kernel networking stack and into a far more optimised path.
Simply apply the patch in the pull request dissent1 has open against the main LEDE source on GitHub, compile for your arch (ensuring you select the fastpath module in make menuconfig) and flash ... it's as simple as that.
So, I have not actually looked at the code, but...
I would expect that not using the kernel stack will force the user to give up a few of the bells and whistles the kernel stack offers. That might still be a decent trade-off, but saying Fast Path offers "a far more optimised path"
seems to imply the kernel stack would not also be optimized (to some degree).
Does anybody know the chance of getting that module up-streamed into the kernel proper?
Anything too complex for simple offloading is still handled by the kernel. It's quite clever - it hooks into the network stack so it gets notified of any routing table changes, and anything it can handle it does, anything it can't it lets continue through the default stack.
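Roughly, the idea described above can be pictured like this. This is a toy model only, not the actual shortcut-fe/fast_classifier code: the class `ToyRouter`, the offload rule in `can_offload`, and the packet fields are all invented for the illustration.

```python
class ToyRouter:
    """Toy model: per-flow cache in front of a generic (slow) stack."""

    def __init__(self):
        self.flow_cache = {}        # flow 5-tuple -> cached forwarding decision
        self.fast_path_packets = 0
        self.slow_path_packets = 0

    def can_offload(self, packet):
        # Hypothetical rule: only plain TCP/UDP flows with no special
        # handling are simple enough to be cached.
        simple_proto = packet["proto"] in ("tcp", "udp")
        return simple_proto and not packet.get("needs_full_stack", False)

    def slow_path(self, packet):
        # Stand-in for the full kernel stack (routing, NAT, firewall rules).
        self.slow_path_packets += 1
        return ("forward", "wan")

    def handle(self, packet):
        key = (packet["src"], packet["sport"],
               packet["dst"], packet["dport"], packet["proto"])
        cached = self.flow_cache.get(key)
        if cached is not None:
            self.fast_path_packets += 1   # known flow: skip the generic stack
            return cached
        decision = self.slow_path(packet)  # everything else takes the default path
        if self.can_offload(packet):
            self.flow_cache[key] = decision
        return decision

    def on_route_change(self):
        # Cached decisions may be stale after a routing change, so drop them.
        self.flow_cache.clear()


router = ToyRouter()
tcp = {"src": "192.168.1.2", "sport": 40000,
       "dst": "198.51.100.7", "dport": 443, "proto": "tcp"}
for _ in range(3):
    router.handle(tcp)
print(router.slow_path_packets, router.fast_path_packets)
```

The first packet of a flow goes through the full stack, later packets of the same flow hit the cache, and a routing change flushes the cache so the next packet is re-evaluated by the default stack.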
And that's correct, the Linux kernel network stack is a very generic networking stack that's meant to be used by a wide variety of different devices. If it was massively optimised for one use case, we wouldn't need fast-path or hardware network acceleration.