Netgear R7800 exploration (IPQ8065, QCA9984)

Actually, only in master.
The stable release builds are fine.

ok, got it

Is 520 Mbit/s the best speed that can be achieved on 5 GHz, or could it be even better (on any firmware)?

I have a question regarding a performance issue I am seeing, but which I don't know how to investigate.
In my setup I have a regular WAN connection for internet traffic, which provides 300/300 Mbit/s. In addition, I have an IPTV connection on a different physical interface. Usually the STB would be connected directly to this interface, where it receives an IP from a DHCP server and then uses IGMP to subscribe to the TV channels. However, I want to move the STB to my LAN instead (for good reason, but I'm trying to keep this post short(ish)), so I am using a VLAN-capable switch to provide a connection from the IPTV interface to the router. This VLAN is then received at one of the switch ports on the router, and for all intents and purposes acts like a WAN dedicated to IPTV.

The STB gets an IP (in my LAN) from the router, and I've set up appropriate forwardings and routes to make this work. However, there is a performance issue: when I load the WAN connection with Speedtest, there is some stuttering on the TV, indicating that the multicast stream is being disturbed.

The TV multicast is probably somewhere around 5-10 Mbit/s, while the WAN is around 300 Mbit/s, and the hardware should be more than capable of handling this without any issues. Now, for the IPTV, eth0 isn't involved at all, but for the internet both eth1 and eth0 are involved. The computer and the STB are both in the LAN, but the Speedtest is obviously using eth0 for internet connectivity.

If I copy some large files between a NAS and my computer (both in LAN) I see no issues, but that's (I guess) because it isn't a routed connection (the switch handles that in hardware). In the first test, however, both connections are routed, so they obviously load the CPU. How do I investigate this to find out where the bottleneck is?

I've tested irqbalance, and it does balance out the irqs quite nicely, and possibly lessens the issue somewhat, but not nearly enough.
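One rough way to check whether the routed traffic ends up on a single core is to sample the per-CPU NET_RX softirq counters in /proc/softirqs while idle and again during the Speedtest. The snippet below is only a sketch: the function name `net_rx_counts` is made up for illustration, and it assumes the usual /proc/softirqs layout with one column per CPU.

```shell
#!/bin/sh
# Hypothetical helper (not part of LEDE): print the NET_RX softirq count
# for each CPU from a /proc/softirqs-style file (default: /proc/softirqs).
net_rx_counts() {
    awk '$1 == "NET_RX:" {
        for (i = 2; i <= NF; i++) printf "CPU%d %s\n", i - 2, $i
    }' "${1:-/proc/softirqs}"
}
```

Running `net_rx_counts` a few seconds apart under load and comparing the numbers shows how fast each counter grows. If one CPU's counter increases far faster than the other's, packet processing is probably not being spread across cores, which would point towards a CPU bottleneck.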

Try this to see if it's a cpu issue

Ok, I'll try that! That should work also for stable, right? Just yet another newbie question: I will need to manually download first the initial commit as a patch, and then the four additional commits mentioned in the comments? And then manually copy kernel patches since I'm on stable?

No, it's a complete commit. You just need to copy patches if you are on stable 17.01.

Ok, I download it as a patch:
https://patch-diff.githubusercontent.com/raw/lede-project/source/pull/1269.patch

However, it doesn't apply because of the different kernel version, so it asks which file to patch:

@debianx64:~/OwrtLEDE-stable/lede1701$ patch --dry-run -p 1 -i 1269.patch 
checking file package/kernel/shortcut-fe/Makefile
checking file package/kernel/shortcut-fe/src/Kconfig
checking file package/kernel/shortcut-fe/src/Makefile
checking file package/kernel/shortcut-fe/src/README
checking file package/kernel/shortcut-fe/src/fast-classifier.c
checking file package/kernel/shortcut-fe/src/fast-classifier.h
checking file package/kernel/shortcut-fe/src/nl_classifier_test.c
checking file package/kernel/shortcut-fe/src/sfe.h
checking file package/kernel/shortcut-fe/src/sfe_backport.h
checking file package/kernel/shortcut-fe/src/sfe_cm.c
checking file package/kernel/shortcut-fe/src/sfe_cm.h
checking file package/kernel/shortcut-fe/src/sfe_ipv4.c
checking file package/kernel/shortcut-fe/src/sfe_ipv6.c
checking file package/kernel/shortcut-fe/src/userspace_example.c
checking file target/linux/generic/config-4.4
Hunk #1 succeeded at 2670 (offset -12 lines).
Hunk #2 succeeded at 3638 (offset -13 lines).
can't find file to patch at input line 11298
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|diff --git a/target/linux/generic/config-4.9 b/target/linux/generic/config-4.9
|index 24bbbc05878..1b1df923ee3 100644
|--- a/target/linux/generic/config-4.9
|+++ b/target/linux/generic/config-4.9
--------------------------
File to patch: ./target/linux/generic/config-4.4
checking file ./target/linux/generic/config-4.4
Hunk #1 succeeded at 2670 (offset -281 lines).
Hunk #2 FAILED at 3982.
1 out of 2 hunks FAILED
checking file target/linux/generic/hack-4.4/950-net-patch-linux-kernel-to-support-shortcut-fe.patch
checking file target/linux/generic/hack-4.4/951-bridge-Add-new-bridge-APIs-needed-for-network-HW-acc.patch
checking file target/linux/generic/hack-4.4/952-net-conntrack-events-support-multiple-registrant.patch
checking file target/linux/generic/hack-4.9/950-net-patch-linux-kernel-to-support-shortcut-fe.patch
checking file target/linux/generic/hack-4.9/951-bridge-Add-new-bridge-APIs-needed-for-network-HW-acc.patch
checking file target/linux/generic/hack-4.9/952-net-conntrack-events-support-multiple-registrant.patch

I tried giving it the config-4.4 file, but it doesn't apply cleanly.

The only option is to apply it manually then :)

Ok. I'll try. But judging from the output above, am I right in assuming that the only file I will (hopefully) need to manually patch is the config-4.4 file?

Yeah, seems so

Unfortunately, it doesn't compile for me:

/lede1701/build_dir/target-arm_cortex-a15+neon-vfpv4_musl-1.1.16_eabi/linux-ipq806x/shortcut-fe/sfe_ipv4.c:1366:5: error: 'struct sk_buff' has no member named 'fast_forwarded'

Does that look familiar?

Edit: I see that this error occurs because the patches in hack-4.4 haven't been applied. I didn't manually copy the patches to the patches-4.4 directory, which I should have. I'll leave this post here anyway, even if it shows my utter lack of understanding... :(
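For anyone else on stable 17.01, the copy step mentioned in this post could look roughly like the sketch below. The function name `copy_sfe_patches` is made up for illustration, and it assumes the pull request has already placed the three 950-952 patches under target/linux/generic/hack-4.4, as shown in the patch output earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper: copy the shortcut-fe kernel patches that the pull
# request adds under hack-4.4 into patches-4.4, as described for stable 17.01.
# $1 is the root of the source tree (assumed layout).
copy_sfe_patches() {
    generic="$1/target/linux/generic"
    cp "$generic"/hack-4.4/95[0-2]-*.patch "$generic"/patches-4.4/
}
```

It would be called with the path of the build tree, for example `copy_sfe_patches ~/OwrtLEDE-stable/lede1701`, before rebuilding the kernel.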

Edit2: It doesn't seem to affect my performance issue. If anything, it actually got worse (but that could be just a coincidence, though, as I didn't test much). Ah well, thanks for suggesting it anyway.

@avx @mroek @steom
You might be interested to test reverting the ath10k buffer reduction that was done in March in master. That might help with performance issues.

The background is that the ath10k buffer size reduction was introduced a bit sneakily into ipq806x with a commit improving support for QCA4019. (The commit title talks about QCA4019 but does not mention that ath10k buffers get reduced for all chips):
https://git.lede-project.org/?p=source.git;a=commit;h=cc189c0b7fa015978b04bb663a75b1da726376b5

I tried to initiate a discussion about that change later, but it got no traction, as there was no real proof that the buffer reduction caused significant harm. If there were proof, the change might hopefully be reverted.

I have made an R7800 test build from the current master that reverts the ath10k buffer size reductions:

Downloadable from my build's dir:

  • revert buffer size: lede-r4694-e7373e489d-20170811-ath10k-buffer-test
  • normal : lede-r4694-e7373e489d-20170811

PS. If anybody wants to try the same in their own master build, it is just a matter of deleting these two patches that were introduced by that commit:

package/kernel/mac80211/patches/960-0010-ath10k-limit-htt-rx-ring-size.patch
package/kernel/mac80211/patches/960-0011-ath10k-limit-pci-buffer-size.patch
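As a sketch, the deletion could be scripted as below. The function name `remove_ath10k_buffer_patches` is made up for illustration, and the argument is assumed to be the root of a master source tree that still contains the two patches.

```shell
#!/bin/sh
# Hypothetical helper: delete the two ath10k buffer-limiting patches named
# above from a master source tree. $1 is the root of the tree.
remove_ath10k_buffer_patches() {
    patches="$1/package/kernel/mac80211/patches"
    rm -f "$patches/960-0010-ath10k-limit-htt-rx-ring-size.patch" \
          "$patches/960-0011-ath10k-limit-pci-buffer-size.patch"
}
```

After calling it with the tree path, mac80211 has to be rebuilt for the change to take effect.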

I'll test it some time during the weekend, but I'm skeptical as to whether it will fix the issues. In my case, even just making changes to the 5GHz wifi settings would randomly crash the router completely (causing it to reboot). The buffer changes would most likely only affect stability while doing transfers, and shouldn't matter much when just poking around in the settings.

I couldn't wait, so I tested it just now. Bad news though, performance on wifi is still abysmal. I did the same test as before, and download speed was 20-30 Mbit/s on 5 GHz wifi. Upload speed was actually quite OK (better than before, and on par with stable), but just one time. I repeated the test, but when upload was about to start, something went wrong. The router didn't crash, but the phone lost wifi connectivity and the upload was aborted. The log had this:

Fri Aug 11 21:51:33 2017 kern.warn kernel: [ 273.360252] ath10k_pci 0000:01:00.0: rx ring became corrupted: -5
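To see how often this happens across several test runs, the warnings can be counted in the log. This is only a sketch: the function name `count_rx_ring_errors` is made up for illustration, and the pattern is simply taken from the log line above.

```shell
#!/bin/sh
# Hypothetical helper: count ath10k "rx ring became corrupted" warnings in
# log text read from standard input.
count_rx_ring_errors() {
    grep -c 'ath10k_pci.*rx ring became corrupted'
}
```

On the router it would be fed from the system log, for example `logread | count_rx_ring_errors`.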

So as far as I'm concerned, wifi is useless in master, both with and without those two patches.

I posted a new thread about the multicast performance issues I'm seeing, and I would appreciate it if anyone could help me diagnose that issue. Everything is now working correctly (after I fixed the bug with the query messages), except for the performance issue where the router either drops or reorders the multicast UDP packets.

Hi,
I have installed the latest hnyman build (r4694) with virtually all default settings and then scanned my system with the ShieldsUP! service:
https://www.grc.com/x/ne.dll?bh0bkyd2
I got the following results:

NO PORTS were found to be OPEN.
Ports found to be STEALTH were: 25, 80, 135, 137, 138, 139, 445, 543
Other than what is listed above, all ports are CLOSED.
TruStealth: FAILED
  - NOT all tested ports were STEALTH,
  - NO unsolicited packets were received,
  - A PING REPLY (ICMP Echo) WAS RECEIVED.

Please advise: is this state safe enough, or should I close or hide those ports according to their recommendations?

Just follow this:


This has nothing to do with the R7800, but with firewalls in general. So, wrong discussion thread...

You already have all ports closed (or dropping traffic). No traffic gets through.

You might read the wiki discussion about stealth ("DROP") versus closed ("REJECT"):
https://lede-project.org/docs/user-guide/firewall_configuration#implications_of_drop_vs_reject

@hnyman
Hi, when you upload new builds to your Dropbox, where can I see what was changed compared with the previous version?

Is it in *-status.txt file?

Usually there are no changes from me, just the global changes in the main sources and in feeds like LuCI and packages. You need to check the changelogs in those repos.
