Netgear R7800 exploration (IPQ8065, QCA9984)

@chunkeey yes, that's what I meant when I suggested symlinking board.bin back then. There ought to be a simpler solution.

Edit: it's the 5 GHz caldata that gets symlinked, not the 2.4 GHz one.

Edit2: to sum things up, QCA9984 uses:

  • board.bin symlinked to 5ghz caldata
  • board-2.bin downloaded from CAF or Kvalo's git
  • firmware itself from CAF or Kvalo's git

When anything from this list is absent, neither radio comes up.

Edit3: there's a newer board-2.bin available: https://source.codeaurora.org/quic/qsdk/oss/firmware/ath10k-firmware/commit/ath10k/QCA40XX/hw1.0/board-2.bin?id=171b9607fb8cc694ed469a4e29c91af9d92f2971
Try it along with symlinking the 5 GHz pre-cal data to board.bin.

Okay, the tsens driver works and all 11 sensors are visible, but it shows the temperature in full degrees again, sigh...

Too bad. Sounds again like a hack by somebody, maybe something similar that we discarded a few months ago.

Ok, I assumed it was the 2.4GHz radio, since it is usually the first.
so thanks for clearing this up. BTW: I wrote mail to the ML to ask
about the QCA9984 oddity.
https://marc.info/?l=linux-wireless&m=149028769320374
(Next time, I'll add you to the CC: as well.)

As for the board-2.bin / board.bin:

The ath10k driver tries to locate the correct board data in board-2.bin. If this fails,
it falls back to board.bin. I think you could get away with deleting
board-2.bin in your configuration and it would still work.

Note: ath10k doesn't do any auth/id checks for the board.bin. If it's there it will
be uploaded... If it works: great.
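
The lookup order described above can be sketched roughly as follows. This is only an illustration of the decision, not code from ath10k: the struct, the enum and the function name are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Which board data source ends up being uploaded to the chip. */
enum board_source {
    BOARD_FROM_BOARD2,  /* matching entry found inside board-2.bin */
    BOARD_FROM_LEGACY,  /* plain board.bin, taken without any ID check */
    BOARD_NONE          /* nothing usable was found */
};

/* Hypothetical summary of what is present in the firmware directory. */
struct fw_dir {
    bool board2_present;    /* board-2.bin exists */
    bool board2_has_match;  /* ...and holds an entry for this board */
    bool board_bin_present; /* board.bin (or a symlink to caldata) exists */
};

enum board_source pick_board_source(const struct fw_dir *dir)
{
    assert(dir != NULL);

    /* First choice: an entry in board-2.bin that matches this board. */
    if (dir->board2_present && dir->board2_has_match)
        return BOARD_FROM_BOARD2;

    /* Fallback: board.bin is used as-is if it is there. */
    if (dir->board_bin_present)
        return BOARD_FROM_LEGACY;

    return BOARD_NONE;
}
```

With board-2.bin deleted and board.bin symlinked to the caldata, this sketch ends up in the BOARD_FROM_LEGACY branch, which is the configuration suggested above.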

Hm, I've backed out the "wifi-breaking patch" and added your patch, and this does not seem to produce a wifi-not-broken build. Is there more to it than that? Here's the kernel output of the driver horking the firmware load:

[ 10.111180] ath10k_pci 0000:01:00.0: enabling device (0140 -> 0142)
[ 10.111721] ath10k_pci 0000:01:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0
[ 10.281986] ath10k_pci 0000:01:00.0: Direct firmware load for ath10k/pre-cal-pci-0000:01:00.0.bin failed with error -2
[ 10.282037] ath10k_pci 0000:01:00.0: Falling back to user helper
[ 10.472881] firmware ath10k!pre-cal-pci-0000:01:00.0.bin: firmware_loading_store: map pages failed
[ 10.473045] ath10k_pci 0000:01:00.0: Direct firmware load for ath10k/cal-pci-0000:01:00.0.bin failed with error -2
[ 10.480835] ath10k_pci 0000:01:00.0: Falling back to user helper
[ 10.686545] ath10k_pci 0000:01:00.0: Direct firmware load for ath10k/QCA9984/hw1.0/firmware-5.bin failed with error -2
[ 10.686583] ath10k_pci 0000:01:00.0: Falling back to user helper
[ 10.717425] firmware ath10k!QCA9984!hw1.0!firmware-5.bin: firmware_loading_store: map pages failed
[ 10.717600] ath10k_pci 0000:01:00.0: could not fetch firmware file 'ath10k/QCA9984/hw1.0/firmware-5.bin': -11
[ 10.725463] ath10k_pci 0000:01:00.0: Direct firmware load for ath10k/QCA9984/hw1.0/firmware-4.bin failed with error -2
[ 10.735356] ath10k_pci 0000:01:00.0: Falling back to user helper
[ 10.786290] firmware ath10k!QCA9984!hw1.0!firmware-4.bin: firmware_loading_store: map pages failed
[ 10.786461] ath10k_pci 0000:01:00.0: could not fetch firmware file 'ath10k/QCA9984/hw1.0/firmware-4.bin': -11
[ 10.794328] ath10k_pci 0000:01:00.0: Direct firmware load for ath10k/QCA9984/hw1.0/firmware-3.bin failed with error -2
[ 10.804240] ath10k_pci 0000:01:00.0: Falling back to user helper
[ 10.859650] firmware ath10k!QCA9984!hw1.0!firmware-3.bin: firmware_loading_store: map pages failed
[ 10.859773] ath10k_pci 0000:01:00.0: could not fetch firmware file 'ath10k/QCA9984/hw1.0/firmware-3.bin': -11
[ 10.867548] ath10k_pci 0000:01:00.0: Direct firmware load for ath10k/QCA9984/hw1.0/firmware-2.bin failed with error -2
[ 10.877560] ath10k_pci 0000:01:00.0: Falling back to user helper
[ 10.917004] firmware ath10k!QCA9984!hw1.0!firmware-2.bin: firmware_loading_store: map pages failed
[ 10.917515] ath10k_pci 0000:01:00.0: could not fetch firmware file 'ath10k/QCA9984/hw1.0/firmware-2.bin': -11
[ 10.924939] ath10k_pci 0000:01:00.0: could not fetch firmware files (-11)
[ 10.934880] ath10k_pci 0000:01:00.0: could not probe fw (-11)
[ 10.942109] ath10k_pci 0001:01:00.0: enabling device (0140 -> 0142)
[ 10.947857] ath10k_pci 0001:01:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0
[ 11.121669] ath10k_pci 0001:01:00.0: Direct firmware load for ath10k/pre-cal-pci-0001:01:00.0.bin failed with error -2
[ 11.121712] ath10k_pci 0001:01:00.0: Falling back to user helper
[ 11.174919] firmware ath10k!pre-cal-pci-0001:01:00.0.bin: firmware_loading_store: map pages failed
[ 11.175360] ath10k_pci 0001:01:00.0: Direct firmware load for ath10k/cal-pci-0001:01:00.0.bin failed with error -2
[ 11.182914] ath10k_pci 0001:01:00.0: Falling back to user helper
[ 11.420601] ath10k_pci 0001:01:00.0: Direct firmware load for ath10k/QCA9984/hw1.0/firmware-5.bin failed with error -2
[ 11.420644] ath10k_pci 0001:01:00.0: Falling back to user helper
[ 11.467551] firmware ath10k!QCA9984!hw1.0!firmware-5.bin: firmware_loading_store: map pages failed
[ 11.467741] ath10k_pci 0001:01:00.0: could not fetch firmware file 'ath10k/QCA9984/hw1.0/firmware-5.bin': -11
[ 11.475603] ath10k_pci 0001:01:00.0: Direct firmware load for ath10k/QCA9984/hw1.0/firmware-4.bin failed with error -2
[ 11.485496] ath10k_pci 0001:01:00.0: Falling back to user helper
[ 11.517657] firmware ath10k!QCA9984!hw1.0!firmware-4.bin: firmware_loading_store: map pages failed
[ 11.517831] ath10k_pci 0001:01:00.0: could not fetch firmware file 'ath10k/QCA9984/hw1.0/firmware-4.bin': -11
[ 11.525702] ath10k_pci 0001:01:00.0: Direct firmware load for ath10k/QCA9984/hw1.0/firmware-3.bin failed with error -2
[ 11.535596] ath10k_pci 0001:01:00.0: Falling back to user helper
[ 11.585103] firmware ath10k!QCA9984!hw1.0!firmware-3.bin: firmware_loading_store: map pages failed
[ 11.585220] ath10k_pci 0001:01:00.0: could not fetch firmware file 'ath10k/QCA9984/hw1.0/firmware-3.bin': -11
[ 11.593054] ath10k_pci 0001:01:00.0: Direct firmware load for ath10k/QCA9984/hw1.0/firmware-2.bin failed with error -2
[ 11.603016] ath10k_pci 0001:01:00.0: Falling back to user helper
[ 11.633533] firmware ath10k!QCA9984!hw1.0!firmware-2.bin: firmware_loading_store: map pages failed
[ 11.633647] ath10k_pci 0001:01:00.0: could not fetch firmware file 'ath10k/QCA9984/hw1.0/firmware-2.bin': -11
[ 11.641491] ath10k_pci 0001:01:00.0: could not fetch firmware files (-11)
[ 11.651443] ath10k_pci 0001:01:00.0: could not probe fw (-11)

You should not do both. The patch in my PR is meant to restore functionality with the current git head. There is no need to back out the wifi breaking patch.

Hm, thanks. Just your patch didn't do it. I will see if my git fu has betrayed me.

@hnyman this driver is from the Qualcomm SDK.
I'm starting to think that without that rounding to full degrees we are getting a temperature that is off by about 0.5 °C.
If you look through the code, the code_to_degc function in tsens-ipq8064.c in my commit above differs from the one currently used in the kernel's tsens-8960.c (where it is called code_to_mdegc).

If you check the temperature calculations, you'll see that at first both drivers compute the same thing:
(adc_code * s->slope) + s->offset;

but then the ipq8064 version adds or subtracts 500 (depending on conditions) and only then divides by 1000. So the difference is 500 millicelsius all the time.

Edit: by the way, according to the code the master sensor is sensor0, but we can't read it cleanly with the upstream driver. The upstream driver only lets us parse sensor5-10; I guess it's a difference in the sensor address range (the ipq sensor range seems to be a bit lower).

[quote="dissent1, post:288, topic:285"]
but then the ipq8064 version adds or subtracts 500 (depending on conditions) and only then divides by 1000. So the difference is 500 millicelsius all the time.
[/quote]
I have not checked the source yet, but that sounds like a quite logical (and expected) correction for the "round-down" in integer division. It lets the millivalues 500-999 round up.

55444 / 1000 = 55,
but 55666 / 1000 = 55 (wrong)

(55444+500) / 1000 = 55,
and (55666+500) / 1000 = 56 (right)

The logic is a bit different: if temp > 0 it always adds 500, and if temp < 0 it always subtracts 500, so that's not a rounding thing.
Edit: or maybe you are right and that is the reason.

[quote="dissent1, post:290, topic:285, full:true"]
The logic is a bit different: if temp > 0 it always adds 500, and if temp < 0 it always subtracts 500, so that's not a rounding thing. Edit: or maybe you are right and that is the reason.
[/quote]
Sounds like I am right. That rule perfectly matches the rounding logic needed to counter the "always round toward 0" behaviour of integer division.

Integer division always truncates (or "rounds down toward 0"), so a value like 55900 that you would like to see as 56, will be 55 unless you pad it before the division. The needed padding is divisor/2, in case of divisor 1000 the needed correction is 500. So 55900+500 = 56400 that divides to 56.
On the negative side the same:
-44333 / 1000 = -44
-44666 / 1000 = -44 (wrong)
(-44333-500) / 1000 = -44
(-44666-500) / 1000 = -45 (right)

The truncation nature of integer division is a sneaky thing that has caused trouble for many programmers, as it is easy to overlook. Nice to see that Qualcomm has got it right.
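
The arithmetic above can be checked with a small helper. This is a generic sketch of round-to-nearest integer division, not code taken from the Qualcomm driver; the function names are invented for the example.

```c
#include <assert.h>

/*
 * Divide and round to the nearest integer, halves away from zero.
 * C integer division truncates toward zero, so half of the divisor is
 * added for positive values and subtracted for negative ones first.
 */
int div_round_nearest(int value, int divisor)
{
    assert(divisor > 0);

    if (value >= 0)
        return (value + divisor / 2) / divisor;
    return (value - divisor / 2) / divisor;
}

/* Millidegrees Celsius to whole degrees Celsius. */
int mdegc_to_degc(int mdegc)
{
    return div_round_nearest(mdegc, 1000);
}
```

Feeding it the values from the examples, mdegc_to_degc(55666) gives 56 and mdegc_to_degc(-44666) gives -45, while the plain divisions 55666 / 1000 and -44666 / 1000 give 55 and -44.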

I looked at the new source code and it looks like they have done the millicelsius vs. celsius conversion pretty nicely. At first glance it looks like changing TSENS_FACTOR from 1000 to 1 would keep everything in the driver as millicelsius. The whole driver operates in millicelsius, and then there are two conversion functions to get to/from full celsius for the input/output functions.

I think that one approach for maintaining the expected millicelsius output is to set TSENS_FACTOR to 1 and then also change the limits in the DTS to millicelsius, e.g. instead of 95 have 95000.
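
To illustrate why a factor of 1 would leave the values in millicelsius, here is a minimal sketch. It is not the driver's code: the function names are hypothetical, the `factor` parameter stands in for TSENS_FACTOR so that both settings can be compared side by side, and the slope and offset numbers used with it are made up.

```c
#include <assert.h>

/*
 * Convert a raw ADC code to a temperature. With factor 1000 the result
 * is in whole degrees Celsius (rounded to nearest); with factor 1 it
 * stays in millidegrees. slope and offset are per-sensor calibration
 * values.
 */
int code_to_temp(int adc_code, int slope, int offset, int factor)
{
    int mdegc = adc_code * slope + offset;
    int half = factor / 2; /* 500 when factor is 1000, 0 when factor is 1 */

    assert(factor > 0);

    if (mdegc >= 0)
        return (mdegc + half) / factor;
    return (mdegc - half) / factor;
}

/* Scale a threshold given in the device tree back to millidegrees. */
int limit_to_mdegc(int limit, int factor)
{
    return limit * factor;
}
```

With the made-up calibration slope 910 and offset 166, an ADC code of 61 comes out as 56 with factor 1000 and as 55676 with factor 1. A limit of 95 with factor 1000 and a limit of 95000 with factor 1 describe the same threshold, which is why the DTS values would have to change along with the factor.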

Thanks for the clarification, that does seem to be the correct explanation. I'm still new to all this.

I'll try to clean up all the factor multiplication; I have to check the other defines first.

I just did a fresh checkout and a clean build, and the atheros firmware loads (with your single patch). I don't know what is different between the old tree and the new one; git didn't think there was a difference. Sigh. :slight_smile:

It could be a somewhat dirty .config file. It is easy to leave unnecessary old settings & package selections there.

I rebuild .config from scratch for each build. (I use a short recipe file of ~120 lines that contains all my actual package selections & settings, which I can expand to a full .config with "make defconfig".)

A good example is today: the QCA988X firmware was finally dropped from the ipq806x default packages with commit e3c88f496. My rebuilt-from-scratch .config made the new R7800 image 256 kB smaller than yesterday's image, as the unnecessary QCA988X firmware blob was left out. But if you continue with an old .config without pruning or rebuilding it, you will never notice that change, as the once-selected QCA988X firmware blob remains selected forever.

Ah. Do not want. Thanks for pointing that out.

I've cleaned up the factor stuff and also had to adjust THRESHOLD_MAX_CODE.

https://github.com/dissent1/r7800/commit/e77cda673152570737c2e2d99b1f17b33b8df582

I wonder if it's a typo that the variable degC in the degC_to_code function is written with a capital C.

Btw, it works OK at first glance: all 11 sensors report in millidegrees.

I took your two commits and it compiled nicely. All 11 sensors are visible and reacted nicely to load caused by openssl benchmark.

One suggestion:
I would keep the changes in the millicelsius patch more minimal. You could leave the whole contents of the code_to_degC and degC_to_code functions unchanged simply by defining TSENS_FACTOR as 1 instead of deleting the whole definition.

@hnyman, @ianchi -- I'm having the "no wireless" issue on the Linksys EA8500 (same ipq806x platform) with the latest snapshots. The last snapshot with working WiFi was r57; revision numbers have been on and off over the last week. I just tried revision 3867 and there are no wireless devices in /sys/class/net/, even though kmod-ath and kmod-ath10k along with ath10k-firmware-qca99x0 are present.

Is that an issue with the switch to the 4.9 kernel? Is it something I can fix with the image builder? Does it need to be fixed by the device maintainer, or is it outside the scope of a single device?

PS. wifi config produces neither output nor /etc/config/wireless.