[Solved] Zyxel NBG6817 flashing from OEM

Just to clarify: I do know how to toggle the bootflag from a running, accessible system, but there must be a way to toggle it (via button presses?) when the device doesn't boot.

Just flashed my Zyxel NBG6817 as well; it was on the latest ABCS.7. First, I flashed 17.01.4, which led to the issues listed in this thread. Afterwards, I built from the latest source using only the following flags:

CONFIG_TARGET_ipq806x=y
CONFIG_TARGET_ipq806x_DEVICE_NBG6817=y
CONFIG_TARGET_BOARD="ipq806x"
CONFIG_PACKAGE_kmod-fs-autofs4=y
CONFIG_PACKAGE_kmod-fs-ext4=y
CONFIG_PACKAGE_kmod-nls-utf8=y
CONFIG_PACKAGE_kmod-usb-storage=y
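For anyone who wants to reuse these flags, here is a minimal sketch of one way to apply them, assuming a checked-out LEDE/OpenWrt source tree. The helper name `write_nbg6817_config` is made up for illustration; `make defconfig` is the usual buildroot step that expands a partial `.config` into a full one.

```shell
# Sketch: write the config fragment from this post to a file.
# write_nbg6817_config is a hypothetical helper name, not part of LEDE.
write_nbg6817_config() {
    # $1: target file (defaults to .config in the current directory)
    cat > "${1:-.config}" <<'EOF'
CONFIG_TARGET_ipq806x=y
CONFIG_TARGET_ipq806x_DEVICE_NBG6817=y
CONFIG_TARGET_BOARD="ipq806x"
CONFIG_PACKAGE_kmod-fs-autofs4=y
CONFIG_PACKAGE_kmod-fs-ext4=y
CONFIG_PACKAGE_kmod-nls-utf8=y
CONFIG_PACKAGE_kmod-usb-storage=y
EOF
}

# Inside the source tree (assumption: feeds are already set up):
#   write_nbg6817_config .config && make defconfig && make
```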

blockd and block-mount don't seem to be hard requirements. Also, in the latest snapshot, kmod-usb2 seems to be activated by default for this device. Settings are saved across reboots. Wireless 2.4GHz and 5GHz also working.

A few things I've noticed so far: I get "Direct firmware load for ath10k/QCA9984/hw1.0/firmware-6.bin failed with error -2", which seems to be WARNING level. Also, the 2.4GHz and 5GHz LEDs on the front are amber instead of white (as on the original firmware). The amber color bothers me, but I couldn't find a way to change them to white.


It transparently falls back to /lib/firmware/ath10k/QCA9984/hw1.0/firmware-5.bin

So I wasn't hallucinating. I have never been sure whether they were white or amber in the OEM firmware, but I also never bothered to reboot into the OEM firmware for confirmation (I've only run it for a few minutes to check that the hardware was functional). It shouldn't be too difficult to change the LED colour: you probably just need to determine the correct GPIO and then change the device tree file accordingly. With a little luck, the correct GPIO might already be revealed somewhere in the OEM firmware.

It might be interesting to check "cat /sys/kernel/debug/gpio" and compare it between the OEM firmware (with white LEDs lit up) and LEDE (amber). Potential candidates for closer examination could be:

gpio2   : in  0 2mA pull down
gpio4   : in  0 2mA pull down
gpio5   : in  0 2mA pull down
gpio7   : in  0 2mA pull down
gpio8   : in  0 2mA pull down
gpio10  : out 1 12mA no pull
gpio11  : out 1 12mA no pull
gpio22  : in  0 2mA pull down
gpio23  : in  0 2mA pull down
gpio24  : in  0 2mA pull down
gpio25  : in  0 2mA pull down
gpio34  : in  0 2mA pull down
gpio35  : in  0 2mA pull down
gpio36  : in  0 2mA pull down
gpio37  : in  0 2mA pull down
gpio38  : in  2 10mA pull up
gpio39  : in  2 10mA pull up
gpio40  : in  2 10mA pull up
gpio41  : in  2 10mA pull up
gpio42  : in  2 16mA no pull
gpio43  : in  2 10mA pull up
gpio44  : in  2 10mA pull up
gpio45  : in  2 10mA pull up
gpio46  : in  2 10mA pull up
gpio47  : in  2 10mA pull up
gpio49  : in  0 2mA pull down
gpio50  : in  0 2mA pull down
gpio53  : in  0 2mA pull down
gpio55  : in  0 2mA pull down
gpio56  : in  0 2mA pull down
gpio57  : in  0 2mA pull down
gpio58  : in  0 2mA pull down
gpio63  : in  0 2mA pull down
gpio66  : in  0 2mA pull down
gpio67  : in  0 2mA pull down
gpio68  : in  0 2mA pull down

It might be interesting to check “cat /sys/kernel/debug/gpio” and compare it between the OEM firmware (with white LEDs lit up) and LEDE (amber)

Interesting approach, I will track it down!

Unfortunately, the latest snapshot didn't run well on my Zyxel. As soon as I replaced my main router with it, WiFi crashed after a few hours of operation. I can't say anything too specific, because the log size wasn't big enough, but it looked like a kernel oops caused by WiFi. Either way: I've set up my old router again and now have plenty of time to track down issues on the Zyxel.

Comparing the output between LED on (wlan configured) and LED off (wlan off) using the OEM firmware should be even easier to check.
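A rough sketch of that comparison: take one dump per LED state and diff them. The helper names and the /tmp file names are made up for illustration, and it assumes `diff` is available (if not, copy the dumps to a PC and diff them there).

```shell
# Sketch: save one GPIO dump per LED state, then show only the differing lines.
# snapshot_gpio and changed_gpios are hypothetical helper names.
snapshot_gpio() {
    # $1: output file; $2: source file (defaults to the debugfs GPIO listing)
    cat "${2:-/sys/kernel/debug/gpio}" > "$1"
}

changed_gpios() {
    # $1, $2: two saved dumps; prints lines prefixed with < (first) or > (second)
    diff "$1" "$2" | grep '^[<>]'
}

# With wlan configured (LED on):  snapshot_gpio /tmp/gpio_led_on.txt
# With wlan off (LED off):        snapshot_gpio /tmp/gpio_led_off.txt
# Then:  changed_gpios /tmp/gpio_led_on.txt /tmp/gpio_led_off.txt
```

If nothing is printed, the LED state is not reflected in the listed GPIOs at all.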

cat /sys/kernel/debug/gpio on OEM firmware didn't list all GPIOs the way LEDE does. Whether the white LEDs were on or off, the GPIO output stayed the same, so they don't seem to be listed there. I activated the amber LED using the OEM led_ctrl utility (just a script for /sys/class/leds) and indeed got the white and amber LEDs lit at the same time. Unfortunately, /sys/class/leds only lists the amber LED; no trace of the white LED so far.

 gpio-0   (mdio                ) in  hi
 gpio-1   (mdc                 ) out lo
 gpio-3   (rst_n               ) out hi
 gpio-9   (POWER               ) out hi
 gpio-26  (WiFi_5G             ) out lo
 gpio-33  (WiFi_2G             ) out lo
 gpio-38  (sdc1_dat_7          ) in  hi
 gpio-39  (sdc1_dat_6          ) in  hi
 gpio-40  (sdc1_dat_3          ) in  hi
 gpio-41  (sdc1_dat_2          ) in  hi
 gpio-42  (sdc1_clk            ) in  lo
 gpio-43  (sdc1_dat_1          ) in  hi
 gpio-44  (sdc1_dat_0          ) in  hi
 gpio-45  (sdc1_cmd            ) in  hi
 gpio-46  (sdc1_dat_5          ) in  hi
 gpio-47  (sdc1_dat_4          ) in  hi
 gpio-48  (rst_n               ) out hi
 gpio-53  (WLAN_DISABLE        ) in  lo
 gpio-54  (RESET               ) in  hi
 gpio-61  (UHS_mode            ) in  lo
 gpio-64  (INTERNET            ) out lo
 gpio-65  (WPS                 ) in  hi

Sidenote: WLAN_DISABLE is on gpio-53 (I've verified it on OEM), as opposed to the current LEDE implementation, which uses gpio-6. Indeed, the WLAN_DISABLE button doesn't work on current LEDE. Easy fix, I guess.

//Edit: rmmod leds-gpio on OEM firmware disabled the POWER and INTERNET LEDs, but not 2.4G and 5G.

//Edit 2: the white WiFi LEDs on OEM firmware are controlled by the zyxel_led_ctrl utility, which uses iwpriv wifi[0|1] gpio_config 17 <status> 0 0, where status = 1 means disabled and 0 means enabled. wifi0 is 5G and wifi1 is 2.4G.
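To make the mapping concrete, here is a small sketch that builds the iwpriv call described above. `wifi_led` is a made-up helper name, and this only applies to the OEM firmware (its wifi0/wifi1 interfaces and the gpio_config private command). By default it only prints the command; set RUN=1 to execute it.

```shell
# Sketch: wrap the OEM iwpriv call in a helper (wifi_led is a hypothetical name).
# Dry run by default: prints the command instead of executing it.
wifi_led() {
    # $1: band (5g|2g), $2: state (on|off)
    case "$1" in
        5g) dev=wifi0 ;;   # wifi0 is the 5G radio on OEM
        2g) dev=wifi1 ;;   # wifi1 is the 2.4G radio on OEM
        *)  echo "usage: wifi_led 5g|2g on|off" >&2; return 1 ;;
    esac
    case "$2" in
        on)  status=0 ;;   # 0 means enabled
        off) status=1 ;;   # 1 means disabled
        *)   echo "usage: wifi_led 5g|2g on|off" >&2; return 1 ;;
    esac
    cmd="iwpriv $dev gpio_config 17 $status 0 0"
    if [ "${RUN:-0}" = 1 ]; then
        $cmd
    else
        echo "$cmd"
    fi
}
```

For example, `wifi_led 5g off` prints the command that would switch off the white 5G LED.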

//Edit 3: Okay, pinned down the issue. It's the same issue that prevents the Netgear R7800 from using its native WiFi LEDs (the R7800 guys are currently using other, rarely used LEDs instead). The white 5G and 2.4G LEDs are not controlled by SoC GPIOs, but by Qualcomm Atheros QCA9984 GPIOs (phy0 and phy1). On both devices, the Zyxel NBG6817 and the Netgear R7800, gpio-17 of each PHY is used. Fixing one will also fix the other. From what I've seen so far, it looks like it needs to be implemented in ath10k. I haven't found a way to make any ath10k GPIO writable from userspace so far, which would solve our issue. I got my information from the Netgear R7800 OEM firmware source and the Zyxel OEM script.

That explains the situation - and puts the nbg6817 into a comparatively good situation (LEDs are usable, 'just' the wrong colour).

I've received a very interesting E-Mail from a TP-Link Archer C2600 user (nwfilardo on ath10k mailing list), featuring a method to toggle the LEDs on Qualcomm QCA9980, which also works on Qualcomm QCA9984 (NBG6817, R7800).

QCA9984 is connected via PCI, so he used a tool to gain read / write access to PCI memory registers. In Archer C2600 u-boot source code, it was revealed that GPIO 17 can be activated by setting Bit 17 on address 0x85018, which makes address 0x85000 Bit 17 usable as active-low controller for driving the LEDs.
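The bit arithmetic can be sketched as follows. This only computes the values to write; actually reading and writing the registers (e.g. with pcimem) is left out. The variable and function names are made up, and calling 0x85018 the "enable" register and 0x85000 the "output" register is just my reading of the description above.

```shell
# Sketch: compute register values for GPIO 17 (names are hypothetical).
GPIO_ENABLE_REG=0x85018   # setting bit 17 here activates GPIO 17
GPIO_OUT_REG=0x85000      # bit 17 here drives the LED, active-low
LED_BIT=17

set_bit() {
    # $1: current register value, $2: bit number; prints value with the bit set
    printf '0x%08x\n' $(( $1 | (1 << $2) ))
}

clear_bit() {
    # $1: current register value, $2: bit number; prints value with the bit cleared
    printf '0x%08x\n' $(( $1 & ~(1 << $2) ))
}

# Activate the GPIO:   new=$(set_bit   "$current_enable" "$LED_BIT")
# LED on (active-low): new=$(clear_bit "$current_out"    "$LED_BIT")
# LED off:             new=$(set_bit   "$current_out"    "$LED_BIT")
```

Since the LED is active-low, clearing bit 17 in the output register turns it on and setting the bit turns it off.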

I haven't had a lot of time to look into ath10k so far, but it looks like ath10k_pci_reg_write32 or at least ath10k_pci_write32 should do the job for exposing the LEDs to sysfs. The TP-Link C2600 and Netgear R7800 communities are probably interested in this as well.

nwfilardo posted a bash script to toggle LEDs (pcimem required), which you can try out, if you're interested: https://www.mail-archive.com/ath10k@lists.infradead.org/msg07443.html.


These pull requests should improve support for the NBG6817.


I have tried using fstab to mount /dev/mmcblk0p10 on /overlay or on / during boot (pivot overlay and pivot root), but neither succeeded. Both end up mounted from /dev/loop0 again.
It's possible to mount it by typing mount /dev/mmcblk0p10 /overlay, but like Kostja mentioned, this won't survive the reboot.
Any help or ideas will be appreciated.

@mushr00m as stated by a few posters, installing kmod-fs-ext4 fixes this issue. @slh already posted a patch, which will hopefully get merged soon, so this will be an issue of the past. Until then, you need to dive in and build your own image with the ext4 drivers integrated, e.g. using Image Builder. While you're at it, you may also add other useful packages. Please note that Image Builder doesn't include luci by default. A few suggested configs are listed in this thread.

The latest LEDE snapshot still crashes after a few hours of operation, even with a custom board-2.bin. I'm not sure whether this is a temperature issue or something else, as I've found nothing suspicious in the logs; it just reboots after a few hours.

Other than WiFi, this thing is a beast. Using SQM (layer_cake.qos), my bufferbloat went down to 7ms - 14ms on a 100/40 line. After enabling irqbalance (banned IRQ 99/eth0 & 100/eth1 and manually pinned them to CPU0 & CPU1), bufferbloat went down to very impressive 2ms - 7ms (!).

I can also confirm that the recovery method posted by @Kostja_V works, both for recovery and for overwriting LEDE with the OEM firmware.

To have fewer issues with flashing LEDE, I think it should be best practice to run printf "\xff" >/dev/mtdblock6 on OEM firmware, reboot and then flash LEDE. This makes sure you're on the correct partition, which eliminates the "I've flashed LEDE, but I'm still booting into OEM" issue. Also, if LEDE has issues booting after the initial flash (e.g. a bootloop), try holding RESET for 10 - 15 seconds. That fixed a bootloop issue for me.

Just be careful to only write to mtdblock6 from the OEM/ vendor firmware, it's mtdblock11 from LEDE (different mtd partitioning)! Please do read the in-depth explanation of the dual-boot flag before overwriting anything crucial (and be extra careful).
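Before writing anything, it may help to look at the current flag first. A read-only sketch, assuming `od` is available on the running firmware (on a minimal image you may only have hexdump); `read_bootflag` is a made-up helper name and it never writes to the device.

```shell
# Sketch: print the first byte of a device or file as two hex digits (read-only).
# read_bootflag is a hypothetical helper name.
read_bootflag() {
    # $1: block device or file holding the flag
    od -An -tx1 -N1 "$1" | tr -d ' \n'
}

# On OEM firmware:  read_bootflag /dev/mtdblock6
# On LEDE:          read_bootflag /dev/mtdblock11
```

After the printf "\xff" step on OEM firmware, this should report ff for /dev/mtdblock6.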

Hi, @tolga9009. I'm no Linux guru; can you please share how you banned IRQ 99/eth0 and 100/eth1 and manually pinned them to CPU0 and CPU1? Also, why did you do this? I know what irqbalance does, but I didn't know there was any need for fiddling on the NBG6817.
Thanks :)

Not entirely sure, whether IRQ pinning makes a difference or not, but irqbalance vs no-irqbalance made a huge difference in my environment.

My thought about IRQ pinning was: when eth0 / eth1 is not processed by the same CPU all the time, communication between the CPUs may slow it down. I read it somewhere on the net and it didn't seem to hurt.

cat /proc/interrupts reveals the eth0 / eth1 IRQs. As irqbalance distributes IRQs more or less "randomly", you tell irqbalance not to touch the eth0 / eth1 IRQs (IRQ 99 and IRQ 100 in my case) by using the following options: irqbalance --banirq=99 --banirq=100. You can then manually pin the IRQs to CPUs: echo 1 > /proc/irq/99/smp_affinity and echo 2 > /proc/irq/100/smp_affinity. Again, no idea if pinned vs unpinned makes a difference, but it's definitely not a requirement.
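The values written to smp_affinity are hexadecimal CPU bitmasks, which is why CPU0 is 1 and CPU1 is 2. A small sketch of the same steps; `cpu_mask` and `pin_irq` are made-up helper names, and the optional third argument exists only so the helper can be tried against a scratch directory instead of /proc.

```shell
# Sketch: compute the affinity mask for one CPU and write it for an IRQ.
# cpu_mask and pin_irq are hypothetical helper names.
cpu_mask() {
    # $1: CPU index (0-based); prints the single-CPU bitmask in hex
    printf '%x\n' $(( 1 << $1 ))
}

pin_irq() {
    # $1: IRQ number, $2: CPU index, $3: proc root (defaults to /proc)
    cpu_mask "$2" > "${3:-/proc}/irq/$1/smp_affinity"
}

# IRQ numbers differ per system; check cat /proc/interrupts first.
#   pin_irq 99 0     # same as: echo 1 > /proc/irq/99/smp_affinity
#   pin_irq 100 1    # same as: echo 2 > /proc/irq/100/smp_affinity
```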

So I just obtained this router and every attempt I've tried to install LEDE has bootlooped the router. I've tried both the latest snapshot and 17.01.4. Is there something I'm doing wrong? I'm just DDing the two files to mmcblk0p4 and mmcblk0p5 respectively, and I made sure the bootflag on /dev/mtdblock6 is 0xFF.

Prefer snapshots over 17.01.4 until a potential 17.01.5 (or 18.x.y) gets released.

The installation process from the OEM firmware is:

Copy lede-ipq806x-NBG6817-squashfs-mmcblk0p4-kernel.bin and lede-ipq806x-NBG6817-squashfs-mmcblk0p5-rootfs.bin to /tmp/ on your router (e.g. by using wget directly or by scp'ing the firmware), then run:

# printf "\xff" >/dev/mtdblock6   #warning, only do this from the OEM firmware!
# cat /tmp/lede-ipq806x-NBG6817-squashfs-mmcblk0p4-kernel.bin >/dev/mmcblk0p4
# cat /tmp/lede-ipq806x-NBG6817-squashfs-mmcblk0p5-rootfs.bin >/dev/mmcblk0p5
# sync
# reboot -f
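As an optional pre-flight step before the commands above, a small sketch that checks both image files are present and non-empty, so a failed download doesn't get written to the eMMC. `check_images` is a made-up helper name; it does not verify checksums.

```shell
# Sketch: make sure both LEDE images exist and are non-empty before writing them.
# check_images is a hypothetical helper name.
check_images() {
    # $1: directory holding the images (defaults to /tmp)
    dir="${1:-/tmp}"
    for f in lede-ipq806x-NBG6817-squashfs-mmcblk0p4-kernel.bin \
             lede-ipq806x-NBG6817-squashfs-mmcblk0p5-rootfs.bin; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            return 1
        fi
    done
    echo "both images present in $dir"
}

# Usage on the router:  check_images /tmp && echo "ok to flash"
```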

A self compiled ~10 days old snapshot is working fine for me.

Strange, that's what I was doing with the latest snapshot, both with dd and cat. I'll have to try it again I guess D:

I've written a simple script which can set the bootflag safely on both LEDE and the ZyXEL OEM firmware.

I've also started to document the ZyXEL NBG6817 in the OpenWrt wiki, as I conclude that @tmomas prefers device specific information to be maintained there (cf. Xiaomi WiFi Router 3G):

https://wiki.openwrt.org/toh/zyxel/nbg6817

(I'm not quite sure how to represent 4 MiB SPI-NOR flash && 4 GiB eMMC in the device table though).

I've now implemented full dual-boot support for nbg6817 in this pull request (pending):

https://github.com/openwrt/openwrt/pull/670

This means LEDE will from then on always flash to the other, currently inactive, partition set. Both partition sets keep their own overlay, so you can toggle between different LEDE versions, each with their own configuration.

Safe options for toggling the dual-boot flag are the aforementioned nbg6817-dualboot shell script via ssh (it also works from the OEM firmware), or installing and using luci-app-advanced-reboot for a nice integration into LEDE's web interface.


Can I install the firmware now without needing to compile it myself?