Netgear R7800 exploration (IPQ8065, QCA9984)

That's definitely a nand issue, or something in the device tree that doesn't match the new nand driver, even though I've updated the device tree as well.
Backporting the nand driver into k4.4 leads to a boot loop.

i'll try that tree today

@blogic @Heinz has provided the bootlog from a TP-Link C2600, a spi nor device:
`U-Boot 2012.07 [Standard IPQ806X.LN,unknown] (Aug 28 2015 - 19:57:21)

smem ram ptable found: ver: 0 len: 5
DRAM: 491 MiB
PCI0 Link Intialized
PCI1 Link Intialized
SF: Detected MX25U25635F with page size 4 KiB, total 32 MiB
00:01.0 - 17cb:0101 - Bridge device
01:00.0 - 168c:0040 - Network controller
02:01.0 - 17cb:0101 - Bridge device
03:00.0 - 168c:0040 - Network controller
NAND: ipq_nand: unknown NAND device manufacturer: 0 device: 0
ipq_nand: failed to identify device
SF: Detected MX25U25635F with page size 4 KiB, total 32 MiB
ipq_spi: page_size: 0x100, sector_size: 0x1000, size: 0x2000000
32 MiB
MMC:
*** Warning - bad CRC, using default environment

In: serial
Out: serial
Err: serial
Net: MAC1 addr:0:3:7f:ba:db:1
athrs17_reg_init: complete
athrs17_vlan_config ...done
S17c init done
MAC2 addr:0:3:7f:ba:db:2
eth0, eth1
boot in 2 seconds
FirmwareRecovery: Now doing bootipq
MMC Device 0 not found
MMC Device 0 not found

Loading from nand1, offset 0x1f0000
Image Name: ARM LEDE Linux-4.9.10
Image Type: ARM Linux Kernel Image (uncompressed)
Data Size: 1921894 Bytes = 1.8 MiB
Load Address: 42208000
Entry Point: 42208000
Automatic boot of image at addr 0x44000000 ...
Image Name: ARM LEDE Linux-4.9.10
Image Type: ARM Linux Kernel Image (uncompressed)
Data Size: 1921894 Bytes = 1.8 MiB
Load Address: 42208000
Entry Point: 42208000
Verifying Checksum ... OK
Loading Kernel Image ... OK
OK
info: "mtdparts" not set
Using machid 0x1260 from environment

Starting kernel ...`

It seems this is not only a nand issue.

does this also happen with the tree i pushed or after applying your patches or both trees ?

ok, this is related to your patches somehow. my board works with trunk but fails with your patches

There has been a leftover patch for ap148, I've deleted it now

I've been advised that it could be a serial driver or irq issue, but I'm not sure any of the patches could cause that.

Interesting thing is that your current commit is inconsistent with the nand driver (patch 999 in current trunk):

https://github.com/lede-project/source/blob/master/target/linux/ipq806x/patches-4.9/999-dts.patch#L2194

It also reverts some of the nand DTS changes introduced by the first patches.

dropping 166 makes my board boot. i am not too worried about nand or not, the actual CPE routers are of more importance.

tested with trunk:

Loading from nand1, offset 0x1f0000
Image Name: ARM LEDE Linux-4.9.10
Image Type: ARM Linux Kernel Image (uncompressed)
Data Size: 1912462 Bytes = 1.8 MiB
Load Address: 42208000
Entry Point: 42208000
Automatic boot of image at addr 0x44000000 ...
Image Name: ARM LEDE Linux-4.9.10
Image Type: ARM Linux Kernel Image (uncompressed)
Data Size: 1912462 Bytes = 1.8 MiB
Load Address: 42208000
Entry Point: 42208000
Verifying Checksum ... OK
Loading Kernel Image ... OK
OK
info: "mtdparts" not set
Using machid 0x1260 from environment

Starting kernel ...

So the problem seems not to be related to my patches. Something is missing...

it is related to your patches, i tested a 4.4 kernel by accident. testing your series with 166 removed does not boot.

can you split your patches up please into my tree + yours ontop. then we can start to slowly add patches to my tree and see at which point it breaks
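The "add patches one at a time until it breaks" approach can be shortened with a binary search over the ordered series. A minimal sketch, assuming the base tree boots and that a single patch is responsible (every prefix that contains it fails). `boots_with` is a hypothetical callback standing in for "build an image with these patches, flash it, report whether it booted", and the patch names in the usage line are made up:

```python
from typing import Callable, Optional, Sequence


def first_bad_patch(patches: Sequence[str],
                    boots_with: Callable[[Sequence[str]], bool]) -> Optional[str]:
    """Return the first patch whose addition stops the image from booting.

    Assumes the tree without any of these patches boots, and that once the
    bad patch is applied every longer prefix of the series fails as well.
    """
    if not patches or boots_with(patches):
        return None  # the full series boots, nothing to find
    lo, hi = 0, len(patches)  # prefix of length lo boots, prefix of length hi fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if boots_with(patches[:mid]):
            lo = mid
        else:
            hi = mid
    return patches[hi - 1]


# Simulated run: pretend the (hypothetical) patch "166-d.patch" breaks boot.
series = ["155-a.patch", "156-b.patch", "157-c.patch", "166-d.patch", "700-e.patch"]
culprit = first_bad_patch(series, lambda applied: "166-d.patch" not in applied)
```

If more than one patch is involved, the search only finds the earliest one, so it has to be repeated after dropping or fixing that patch.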

Heinz has tried a clean trunk image and the bootlog is the same, check post 150.

It's already on top of your tree, the differences are:

  • all spi patches are renamed to 7xx
  • all ipq40xx patches are intact
  • adm_dma patches are renamed to 155, 156, 157 from 0001, 0002 and 0003

Instead of your patch 999 I've added the dts files under files-4.9 and brought back the dts patches.

I'll list what driver patches I've added or split the commit into several.

ok, thanks for the info, i will try this in the evening again and see which patch makes it fail. we could also try to enable DEBUG_LL/ EARLY_PRINTK to see what the problem is

I've split the commit for easier application and testing
https://github.com/dissent1/r7800/commits/bl49-11

Thanks to @Heinz for providing another bootlog, but with those debug options on, it is completely identical.
Could it be some clock/timer calibration failure?
Could it theoretically be because of CONFIG_ARCH_CLOCKSOURCE_DATA present in k4.9 config?

Might be this:
https://bugs.lede-project.org/index.php?do=details&task_id=542
ipq806x: unable to boot linux-4.9.10 uboot seeing Bad Magic Number

Loading from device 0: nand0 (offset 0x1340000)

** check kernel image **
   Verifying Checksum ... OK

** check rootfs image **

** Bad Magic Number 0x0 **

The same may be happening with the R7800. As I said earlier in this thread, in my attempts 4.9 did not boot; instead the R7800 was left in Netgear's TFTP recovery mode, which is controlled by u-boot, so control never passed to the kernel. That happened both with @blogic's patches in git master and earlier with the patches from @dissent1.

It's a bit different: he's lacking any nand configuration at all. I've posted a message in flyspray.

[quote="dissent1, post:158, topic:285"]
It's a bit different, he's lacking nand configuration at all,
[/quote]How could missing nand drivers inside the firmware affect the checksum acceptance by u-boot that just looks at the image without running it?

You may be right that he is missing the nand drivers, but to me it looks like the boot fails much earlier than reaching those drivers.
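For reference, the checks u-boot prints ("Bad Magic Number", "Verifying Checksum") can be reproduced offline on an image file. This is a sketch of the standard legacy uImage validation (64-byte big-endian header, magic 0x27051956, CRC32 over header and data). It assumes the vendor loader uses that standard layout for the images it checks, which nobody in this thread has confirmed:

```python
import struct
import zlib

UIMAGE_MAGIC = 0x27051956   # legacy U-Boot image magic
HEADER_FMT = ">7I4B32s"     # big-endian fields, 64 bytes in total
HEADER_SIZE = struct.calcsize(HEADER_FMT)


def check_uimage(blob: bytes) -> dict:
    """Validate a legacy uImage: magic, header CRC, then data CRC."""
    if len(blob) < HEADER_SIZE:
        raise ValueError("image shorter than a uImage header")
    (magic, hcrc, _time, size, load, entry, dcrc,
     _os, _arch, _type, _comp, name) = struct.unpack(HEADER_FMT, blob[:HEADER_SIZE])
    if magic != UIMAGE_MAGIC:
        raise ValueError(f"Bad Magic Number 0x{magic:x}")
    # The header CRC is computed with the hcrc field (bytes 4..7) zeroed.
    zeroed = blob[:4] + b"\0\0\0\0" + blob[8:HEADER_SIZE]
    if zlib.crc32(zeroed) & 0xFFFFFFFF != hcrc:
        raise ValueError("bad header checksum")
    data = blob[HEADER_SIZE:HEADER_SIZE + size]
    if len(data) != size or zlib.crc32(data) & 0xFFFFFFFF != dcrc:
        raise ValueError("bad data checksum")
    return {"name": name.rstrip(b"\0").decode("ascii", "replace"),
            "size": size, "load": load, "entry": entry}
```

A magic of 0x0 means the loader read zeros where it expected a header. That depends only on what is stored at the offset u-boot reads from, not on any driver inside the image.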

I'm not sure u-boot can get the image without knowing where to look physically :slight_smile:

Update: I wonder whether this commit may have such side effects:
https://git.lede-project.org/?p=source.git;a=commit;h=7d00cfe9bb693e376ac9d035e13f8ce8a5ff572c