[Solved] WRT1900ACV1 reboots: kernel 4.9

Thanks for the work on this, @InkblotAdmirer. Will try a new build tonight.

Edit: Does arch/arm/boot/dts/armada-xp-linksys-mamba.dts have anything to do with this? It also differs between 4.4 and 4.9. Or does the *.dtsi override the *.dts?
Pardon my ignorance, just trying to learn.
And thanks for pointing me at the boot area. I was aiming for arch/arm/mach-mvebu but saw no real issues there.

@northbound

The linksys-mamba.dts holds the configuration specific to the device; you have to follow its includes as well. The *.dtsi files are intended to carry "platform" information, with configuration applicable to anything using that platform, and the device-specific additions and changes are then layered on top in the *.dts.
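To make "follow the includes" concrete, here is a rough Python sketch that walks the include chain of a .dts file so the 4.4 and 4.9 trees can be compared file by file. The function name `include_chain` and the search-directory handling are my own invention for illustration, not anything from the kernel or LEDE build system, and it only resolves includes it can find on disk.

```python
import re
import sys
from pathlib import Path

# Matches C-preprocessor style (#include "x.dtsi", #include <x.h>)
# and native dtc style (/include/ "x.dtsi") at the start of a line.
INCLUDE_RE = re.compile(
    r'^\s*(?:#include|/include/)\s*[<"]([^>"]+)[>"]', re.MULTILINE
)


def include_chain(dts_path, search_dirs=(), _seen=None):
    """Return the files pulled in by dts_path, depth-first, in include order.

    Hypothetical helper: includes that cannot be found next to the including
    file or in search_dirs (e.g. binding headers) are silently skipped.
    """
    _seen = set() if _seen is None else _seen
    path = Path(dts_path)
    chain = []
    for name in INCLUDE_RE.findall(path.read_text()):
        # Look next to the including file first, then in the extra dirs.
        candidates = [path.parent / name] + [Path(d) / name for d in search_dirs]
        found = next((c for c in candidates if c.is_file()), None)
        if found is None or found.resolve() in _seen:
            continue  # unresolved or already visited
        _seen.add(found.resolve())
        chain.append(found)
        chain.extend(include_chain(found, search_dirs, _seen))
    return chain


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python include_chain.py <file.dts> [extra include dirs...]
    for included in include_chain(sys.argv[1], sys.argv[2:]):
        print(included)
```

Running it against the mamba .dts in each kernel tree and diffing every file it prints should show whether the difference is in the board file itself or somewhere in the shared platform files.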

From a couple of posts elsewhere, my understanding is that other release images are not experiencing the reboot; McDebian comes to mind, and I don't remember seeing anything regarding dd-wrt. My assumption would be that everything based on this would behave the same. So unless the issue is being patched out on another image, we must be patching the issue into our image. But maybe this is based on the false premise that only LEDE is experiencing the reboot issue. Perhaps someone who has run another release can offer an opinion on their experience.

@anomeome In that case it should just be a matter of going through the dozens of patches added when kernel 4.9 was added and when mvebu was added to 4.9. I'm not particularly into that drudgery.

With the patch above (virtually no change to dtb) mamba rebooted after just under 48 hours.

Yep, I hear ya. I have experienced reboot time variance from less than an hour, to over 12 days, on the same image. So, even if the intersection of patches that involve the mamba in any manner reduces the count somewhat, without a hint as to the cause, it is still a long painful road; and that is assuming the premise of my previous post is valid.
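As a back-of-the-envelope illustration of why this is such a painful road, here is a small Python sketch. It assumes, purely for the sake of argument, that the reboots behave like a memoryless (exponential) process with some average time between reboots; nothing in this thread establishes that, and the function names are made up.

```python
import math


def p_no_reboot(uptime_days, mean_days_between_reboots):
    """Chance that a still-broken image survives uptime_days without a reboot.

    ASSUMPTION: reboots are memoryless with the given mean interval.
    """
    return math.exp(-uptime_days / mean_days_between_reboots)


def days_for_confidence(mean_days_between_reboots, confidence=0.95):
    """Reboot-free days needed before a still-broken image would have
    rebooted with probability `confidence` (same assumption as above)."""
    return -mean_days_between_reboots * math.log(1.0 - confidence)


if __name__ == "__main__":
    assumed_mean = 6.0  # hypothetical average, in days, not a measured value
    print(p_no_reboot(15, assumed_mean))
    print(days_for_confidence(assumed_mean))
```

Under that assumption every candidate fix needs roughly three times the average reboot interval of clean uptime before it says much, which is why testing patch by patch is so slow.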

I had hoped we might have had another avenue to debug this when I saw the kexec commits being made, but last I looked it did not appear to be ready for use on this target.

Fresh build today. Had serial console connected and several wifi connections. Spontaneous reboot and zero/nada/zilch on console.

From target/linux/generic/hack-4.9 I have removed all patches except: 230, 251, 259, 700, 910, 911, 921, 930
From pending-4.9 I have removed: 120, 600, 610, 611, 612, 613, 616, 630, 701, 703, 734, 735, 810, 811, 812
From backport-4.9 I have removed: 020, 021, 022, 023-1thru7, 030-01, 030-02, 050
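For anyone wanting to repeat the experiment, a selection like the hack-4.9 one above could be scripted roughly as follows. This is only a sketch: `partition_patches` is a made-up helper, it assumes the patch files are named with a leading number followed by a dash (NNN-description.patch), and it only reports what would be kept or removed rather than deleting anything.

```python
from pathlib import Path


def partition_patches(patch_dir, keep_prefixes):
    """Split the *.patch files in patch_dir by their leading number.

    Hypothetical helper. Returns (kept, removable) as lists of file names.
    ASSUMPTION: names look like "230-some_description.patch".
    """
    keep_prefixes = {str(p) for p in keep_prefixes}
    kept, removable = [], []
    for patch in sorted(Path(patch_dir).glob("*.patch")):
        prefix = patch.name.split("-", 1)[0]
        (kept if prefix in keep_prefixes else removable).append(patch.name)
    return kept, removable


if __name__ == "__main__":
    # The keep list for hack-4.9 from the post above.
    hack_dir = Path("target/linux/generic/hack-4.9")
    keep = {"230", "251", "259", "700", "910", "911", "921", "930"}
    if hack_dir.is_dir():
        kept, removable = partition_patches(hack_dir, keep)
        print("keep:  ", *kept, sep="\n  ")
        print("remove:", *removable, sep="\n  ")
```

Printing the list first and removing by hand keeps it easy to put patches back one group at a time if a build breaks.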

Everything builds fine and I see no loss of performance or functionality -- but there are still reboots.

The remaining patches either don't apply to mvebu (they are for arc, mips, etc.), are integrated into other features (kobject-uevent), or are ones where I don't quite understand what the patch is trying to accomplish, so I don't want to blindly remove them.

If McDebian really isn't rebooting with a straight Debian kernel, then either one of the latter two categories of remaining patches is the culprit, or some other service not used in that implementation is broken (procd, busybox ntpd, etc.). I looked through the Debian source patches and nothing stands out as "fixing" 4.9 for mamba.

I'm guessing we somehow need to trap the condition causing the reboot to get the necessary insight to fix this. Any ideas?

Yes, IMO best to spend effort on getting the CRASH_DUMP facility happening on this platform.

To be taken with a big grain of NaCl: just from random empirical observation, my current suspicion is directed at mvneta, perhaps one of the upstream backports in support of offload units. But I have absolutely no direct evidence to support this idea. The ability to catch an event is sorely needed.

My last build with 4.9.47 ran for 15 days without any reboot, but I decided to load a new image today with the kernel update.

Ping. @anomeome, did you manage to find anything? I've been hearing that for the past 4 kernel updates to 4.9 there haven't been any reboots reported. Any updates would be appreciated.

Looks like they bailed on us. We do need a working crash dump.
Edit: Sorry for the edits.

The following task is now closed:

FS#888 - WRT1900ACv1 random reboots since kernel 4.9
User who did this - Ted Hess (thess)

Reason for closing: Deferred
More information can be found at the following URL:
https://bugs.lede-project.org/index.php?do=details&task_id=888

watahoot, there's a category for closing an issue that I did not have in my dev days. There was some banter on IRC 5-6 weeks back about getting the dump facility working for this platform, but I have not seen anything since.

@thagabe, I'm not optimistic enough to think the issue was resolved by coincidence, but... Including the 4.9.47 image mentioned above, I have built 4 images, each driven by a kernel push, and I have not had a reboot from any of those images. But, given the nature of the issue and the way it manifests itself, this is of little comfort. I looked in the obvious places each time to see if anything appeared to be a fix, but did not notice anything.

IMO there are going to have to be some voices raised to get some eyes on the issue. In light of this, if it is not addressed, the device could get orphaned.

I gave up on it and bought a WRT1900ACS. I figured everyone had done the same.

I requested closure about a month ago; it might be that Ted Hess just got to it. Anyway, I have the option to re-open it if anyone wants it back. Not sure if I'm the only one with that option?

The original FS#564 opened by @northbound is still open. It may need a title change, as it does not show up in a search using obvious keywords, which is why I missed it back at the start, and I assume why 888 was opened.

I too have the ACM, which is working "great". However, that is no reason to abandon mamba. Furthermore, this issue is not present on non-LEDE firmware, so focusing on the patchset for the upcoming kernel update to 4.14 may be smarter than tracking down the issue affecting 4.9 (much as was done back when 3.14 was the initial release, 3.16 was skipped because of slow network issues, and 3.18 was selected as the stable kernel). 4.14 appears to be the kernel of choice at the moment, as it upstreams many components as well as rango, and will have LTS for 6 years.

Sorry, I guess I should have been more explicit.
But this sux... Those in the know should know better than me... I admit I am a peon, but I am a peon that spent buku hrs trying to get to the bottom of this. @thagabe, good point. I guess I should try again, rather than give up and stick with 4.4, which is solid. I do like to move forward.

You assume correctly. I've requested it be reopened.

I got confirmation kernel 4.14 will be the next kernel pushed to master. We should probably start debugging for this kernel.

My devices consistently reboot with 4.9. I have tested more recent kernels (4.9.51) and the issue is still there. I have even tested versions with CESA and BM disabled, to no avail.

Between 4.9 and 4.4 there were changes in ARM multiprocessor concurrency, which would be my next guess (barring any crash logs).

Did we find a way to mitigate the issue, or will V1 users have to stick to 4.4 and custom builds for a while?

By the way, it looks like a hard lock-up. No logs are sent to the logging server and nothing appears on the serial console (uboot just appears magically out of nowhere).

We? You?

To my knowledge, the problem is so far unsolved.

It looks like there aren't any core developers with a V1 and so nobody knowledgeable enough really looks for the problem. I have seen no movement regarding that for several months.