[Solved] WRT1900ACV1 reboots: kernel 4.9

After an uptime of close to 12 days, I decided to build and flash a new image with yesterday's kernel push. I am beginning to think that whatever the issue may have been, it has been resolved.

@northbound, I do not see any of those messages in my log when booting with all of those security attributes enabled.

@anomeome That is caused by CONFIG_KERNEL_PROVE_LOCKING=y.
I thought that may be why I was not getting random reboots. I am on 4.9.18 now. I will remove that option on the next build; I was trying too many changes at a time. :smile:
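For anyone toggling that option by hand, here is a minimal Python sketch that switches a symbol off in a buildroot .config, assuming the usual Kconfig convention where a disabled symbol is written as "# CONFIG_FOO is not set". The helper disable_option is hypothetical, not part of the LEDE build system; changing the option through make menuconfig should do the same thing interactively.

```python
import re


def disable_option(config_text: str, option: str) -> str:
    """Return config_text with `option` switched off, Kconfig-style.

    A line such as 'CONFIG_X=y' (or =m, or any value) becomes
    '# CONFIG_X is not set'. All other lines are left untouched.
    """
    # Anchor on 'OPTION=' so CONFIG_X does not also match CONFIG_X_EXTRA.
    pattern = re.compile(rf"^{re.escape(option)}=.*$")
    out = []
    for line in config_text.splitlines():
        if pattern.match(line):
            out.append(f"# {option} is not set")
        else:
            out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "CONFIG_KERNEL_PROVE_LOCKING=y\nCONFIG_TARGET_mvebu=y\n"
    print(disable_option(sample, "CONFIG_KERNEL_PROVE_LOCKING"), end="")
```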

Builds from https://downloads.lede-project.org/snapshots/targets/mvebu/generic/ are still crashing/rebooting for me. I just tried r3883-2ebfdab and it rebooted while installing packages.

Yep, disappointing indeed. r3716-cd0f990 ran for close to 12 days with no random reboot, r3873-02fe942 rebooted last night after about 1 day.
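Those two builds bracket the regression. A minimal Python sketch of picking the next snapshot to test by halving that range, the same idea git bisect automates. It assumes the number in LEDE's rNNNN-hash revision strings grows by one per commit on master; parse_revision and next_to_test are hypothetical helpers.

```python
import re
from typing import NamedTuple


class Revision(NamedTuple):
    number: int      # commit count on master (assumed monotonic)
    short_hash: str  # abbreviated git hash


REV_RE = re.compile(r"^r(\d+)-([0-9a-f]+)$")


def parse_revision(text: str) -> Revision:
    """Parse a LEDE-style revision string such as 'r3716-cd0f990'."""
    m = REV_RE.match(text.strip())
    if m is None:
        raise ValueError(f"not a revision string: {text!r}")
    return Revision(int(m.group(1)), m.group(2))


def next_to_test(good: str, bad: str) -> int:
    """Return the revision number halfway between a good and a bad build."""
    g, b = parse_revision(good), parse_revision(bad)
    if g.number >= b.number:
        raise ValueError("good revision must be older than bad revision")
    return (g.number + b.number) // 2


if __name__ == "__main__":
    good, bad = "r3716-cd0f990", "r3873-02fe942"
    span = parse_revision(bad).number - parse_revision(good).number
    print(f"{span} commits in range; try r{next_to_test(good, bad)} next")
```

Since the reboots can take days to show up, a "good" verdict is only as reliable as the uptime behind it; r3716-cd0f990 earned it with ~12 days.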

I think it was just a hard lockup: it seemed to freeze for about 10 seconds, then reboot. I think that is why no crashlog was created. I am beginning to think it may have been a kernel issue. Since I have been running ahead of trunk, the issue has gone away for me, up to Linux LEDE 4.9.20 #0 SMP Thu Mar 30 16:31:02 2017 armv7l GNU/Linux. All has been stable. Should this bug be closed?
https://bugs.lede-project.org/index.php?do=details&task_id=564

Edit: @anomeome Your changes caused 2 reboots within an hour, so I backed them out.
I am not complaining, just letting you know what happened here with your changes.

Edit 2: Sorry, I forgot this part:
Fri Mar 31 19:56:58 2017 kern.err kernel: [ 1.276788] cpu cpu1: opp_list_debug_create_link: Failed to create link
Fri Mar 31 19:56:58 2017 kern.err kernel: [ 1.283434] cpu cpu1: _add_opp_dev: Failed to register opp debugfs (-12)

This part was not addressed. Is it part of the old problem or not? Just curious. Still works fine here.
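To check whether those messages show up on every boot, a minimal Python sketch that pulls kern.err entries out of logread-style output and decodes a trailing negative errno. The line format is assumed from the two lines quoted above, and kernel_errors is a hypothetical helper. The (-12) decodes to ENOMEM (out of memory), which on its own does not say whether it has anything to do with the reboots.

```python
import errno
import re

# Matches logread-style lines such as:
# Fri Mar 31 19:56:58 2017 kern.err kernel: [ 1.276788] cpu cpu1: ...
LINE_RE = re.compile(
    r"^(?P<date>\w{3} \w{3} +\d+ [\d:]{8} \d{4}) "
    r"(?P<facility>\w+)\.(?P<level>\w+) kernel: "
    r"\[\s*(?P<ts>\d+\.\d+)\] (?P<msg>.*)$"
)
# A trailing '(-N)' as printed by many kernel error messages.
ERRNO_RE = re.compile(r"\((-\d+)\)\s*$")


def kernel_errors(lines):
    """Yield (seconds_since_boot, message, errno_name) for kern.err lines."""
    for line in lines:
        m = LINE_RE.match(line)
        if m is None or m.group("facility") != "kern" or m.group("level") != "err":
            continue
        msg = m.group("msg")
        e = ERRNO_RE.search(msg)
        # Kernel functions return negative errno values, e.g. -12 for ENOMEM.
        name = errno.errorcode.get(-int(e.group(1))) if e else None
        yield float(m.group("ts")), msg, name


if __name__ == "__main__":
    sample = [
        "Fri Mar 31 19:56:58 2017 kern.err kernel: [ 1.276788] cpu cpu1: "
        "opp_list_debug_create_link: Failed to create link",
        "Fri Mar 31 19:56:58 2017 kern.err kernel: [ 1.283434] cpu cpu1: "
        "_add_opp_dev: Failed to register opp debugfs (-12)",
    ]
    for ts, msg, err in kernel_errors(sample):
        print(f"{ts:>10.6f}s  {err or '-':8} {msg}")
```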

@northbound, I have run with seccomp and namespaces turned on in my build since a few issues were resolved a number of weeks back, while still on the 4.4 kernel. The reboots started with 4.9, and since you did not have those attributes enabled but were still experiencing the reboots, I concluded they were not the issue; I wonder if @InkblotAdmirer was running with those. Also, as per my earlier post, I had a build run ~12 days without a reboot, then updated to a new build (kernel update) and the reboots started again. The same build on rango has no issues.

At any rate, instead of moving ahead with a kernel on LEDE, I have been taking a look at DSA, linux-next and other assorted bits on an OpenWrt image based on @sera's patchset on the mamba. I will probably do another LEDE image on the next kernel push.

I did not add seccomp or namespaces.

On 4.9.18 I had a mamba device up for ~4 days, so when 4.9.20 was released I updated both mambas I own. One of them rebooted in under 24 hours, and since one of them is "mission critical" in my home network, I reverted both back to 4.4, where uptime lasts the typical two weeks between firmware updates.

Since I have no compelling reason to upgrade from 4.4 on the mambas I doubt I will pursue this any further. It's a little annoying to have to build a separate kernel for the device but that's a small price to pay for stability.

Any ideas on why dropping 4.4 support is causing issues?

I don't believe that it is. I just think it is a matter of moving master (aka trunk) forward and not having to support/update the 4.4 patch set for this target. It may be a positive, though, in that it may mean more eyes on the issue. At any rate, you should be able to leave that commit out of your build tree and continue with k4.4 on a trunk build, at least until outdated patches catch up with you.

I've reverted the removal of Linux 4.4 support. I currently don't have time to debug this myself, but please let me know if you guys make any progress in figuring out the root cause of this issue.

Requesting an update on this issue. WRT1900AC Version 1 users are still reporting reboots after installing the latest LEDE snapshots with 4.9.x kernels.

It's important that we get V1 hardware fixed because of the security issues with running kernel 4.4.x.

Still happens here. I may be up for 3 or 4 days, then reboot twice within a few minutes, with no rhyme or reason.
This is on 4.9.27 with mwlwifi 10.3.4.0-20170512, and as usual there is no crash.log.

David, what security issues are you worried about? Kernel 4.4 is still maintained (upstream by the Linux Foundation, not by LEDE), and you can submit patches (or just apply them locally) to be pulled into the tree if the devs don't update fast enough; nbd has made sure kernel 4.4 is still an option for mvebu.

IMO this is merely a nuisance for anyone building their own images: you just have to do two independent builds.

The bigger issue is that LEDE releases (or anything from the buildbot) are not really a viable option for the AC V1.

@davidc502 I'm on 4.4.67 for ar71xx, x86_64 and mt7621. I can post the patch for 17.01.1 if you want. 4.4.68 is expected any minute now, though.

Kernels 4.5 down to 2.6 are vulnerable to remote code execution within the kernel as root. Maybe this has already been patched? The information about this vulnerability was just released last month.

I'm pretty sure it has. The thing is, the kernel devs don't explicitly refer to the CVEs in the changelogs, so you really need to track them down; you can't just grep the changelog for CVEs...

Just check the NIST entry and you'll see the Linux kernel was patched in January 2016 (!). On top of that, the page clearly states the vulnerability is in 4.4.60 and older kernels ;).
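The check being described is a plain version comparison. A minimal Python sketch, taking the "4.4.60 and older" cutoff from the NIST page as quoted above (not independently verified here); parse_kernel_version and is_affected are hypothetical helpers.

```python
def parse_kernel_version(release: str) -> tuple:
    """Turn '4.4.67' (or a uname -r style '4.9.20-suffix') into (4, 4, 67)."""
    base = release.split("-", 1)[0]
    parts = [int(p) for p in base.split(".")]
    while len(parts) < 3:
        parts.append(0)  # treat '4.9' as '4.9.0'
    return tuple(parts[:3])


def is_affected(release: str, last_vulnerable: str = "4.4.60") -> bool:
    """True if `release` is at or below the last vulnerable version in its series."""
    ver = parse_kernel_version(release)
    cutoff = parse_kernel_version(last_vulnerable)
    # Fixes are backported per stable series, so only compare like with like.
    if ver[:2] != cutoff[:2]:
        raise ValueError(f"{release} is not in the {cutoff[0]}.{cutoff[1]} series")
    return ver <= cutoff


if __name__ == "__main__":
    for rel in ("4.4.60", "4.4.67"):
        print(rel, "affected" if is_affected(rel) else "not affected")
```

With that cutoff, 4.4.60 comes out affected and 4.4.67 does not. The same-series check is deliberate: comparing a 4.9.x kernel against a 4.4.x cutoff says nothing about whether a fix landed in that branch.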

You are running the DIR-860L as well, right? How's the 4.4.67 kernel treating you? Are you using SQM by any chance? The current master branch and the 17.01 branch both have issues with SQM on mt7621 devices: it can cause stack traces and crashes that result in a reboot. Kernel 4.9 seems to have fixed it for me, but it causes other issues. I was wondering whether kernel 4.4.67 is any good on mt7621 devices with SQM enabled.

For further discussion on the aforementioned issues, please see the end of this thread:
https://forum.openwrt.org/t/optimized-build-for-the-d-link-dir-860l/

And this bug report (please vote for it if you would like the developers to focus on this bug):

I am, yes. But there's no difference between the 17.01.x 4.4 versions and the .67 one. I'm not switching to trunk until 4.9 is stable enough on ramips.

I already voted for your bug report as well :slight_smile:

I just noticed crash dump has been enabled for ARM via https://github.com/lede-project/source/commit/48d71ab5021e5238623bab2f87b6425b2609c60a. Can anyone give it a try?

From the look of it, it's enabled by default? I just tested r4228-43e4e1f and no crash log was created in /sys/kernel/debug/ (unless it's supposed to be somewhere else now?). My uptime was ~2 hours before the lockup/reboot.
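A minimal Python sketch for checking this after a reboot. It assumes debugfs is mounted at /sys/kernel/debug and that the saved log appears there as a file named crashlog; that filename is an assumption, not something stated in this thread. read_crashlog is a hypothetical helper, and a plain cat of the same path from the router shell does the same job.

```python
from pathlib import Path
from typing import Optional

# Assumed location: debugfs at /sys/kernel/debug with a 'crashlog' entry.
DEFAULT_CRASHLOG = Path("/sys/kernel/debug/crashlog")


def read_crashlog(path: Path = DEFAULT_CRASHLOG) -> Optional[str]:
    """Return the preserved crash log text, or None if nothing was saved."""
    try:
        text = path.read_text(errors="replace")
    except OSError:
        # Missing file, debugfs not mounted, or not running as root.
        return None
    return text if text.strip() else None


if __name__ == "__main__":
    log = read_crashlog()
    if log is None:
        print("no crash log found (debugfs not mounted, or nothing was saved)")
    else:
        print(log)
```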