(Solved) Flaky wifi with WRT1900ACS v2 and Android phones

I have a wrt1900acs v2 running the latest wifi drivers packaged by @eduperez and it is giving me nothing but problems.

Specifically my Android phones simply refuse to stay connected. Recently it's gotten to the point where it hardly will even connect at all and I've turned it off so that my older tp-link devices are the only APs. We've had hot weather, and I wonder if that's relevant because it was never this bad before.

Rebooting the WRT device seems to fix the problem for a few hours.

When this occurs it seems like the logs output stuff about the android device timing out. It can be immediately less than a second after it connects

Does anyone else experience similar issues and have a fix? I've seen several threads on this kind of thing but no one has posted a fix in the past.

I'm at the point where I'm going to jump ship to a tp-link EAP 225v3 in short order. This WRT device has been super flaky.

I would start by discarding overheating issues: you can use "collectd" to monitor CPU temperatures and devices connected, and perhaps find a correlation between both values.

What is the exact error message (with a little context)?

1 Like

Yeah, I turned it off and have had no problems with wifi since last night... but it means I lost the logs. I'll turn it on again and hope the problem resumes, temps are lower today though, so if it's temp related... sigh :wink:

I don't have a WRT1900ACS v2 but I do have two WRT3200ACMs which work great with Android phones, all are QCA based ones and I'm only using 5Ghz. All are running firmwares off the Master branch
FWIW...

Mon Jul 16 18:47:40 2018 daemon.info hostapd: wlan0: STA 38:80:df:11:8e:97 IEEE 802.11: authenticated
Mon Jul 16 18:47:40 2018 daemon.info hostapd: wlan0: STA 38:80:df:11:8e:97 IEEE 802.11: associated (aid 1)
Mon Jul 16 18:47:40 2018 daemon.notice hostapd: wlan0: CTRL-EVENT-EAP-STARTED 38:80:df:11:8e:97
Mon Jul 16 18:47:40 2018 daemon.notice hostapd: wlan0: CTRL-EVENT-EAP-PROPOSED-METHOD vendor=0 method=1
Mon Jul 16 18:47:40 2018 daemon.info hostapd: wlan0: RADIUS Authentication server fd56:xxxx:xxxx:xxxx::a:1812
Mon Jul 16 18:47:40 2018 daemon.notice hostapd: connect[radius]: Host is unreachable
Mon Jul 16 18:47:40 2018 daemon.info hostapd: wlan0: RADIUS No authentication server configured
Mon Jul 16 18:47:51 2018 daemon.info hostapd: wlan0: STA 38:80:df:11:8e:97 IEEE 802.11: authenticated
Mon Jul 16 18:47:51 2018 daemon.info hostapd: wlan0: STA 38:80:df:11:8e:97 IEEE 802.11: associated (aid 1)
Mon Jul 16 18:47:51 2018 daemon.notice hostapd: wlan0: CTRL-EVENT-EAP-STARTED 38:80:df:11:8e:97
Mon Jul 16 18:47:51 2018 daemon.notice hostapd: wlan0: CTRL-EVENT-EAP-PROPOSED-METHOD vendor=0 method=1
Mon Jul 16 18:47:51 2018 daemon.info hostapd: wlan0: RADIUS Authentication server fd56:xxxx:xxxx:xxxx::a:1812
Mon Jul 16 18:47:51 2018 daemon.notice hostapd: connect[radius]: Host is unreachable
Mon Jul 16 18:47:51 2018 daemon.info hostapd: wlan0: RADIUS No authentication server configured
Mon Jul 16 18:48:06 2018 daemon.info hostapd: wlan0: STA 38:80:df:11:8e:97 IEEE 802.11: disassociated
Mon Jul 16 18:48:07 2018 daemon.info hostapd: wlan0: STA 38:80:df:11:8e:97 IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)

So, that's interesting, it is complaining it can't talk to the RADIUS server, it's also not possible to ping the device on ipv6...

The device has a static ipv6 but I didn't specify the netmask, by adding /64 in LUCI it is now reachable.

The weird thing is that this did mostly work in the past.... so there is something weird and inconsistent with ipv6 in my config. Now that I've specified the netmask, let me see what happens.

EDIT: In the past, I was getting authenticated fine, connected fine, and then seconds or minutes or even tens of minutes later, it'd drop out with that "deauthenticated due to inactivity" type message. And, there was no message about the RADIUS server. So perhaps some attempt I made yesterday to make things work better broke something. In any case I'll let it ride for a day and see if I have problems and come back and update this topic.

Ok, now that the ipv6 stuff is working fine... here's logs of an example of what I'm seeing, repeated disconnects and reconnects from an android phone. This MAC is my Motorola G5 Plus:

Tue Jul 17 14:05:50 2018 daemon.notice hostapd: wlan0: AP-STA-DISCONNECTED 38:80:df:11:8e:97
Tue Jul 17 14:05:54 2018 daemon.info hostapd: wlan0: STA 38:80:df:11:8e:97 IEEE 802.11: authenticated
Tue Jul 17 14:05:54 2018 daemon.info hostapd: wlan0: STA 38:80:df:11:8e:97 IEEE 802.11: associated (aid 1)
Tue Jul 17 14:05:54 2018 daemon.notice hostapd: wlan0: CTRL-EVENT-EAP-STARTED 38:80:df:11:8e:97
Tue Jul 17 14:05:54 2018 daemon.notice hostapd: wlan0: CTRL-EVENT-EAP-PROPOSED-METHOD vendor=0 method=1
Tue Jul 17 14:05:54 2018 daemon.info hostapd: wlan0: STA 38:80:df:11:8e:97 WPA: pairwise key handshake completed (RSN)
Tue Jul 17 14:05:54 2018 daemon.notice hostapd: wlan0: AP-STA-CONNECTED 38:80:df:11:8e:97
Tue Jul 17 14:05:54 2018 daemon.info hostapd: wlan0: STA 38:80:df:11:8e:97 RADIUS: starting accounting session C8535E78A3B6D9AE
Tue Jul 17 14:05:54 2018 daemon.info hostapd: wlan0: STA 38:80:df:11:8e:97 IEEE 802.1X: authenticated - EAP type: 52 (unknown)
Tue Jul 17 14:06:30 2018 daemon.notice hostapd: wlan0: AP-STA-DISCONNECTED 38:80:df:11:8e:97
Tue Jul 17 14:07:17 2018 daemon.info hostapd: wlan1-1: STA 38:80:df:11:8e:97 IEEE 802.11: authenticated
Tue Jul 17 14:07:17 2018 daemon.info hostapd: wlan1-1: STA 38:80:df:11:8e:97 IEEE 802.11: associated (aid 1)
Tue Jul 17 14:07:17 2018 daemon.notice hostapd: wlan1-1: AP-STA-CONNECTED 38:80:df:11:8e:97
Tue Jul 17 14:07:17 2018 daemon.info hostapd: wlan1-1: STA 38:80:df:11:8e:97 RADIUS: starting accounting session FFB8151A5952CE2A
Tue Jul 17 14:07:43 2018 daemon.notice hostapd: wlan1-1: AP-STA-DISCONNECTED 38:80:df:11:8e:97
Tue Jul 17 14:07:43 2018 daemon.info hostapd: wlan1: STA 38:80:df:11:8e:97 IEEE 802.11: authenticated
Tue Jul 17 14:07:43 2018 daemon.info hostapd: wlan1: STA 38:80:df:11:8e:97 IEEE 802.11: associated (aid 1)
Tue Jul 17 14:07:43 2018 daemon.notice hostapd: wlan1: CTRL-EVENT-EAP-STARTED 38:80:df:11:8e:97
Tue Jul 17 14:07:43 2018 daemon.notice hostapd: wlan1: CTRL-EVENT-EAP-PROPOSED-METHOD vendor=0 method=1
Tue Jul 17 14:07:43 2018 daemon.info hostapd: wlan1: RADIUS Authentication server fd56:9122:70e4:1::a:1812
Tue Jul 17 14:07:43 2018 daemon.info hostapd: wlan1: STA 38:80:df:11:8e:97 WPA: pairwise key handshake completed (RSN)
Tue Jul 17 14:07:43 2018 daemon.notice hostapd: wlan1: AP-STA-CONNECTED 38:80:df:11:8e:97

wlan0 and wlan0-1 are my private "LAN" 5ghz (one is a 5ghz only ssid) wlan1 is my 2.4ghz lan, and wlan1-1 is my guest network.

Basically what this shows is that my device is bouncing back and forth between all the different SSIDs and this continues through time. Furthermore I'm finding that I don't consistently get DHCP or consistently get SLAAC addresses. Sometimes it will connect ipv6 only, sometimes ipv4 only, sometimes both...

It seems like eventually I do get both ipv4 and ipv6 but maybe the initial NDP requests are not going through, and sometimes it has to wait for the periodic RA, other times maybe it gets the ipv6 SLACC RA but not the dhcp replies...

I'm now wondering if there isn't flakiness due to say EEE settings, where links are going down and then not coming back up in time on my switches. I"ll turn those off for my main trunks.

Further info:

tcpdump seems to indicate that icmp6 messages were not passing the bridge consistently, so it wasn't consistently hearing ipv6 neighbor messages or router advertisements (sometimes it did, sometimes it didn't). Since this is absolutely necessary in my network... that probably meant it couldn't resolve DNS etc etc.

I just disabled igmp_snooping on the bridge, and now icmp6 broadcast messages seem to be coming through the bridge properly.

Is there some known bug with icmp6 and linux bridges and igmp snooping that combines with the marvell wifi driver or something?

Note that I have

	option igmp_snooping '1'
	option multicast_to_unicast '1'

on the two other tp-link wdr3600 devices that make up the other 2 APs and was having no problems at all with ipv6 neighbor discovery etc.

The reason for the igmp snooping is to try to make games that use UDP multicast work better in the network. My kids like minecraft and/or rocket league etc.

That thread suggests that this was an issue previously but was fixed, is there some additional config required for the fix? Was a new related bug created recently, or anything like that?

Ok, I've confirmed that if I have

option igmp_snooping '1'

On this device that its ipv6 connection is flaky. Whereas I have that option on my TP-Link WDR3600 and it works fine.

Perhaps there's something wrong with some kind of hardware snooping in the WRT1900ACS v2 whereas the WDR3600 either uses the CPU or doesn't have this problem? Anyone else can confirm that ipv6 doesn't work well when igmp_snooping is enabled on this device?

Further weirdness debugged.

Under certain circumstances, not consistently, the WRT1900acs would use a link-local ipv6 address to talk to the RADIUS server, and the RADIUS server didn't have a link local address for allowed clients... so it wouldn't work. By adding a client definition to freeradius clients.conf I was able to avoid this problem:


client lanslink {
       ipv6addr = fe80::%bond0.1/16
...
}

so now at least whenever it decides it should use a link local source address instead of the ULA I assigned it... at least it won't be rejected by the radius server.

There are so many ways things can go wrong aren't there?

EDIT: obviously if this were consistent, I'd probably have figured it out pretty quickly, but it must depend on something that isn't consistent so it sometimes uses one source address sometimes another.

Yesterday I thought I'd actually just squashed the bugs and things were working... They did in fact work the second half of the day.

This morning, I wake up, look at my phone, and it's not connected, and furthermore, it won't connect. It will connect fine to the guest wifi running on the same device (2.4ghz second SSID). Due to the WRT having a fixed power output, it's always the AP my devices connect to unless I'm sitting next to one of the others, or in the driveway etc. So just repeated trying to connect, and repeated fail. I went and turned off the WRT and immediately connected to one of the other APs.

My impression is there's a time-dependent problem, and that it has something to do with the handshake for my WPA enterprise, which may again be related to broadcast/multicast. I say this because the guest network works fine.

So far I haven't been able to observe it not working, and collect a packet capture or anything like that. next time it happens I'll try, but often it's later in the day, or first thing in the morning and I have kids to take care of or family stuff to do.

The setup here is this:

x86 Router -> zyxel GS1900 switch -> TP-link sg108e -> WRT1900acs AP 1
                                  -> TP-link sg108e -> WDR AP 2
                                  -> Powerline Adapter -> WDR AP 3

The second and third APs are TP-Link WDR3600's

As soon as I turn off the WRT AP my phone connects to the AP2 that has the smart switch. so I don't think it's the zyxel or the tp-link smart switches screwing up the negotiation.

In general I think I can rule out temperature, because the overnight temps were low (mid 70's F).

The device does reboot every morning at 5:45 am, so that's a possibility that it screws up the switch on a reboot or something and a power-off recovers it. I'll try disabling the reboot and see if it survives the night.

I enabled remote logging so I can see the logs even if the router reboots. Here's logs immediately after forcing a reboot.

Jul 20 10:31:10 homerouter hostapd: wlan0: STA 38:80:df:11:8e:97 IEEE 802.11: authenticated
Jul 20 10:31:10 homerouter hostapd: wlan0: STA 38:80:df:11:8e:97 IEEE 802.11: associated (aid 1)
Jul 20 10:31:10 homerouter hostapd: wlan0: CTRL-EVENT-EAP-STARTED 38:80:df:11:8e:97
Jul 20 10:31:10 homerouter hostapd: wlan0: CTRL-EVENT-EAP-PROPOSED-METHOD vendor=0 method=1
Jul 20 10:31:10 homerouter hostapd: wlan0: STA 38:80:df:11:8e:97 WPA: pairwise key handshake completed (RSN)
Jul 20 10:31:10 homerouter hostapd: wlan0: AP-STA-CONNECTED 38:80:df:11:8e:97
Jul 20 10:31:10 homerouter hostapd: wlan0: STA 38:80:df:11:8e:97 RADIUS: starting accounting session 96F846D59B5B4D54
Jul 20 10:31:10 homerouter hostapd: wlan0: STA 38:80:df:11:8e:97 IEEE 802.1X: authenticated - EAP type: 52 (unknown)
Jul 20 10:31:15 homerouter hostapd: wlan1: STA 38:80:df:11:8e:97 IEEE 802.11: authenticated
Jul 20 10:31:15 homerouter hostapd: wlan1: STA 38:80:df:11:8e:97 IEEE 802.11: associated (aid 1)
Jul 20 10:31:15 homerouter hostapd: wlan1: CTRL-EVENT-EAP-STARTED 38:80:df:11:8e:97
Jul 20 10:31:15 homerouter hostapd: wlan1: CTRL-EVENT-EAP-PROPOSED-METHOD vendor=0 method=1
Jul 20 10:31:30 homerouter hostapd: wlan1: STA 38:80:df:11:8e:97 IEEE 802.11: disassociated
Jul 20 10:31:31 homerouter hostapd: wlan1: STA 38:80:df:11:8e:97 IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
Jul 20 10:31:37 homerouter hostapd: wlan0: STA 38:80:df:11:8e:97 IEEE 802.11: authenticated
Jul 20 10:31:37 homerouter hostapd: wlan0: STA 38:80:df:11:8e:97 IEEE 802.11: associated (aid 1)
Jul 20 10:31:37 homerouter hostapd: wlan0: CTRL-EVENT-EAP-STARTED 38:80:df:11:8e:97
Jul 20 10:31:37 homerouter hostapd: wlan0: CTRL-EVENT-EAP-PROPOSED-METHOD vendor=0 method=1
Jul 20 10:31:37 homerouter hostapd: wlan0: AP-STA-CONNECTED 38:80:df:11:8e:97
Jul 20 10:31:37 homerouter hostapd: wlan0: STA 38:80:df:11:8e:97 RADIUS: starting accounting session 96F846D59B5B4D54
Jul 20 10:31:37 homerouter hostapd: wlan0: STA 38:80:df:11:8e:97 IEEE 802.1X: authenticated - EAP type: 52 (unknown)
Jul 20 10:31:38 homerouter hostapd: wlan0: STA 38:80:df:11:8e:97 WPA: pairwise key handshake completed (RSN)
Jul 20 10:32:55 homerouter hostapd: wlan1: STA f0:27:2d:a6:e2:84 IEEE 802.11: authenticated
Jul 20 10:32:55 homerouter hostapd: wlan1: STA f0:27:2d:a6:e2:84 IEEE 802.11: associated (aid 1)
Jul 20 10:32:55 homerouter hostapd: wlan1: CTRL-EVENT-EAP-STARTED f0:27:2d:a6:e2:84
Jul 20 10:32:55 homerouter hostapd: wlan1: CTRL-EVENT-EAP-PROPOSED-METHOD vendor=0 method=1
Jul 20 10:33:25 homerouter hostapd: wlan1: CTRL-EVENT-EAP-STARTED f0:27:2d:a6:e2:84
Jul 20 10:33:25 homerouter hostapd: wlan1: CTRL-EVENT-EAP-PROPOSED-METHOD vendor=0 method=1
Jul 20 10:33:41 homerouter hostapd: wlan1: RADIUS No response from Authentication server fd56:9122:70e4:1::xxxx:xxxx:1812 - failover
Jul 20 10:33:41 homerouter hostapd: wlan1: RADIUS Authentication server fd56:9122:70e4:1::a:1812
Jul 20 10:33:55 homerouter hostapd: wlan1: CTRL-EVENT-EAP-STARTED f0:27:2d:a6:e2:84
Jul 20 10:33:55 homerouter hostapd: wlan1: CTRL-EVENT-EAP-PROPOSED-METHOD vendor=0 method=1
Jul 20 10:33:55 homerouter hostapd: wlan1: CTRL-EVENT-EAP-STARTED f0:27:2d:a6:e2:84
Jul 20 10:33:55 homerouter hostapd: wlan1: CTRL-EVENT-EAP-PROPOSED-METHOD vendor=0 method=1
Jul 20 10:34:00 homerouter hostapd: wlan1: STA f0:27:2d:a6:e2:84 IEEE 802.11: deauthenticated due to local deauth request
Jul 20 10:35:00 homerouter crond[1198]: USER root pid 2102 cmd /sbin/fan_ctrl.sh
Jul 20 10:39:13 homerouter hostapd: wlan1: STA f0:27:2d:a6:e2:84 IEEE 802.11: authenticated
Jul 20 10:39:13 homerouter hostapd: wlan1: STA f0:27:2d:a6:e2:84 IEEE 802.11: associated (aid 1)
Jul 20 10:39:13 homerouter hostapd: wlan1: CTRL-EVENT-EAP-STARTED f0:27:2d:a6:e2:84
Jul 20 10:39:13 homerouter hostapd: wlan1: CTRL-EVENT-EAP-PROPOSED-METHOD vendor=0 method=1
Jul 20 10:39:13 homerouter hostapd: wlan1: STA f0:27:2d:a6:e2:84 WPA: pairwise key handshake completed (RSN)
Jul 20 10:39:13 homerouter hostapd: wlan1: AP-STA-CONNECTED f0:27:2d:a6:e2:84
Jul 20 10:39:13 homerouter hostapd: wlan1: STA f0:27:2d:a6:e2:84 RADIUS: starting accounting session 5999D7380F0BAD32
Jul 20 10:39:13 homerouter hostapd: wlan1: STA f0:27:2d:a6:e2:84 IEEE 802.1X: authenticated - EAP type: 52 (unknown)

This 38:80:df:11:8e:97 MAC belongs to my Android phone.

As you can see, it authenticates, and then 5 seconds later it authenticates again even though it succeeded the first time, then 15 seconds after that it dissassociates due to inactivity (it almost seems like it gets 2 entries for this device and after the second authentication the other one has no activity so it times out and disassociates both of them) then another station tries to authenticate and has trouble talking ipv6 to the RADIUS server, then that station gives up and tries again, and succeeds

After this I turned wifi off and on on my android phone, and it authenticated fine.

When I'm having trouble, the symptom is that the message about "deauthenticated due to inactivity (timer DEAUTH/REMOVE)" occurs over and over again (I think, that's my memory I'll update here if I can reproduce).

I've also used cron to log temperatures every 5 mins to the remote log, so that'll help with the temp idea.

That's actually how stock wifi drivers sometimes behaved. Just to double-check, what's the output of opkg list-installed kmod-mwlwifi?

BTW, did you run collectd, like @eduperez suggested to rule out overheating issue? What are the 5-minute interval readings?

Temperature max is around 55C for the CPU and 82C for the wifi chipset during the middle of a hot day. And during that day I didn't have any problems...

Now that I've spent some time looking at logs and trying to fix any bugs that might be there, and turned off igmp_snooping, and etc I've been having no problems. The things that seem to have done it are:

  1. Turn off igmp_snooping
  2. enable link-local ipv6 source addresses for clients to the RADIUS server
  3. I turned off the sysctl enabling bridge iptables (there was a forward rule to allow bridge forwarding previously, but now the firewall doesn't even get called)
  4. A few other simplifications of configuration I can't remember.
  5. Stop it from rebooting each night

So far I've had none of the flaky wifi issues the last few days. I think I'll continue to monitor it for the upcoming week, where we expect a heat wave for several days, and then if I have no further problems, I'll mark this as solved and give a summary.

Well, so far I've had none of the continued issues. I'm not sure which of the above changes "fixed" things, but I do think that turning off igmp_snooping somehow helped with at least one of the possible problems.

For now, I'm going to mark this as solved and move on.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.