TP-Link TL-WR1043N/ND v1: Unreliable wifi, useless after ~24 hours (ANI disabled)

Hardware: TP-Link TL-WR1043ND v1 (old-style white plastic case)
Firmware: Latest stable -> LEDE Reboot 17.01.4 r3560-79f57e422d / LuCI lede-17.01 branch (git-17.290.79498-d3f0685)

I recently obtained this device from a friend and thought it would be a good excuse to use openwrt as my home gateway. Ethernet performance has so far been perfect (no drops, good pings) but wifi has been reliably unreliable.

The wireless symptoms I'm suffering appear to be the same as what the wiki describes as being related to ANI on old openwrt versions. Notably I see:

  • Frequent wifi disassocs
  • Lots of packet drops
  • Complete inability to connect to the AP after about a day

Disabling ANI seems to have zero effect on these symptoms, and I'm running an up-to-date copy of Openwrt:

root@LEDE:/sys/kernel/debug/ieee80211/phy0/ath9k# cat ani
            ANI: DISABLED
root@LEDE:/sys/kernel/debug/ieee80211/phy0/ath9k# uname -a
Linux LEDE 4.4.92 #0 Tue Oct 17 14:59:45 2017 mips GNU/Linux

Internal to the router all three antenna coax lines are properly soldered down and don't appear to be damaged.

From my laptop's point of view, running wpa_supplicant in its highest debug mode (-dd), I'm seeing this message occasionally repeated while I'm connected:

nl80211: Event message available
nl80211: Ignored event (cmd=64) for foreign interface (ifindex 3 wdev 0x0)
nl80211: Drv Event 64 (NL80211_CMD_NOTIFY_CQM) received for wlp3s0

Here's a de-auth session; I'm not sure if any information here is useful:

nl80211: Event message available
nl80211: Ignored event (cmd=20) for foreign interface (ifindex 3 wdev 0x0)
nl80211: Drv Event 20 (NL80211_CMD_DEL_STATION) received for wlp3s0
nl80211: Delete station 90:f6:52:3e:ab:66
nl80211: Event message available
nl80211: Ignored event (cmd=39) for foreign interface (ifindex 3 wdev 0x0)
nl80211: Drv Event 39 (NL80211_CMD_DEAUTHENTICATE) received for wlp3s0
nl80211: MLME event 39 (NL80211_CMD_DEAUTHENTICATE) on wlp3s0(14:0c:76:d3:bf:69) A1=90:f6:52:3e:ab:66 A2=14:0c:76:d3:bf:69
nl80211: MLME event frame - hexdump(len=26): c0 00 00 00 90 f6 52 3e ab 66 14 0c 76 d3 bf 69 90 f6 52 3e ab 66 00 00 04 00
nl80211: Deauthenticate event
wlp3s0: Event DEAUTH (12) received
wlp3s0: Deauthentication notification
wlp3s0:  * reason 4 (locally generated)
wlp3s0:  * address 90:f6:52:3e:ab:66
Deauthentication frame IE(s) - hexdump(len=0): [NULL]
wlp3s0: CTRL-EVENT-DISCONNECTED bssid=90:f6:52:3e:ab:66 reason=4 locally_generated=1
wlp3s0: Auto connect enabled: try to reconnect (wps=0/0 wpa_state=9)
wlp3s0: Setting scan request: 0.100000 sec
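
For anyone who wants to read the 26-byte hexdump in the de-auth log above by hand, here is a minimal sketch that splits it into the standard 802.11 management frame fields (frame control, duration, three addresses, sequence control, reason code). The function name decode_deauth is made up for illustration. It assumes the bytes are pasted exactly as wpa_supplicant prints them, separated by spaces.

```sh
#!/bin/sh
# Decode the fixed-layout fields of an 802.11 deauthentication frame given as
# space-separated hex bytes (the format of wpa_supplicant's hexdump lines).
# decode_deauth is a hypothetical helper name, not part of any tool.
decode_deauth() {
    # Split the single argument into positional parameters, one per byte.
    set -- $1
    [ "$#" -ge 26 ] || { echo "need at least 26 bytes" >&2; return 1; }

    # Frame Control, first byte: bits 2-3 = type, bits 4-7 = subtype.
    fc=$((0x$1))
    type=$(( (fc >> 2) & 3 ))
    subtype=$(( (fc >> 4) & 15 ))

    # Bytes 5-10 = address 1 (receiver), 11-16 = address 2 (transmitter),
    # 17-22 = address 3 (BSSID).
    a1="$5:$6:$7:$8:$9:${10}"
    a2="${11}:${12}:${13}:${14}:${15}:${16}"
    bssid="${17}:${18}:${19}:${20}:${21}:${22}"

    # Reason code: little-endian 16-bit value in bytes 25-26.
    reason=$(( 0x${25} + (0x${26} << 8) ))

    echo "type=$type subtype=$subtype a1=$a1 a2=$a2 bssid=$bssid reason=$reason"
}

decode_deauth "c0 00 00 00 90 f6 52 3e ab 66 14 0c 76 d3 bf 69 90 f6 52 3e ab 66 00 00 04 00"
```

Type 0 with subtype 12 is a management deauthentication frame. The transmitter address is the laptop and the receiver is the AP, which matches the "locally generated" note in the log. Reason code 4 is listed in the 802.11 standard as "disassociated due to inactivity", so this may be the laptop giving up on a link it believes has gone quiet rather than the AP kicking it off.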

Can anyone give me advice on how to track this problem down? Is there a way of getting verbose logs out of the wpa_supplicant session on the AP? Are there other known causes or fixes for this problem?

Can you post your '/etc/config/wireless' configuration on the router (hide sensitive information)? Also, have you considered adding a command to restart it periodically in the crontab? I use this on my TL-WR1043ND v3:

45 5 * * * wifi

It restarts my wifi every day at 5:45 am. I noticed speeds would drop after a day or two on mine. Instead of having to reboot the router or shell into it constantly, I added that under 'Scheduled Tasks'.

Also, you can try adding this line to the 'wifi-iface' section of '/etc/config/wireless' on the router:

option disassoc_low_ack '0'

Adjusting the distance setting can help as well; try values between 20 and 25. Finally, disable legacy rates if they are not already disabled and you don't use older devices that would require them. In the 'wifi-device' section of '/etc/config/wireless':

option legacy_rates '0'
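
If editing the file by hand is awkward, the same three settings could be applied with the uci command line tool. This is only a sketch: the function name apply_wifi_tweaks is made up, and it assumes the sections are called 'radio0' and 'default_radio0' (check with `uci show wireless` first). The distance value of 25 is just one pick from the suggested range.

```sh
#!/bin/sh
# Apply the suggested tweaks through UCI instead of editing the file by hand.
# ASSUMPTION: section names 'radio0' and 'default_radio0'; pass your own
# names as arguments if `uci show wireless` reports something different.
# Setting UCI=echo gives a dry run that only prints the uci arguments.
apply_wifi_tweaks() {
    uci_bin="${UCI:-uci}"
    radio="${1:-radio0}"
    iface="${2:-default_radio0}"

    "$uci_bin" set "wireless.$iface.disassoc_low_ack=0" &&
    "$uci_bin" set "wireless.$radio.distance=25" &&
    "$uci_bin" set "wireless.$radio.legacy_rates=0" &&
    "$uci_bin" commit wireless
}
```

On the router this would be run as `apply_wifi_tweaks && wifi`, so that the wifi is restarted only if every setting was stored.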

/etc/config/wireless

config wifi-device 'radio0'
	option type 'mac80211'
	option hwmode '11g'
	option path 'platform/ath9k'
	option country 'AU'
	option legacy_rates '1'
	option channel '1'
	option htmode 'HT20'

config wifi-iface 'default_radio0'
	option device 'radio0'
	option network 'lan'
	option mode 'ap'
	option ssid 'xxxx'
	option key 'xxxx'
	option encryption 'psk2'
	option wmm '0'

If there's anything unusual in the above config then it's due to my attempts to fiddle with settings and fix the problem. This problem exists with a simple/normal WPA2 style AP.

I will try your suggestions, thank you.

Try this for your /etc/config/wireless:

config wifi-device 'radio0'
	option type 'mac80211'
	option hwmode '11g'
	option path 'platform/ath9k'
	option country 'AU'
	**option legacy_rates '0'**
	option channel '1'
	option htmode 'HT20'
	**option txpower '24'**
	**option distance '25'**
	**option beacon_int '100'**
	**list ht_capab 'SHORT-GI-20'**
	**option disabled '0'**

config wifi-iface 'default_radio0'
	option device 'radio0'
	option network 'lan'
	option mode 'ap'
	option ssid 'xxxx'
	option key 'xxxx'
	**option encryption 'psk2+ccmp'**
	**option wmm '1'**
	**option wpa_disable_eapol_key_retries '1'**
	**option tdls_prohibit '1'**
	**option disassoc_low_ack '0'**

This makes absolutely no sense to me. I am using one of these exact routers as a wireless bridge endpoint and it works flawlessly. It is connected wirelessly to my main router and has an OSMC RPi, TV, and stereo wired to it. I watch HD content on my RPi through this connection and never restart the router.
Maybe try resetting it and configuring it as simply as possible.

@mj5030 I've been running your settings for a few hours. A ping test now at close range (~72% "reception") is getting weird jitters already.
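
To put numbers on that kind of jitter, something like the following could summarise a ping run. It is a rough sketch: ping_jitter is a made-up helper name, the jitter figure is just the mean absolute difference between consecutive replies, and it assumes reply lines contain `time=<ms>` as Linux and busybox ping print them.

```sh
#!/bin/sh
# Summarise round-trip times from `ping` output read on stdin.
# Prints the reply count, min/avg/max RTT and a simple jitter estimate
# (mean absolute difference between consecutive replies), all in ms.
# ping_jitter is a hypothetical helper name.
ping_jitter() {
    awk '
        # Only reply lines carry "time=<number>"; summary lines are skipped.
        match($0, /time=[0-9.]+/) {
            t = substr($0, RSTART + 5, RLENGTH - 5) + 0
            if (n == 0 || t < min) min = t
            if (n == 0 || t > max) max = t
            if (n > 0) jitter += (t > prev ? t - prev : prev - t)
            sum += t; prev = t; n++
        }
        END {
            if (n == 0) { print "no replies"; exit 1 }
            printf "n=%d min=%.2f avg=%.2f max=%.2f jitter=%.2f\n",
                   n, min, sum / n, max, (n > 1 ? jitter / (n - 1) : 0)
        }
    '
}
```

Usage would be along the lines of `ping -c 100 192.168.1.1 | ping_jitter`, where 192.168.1.1 is only a placeholder for the router's LAN address. Comparing the output right after a wifi restart with the output a few hours later would show whether the link is degrading over time.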

@simplexion I'll wait a few more hours, do some more testing and then give this a go. Glad to hear that this device is supposed to work so well.

Just to clarify, the ** markers in the config were supposed to bold the changes I made to your previous one on here, and should not be in the config file.

As @simplexion stated, the TL-WR1043ND series are really stable and run well under LEDE typically. So if all else fails try a 30-30-30 reset or even a re-flashing of the firmware.

I'm experiencing exactly the same problems since I upgraded my router's firmware to LEDE Reboot 17.01.4 r3560-79f57e422d / LuCI lede-17.01 branch (git-17.290.79498-d3f0685).

I tried resetting and even re-flashing. Sometimes it works for days, sometimes it fails after only 20 mins.

There's one thing I noticed, which I hope may be helpful:

The wifi problems only seem to occur when the wireless is associated with the LAN interface. It did not fail at all (at least I did not notice) when I had it connected to the WAN interface.

The problem is still here for me too.

I went through full resets and reflashes. In the end I tried a different Wifi BSSID and psk, just in case their length or something could be causing obscure driver bugs. This "seemed" to fix everything for a period of a few weeks, so I was very happy.

Now the device has returned to the bad behaviour. Latencies and packet drops go through the roof, even ssh'ing in to restart the wifi takes most of a minute. I have to repeat this process anything from every day to every week, depending on luck.

I'm going to look into what this means, as I'm not too familiar with OpenWRT and LEDE terminology. This doesn't put it on the wrong side of the NAT, does it? Or is 'WAN' just another layer between LAN and the nat+tunnel out?

Depends on what the "wrong side of the NAT" is. I use my LEDE powered router behind a router from my ISP. So on the WAN side my LEDE router is just a DHCP client to the ISP router. On the LAN side, the LEDE router is the "master" of the local network.

Assigning the wireless to the WAN port therefore means I'm able to access the internet (and the intermediary network, which only consists of the LEDE router), but not the local network. In this configuration it seems to be stable. Unfortunately, I cannot access the LEDE router from the wifi.

Ah, you're double-natting. I'm using a second ADSL<->ether router in (half-)bridged mode, so I only have one NAT, and can't (safely, easily) do this.

I've had a PM suggesting I check out the capacitors in my unit. I might do one better and scope its power rails during operation. A bad power supply could indeed explain the intermittent nature of this issue.

Re capacitors: tried some probing, noise on all the voltage rails was negligible. Albeit I don't think the thing was properly booting on my test bench, as I couldn't see the wifi network from my laptop. Perhaps the boot process is blocking and I need to connect a fake WAN cable to get it to progress further.

I tried shotgunning some of the electrolytic capacitors anyway. The ones I pulled tested all good, within spec capacitance and ESR, I couldn't see anything wrong. Replacing them was a small nightmare, there's not enough thermal isolation between the pads and the ground plane so soldering requires a nuclear blast furnace.

As of this morning the device is even worse. Client association lasts minutes at best and I'm getting a link speed of a few kB/sec.

I really would be surprised if that was a hardware issue. As I tried to explain, it depends on the configuration. With wifi on the WAN side, it works seamlessly. I think we should submit a bug report.

I think you already have a solution: remove 305-ath9k-limit-retries-for-powersave-response-frames.patch

source: https://bugs.openwrt.org/index.php?do=details&task_id=1180

I have no clue where and how to "remove 305-ath9k-limit-retries-for-powersave-response-frames.patch"

Quite a bit of effort if you have never compiled your own kernel before : (

You get a copy of the kernel source + buildchain first. Then use git to yank out the commit where the patch was merged (reference b30e092de65ca7be7cb277f934016484137d924c according to the author), but I'm not sure of the exact command for this, and you might have to be a bit more choosy. Finally compile the kernel and try to move it on to one of your APs.

There's likely a guide on the wiki for everything but the git manipulation bit.
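
As a starting point for the "remove the patch" step, here is a rough sketch that looks for the patch file by name inside a source checkout and deletes it, so that the next build leaves it out. This assumes the patch ships as a standalone file with exactly that name somewhere in the build tree, which I have not checked against the 17.01 sources. The function name remove_patch is made up.

```sh
#!/bin/sh
# Delete the named patch from a LEDE/OpenWrt source tree so the next build
# omits it. ASSUMPTION: the patch exists as a file with exactly this name
# somewhere under the tree; its directory is searched for, not hard-coded.
# remove_patch is a hypothetical helper name.
PATCH_NAME='305-ath9k-limit-retries-for-powersave-response-frames.patch'

remove_patch() {
    tree="${1:-.}"
    found=$(find "$tree" -type f -name "$PATCH_NAME")
    if [ -z "$found" ]; then
        echo "patch not found under $tree" >&2
        return 1
    fi
    # Print what is being removed, then remove each match.
    echo "$found"
    echo "$found" | while IFS= read -r f; do rm -- "$f"; done
}
```

Run as `remove_patch /path/to/source`, then rebuild the image following the wiki's build guide. If the commit hash mentioned above exists in your checkout, `git revert b30e092de65ca7be7cb277f934016484137d924c` is the usual git way to undo a single commit, but whether it applies cleanly to the 17.01 branch is untested here.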

Meanwhile I'm going to have a look to see if there are any runtime configurables that might affect this feature. If your claim about attaching the wifi to WAN is true, then perhaps that's somehow keeping the card out of sleep mode (too many broadcast packets?), so that might also be an interesting route.

Without trying to step on anyone's toes, I think you have to accept that old hardware doesn't always work as well as you remember, that new software is usually more demanding than older versions, and that older hardware doesn't play all that nicely with it.

The TL-WR1043ND v1 uses a roughly 10-year-old chipset/radio, build quality wasn't great to begin with, and I'm pretty sure the AR9103 is the first or possibly second generation of draft-n 11n hardware from Atheros. Even when it works, it doesn't perform very well compared to later, non-draft generations of 11n hardware. From my last experiences years ago, I never saw wireless performance go above ~40-50 Mbit/s (usually a bit lower) without any interfering networks around, and it dropped off quickly and showed strange behavior once you added neighboring networks that would cause interference, or even certain types of clients. It might be down to hardware revisions, but that's at least my experience with the few I had around.

You need to consider that due to its age and availability there's little of this hardware still around, and probably very few developers, if any, who still test (and run) such old hardware. It could also very well be that there's not much else one can do as far as driver optimization goes (i.e. hardware limitations). Also, the fact that it only has 32 MB of RAM isn't in your favor. https://openwrt.org/supported_devices/432_warning

In the end you might want to consider whether it's worth your time, given that you might never get it to work acceptably, or bite the sour apple and shell out $50 or more depending on your needs. I know people want to keep hardware as long as possible, but sometimes you have to accept that hardware needs to be replaced.

As a sidenote, would you expect a first gen Intel i5 Core CPU to perform good today? It's about 10-year-old hardware and I can assure you that even with 8 GB of RAM and an SSD it's really slow/limited. You can't even watch Twitch in 720p without stuttering; forget about 1080p, even in a standalone player with hardware acceleration (well, as much hw acceleration as it actually supports). Sure, it does run Windows 7; however, Linux doesn't run any better, or at least Ubuntu doesn't.

Sorry for the rant (depends on your pov) :wink:

All good dizzy :slight_smile:

If it's a hardware issue we're looking at, then it's a complete dud. This morning this device can't hold a link for more than a few minutes and packet drops are at ~50%, so I have to reboot it again. The suggestion from others that it could be related to long-standing bugs is my remaining rope.

Software & dev perspective: you young whippersnappers complaining about routers that are more than a few years old :stuck_out_tongue:

More seriously: I'm under the impression that most modern router chipsets are heavily rooted in historical designs, as evidenced by the dominating MIPS lineages and related stories of CPU licensing. They have not needed to increase the processing power, power efficiency or system specs in the vast majority of units, only bring down manufacture costs.

Performance and memory: I'm not after any special features that need the RAM, just something that is "better than OEM", because every OEM unit I have used has had some nasty issues. I'm talking devices that secretly and inexplicably don't like certain MAC addresses, like the one on my laptop's built-in wifi phy. Change the MAC and everything works fine. Re-use that MAC on a desktop's ethernet phy and now that interface has the same issues. Fully factory reset the router and scour its settings to no avail. All sorts of fun.

As a sidenote, would you expect a first gen Intel i5 Core CPU to perform good today?

Muahaha, you are talking to a person that lives off retro hardware >:D. Circa 2011 desktop processors stopped their massive gains in speed every release as physical limits to their design became much harder to overcome. A 2008 era processor is quite a bit prior to this, but it's still more than enough for the things you list.

Anyway, I get your point. A few months back I was given this device by a friend and I was thrilled with its performance when I put OpenWRT on it, compared to my existing units that are not supported by openwrt. I know it can perform well for long periods (~few weeks), this problem appears to be triggered by something.

That was certainly true between ~2005 and ~2015, when SOC internals and performance didn't change that much (mostly simple frequency scaling between 400 and 720 MHz, with minimal improvements in the CPU core IP), except for adding more current wlan cards on top. However, since >50 MBit/s WAN connections and wlan standards rivaling 1 GBit/s ethernet have become common, the old mips cores can no longer scale and more performance is needed. That is before even considering users' expectations of doubling up the router as a NAS, media server with indexing and even minor transcoding/streaming requirements, or VPN endpoint; these tasks suddenly need performance, probably even more so in the consumer market than the business market (where you can go a long way with hardware accelerated routing and IPsec hwcrypto). Despite the small amount of development for the mips architecture since it stopped being a viable high-performance RISC workstation target, it had the capability to scale up to dual-core and roughly 1 GHz clock speed, but ARMv7 and ARMv8 have overtaken mips in terms of processing power (thanks to smartphone development) and have finally managed to encroach on mips' old advantage of high-performance I/O connections (PCIe, SATA, etc.).

In terms of wlan stability and reliability @diizzy is totally spot on: there is a massive difference between first generation draft-n silicon and more contemporary wlan chipsets released after the 802.11n standard was finalized. Keep in mind that draft-n devices had already entered the mass market over a year before the standard was released, while changes were still being made, and driver/firmware changes only go so far. Some of these changes in the final 802.11n standard mean that hardware acceleration can't be used in all circumstances, so the encryption has to be done on the system SOC instead of the wlan hardware. This isn't much of a problem for x86 gear that has plenty of performance to spare, but it is a burden for 400 MHz mips SOCs. Another problem is actual hardware (silicon) bugs in these early chipsets that were rushed to market: they didn't get the time to mature in the lab, but in the user's hands instead, with fixes being applied to later silicon.

Don't disregard the 4/32 warning either, it's becoming a real problem for actual production use.
