Netfilter "Flow offload" / HW NAT

@nbd, trying hw offload on an EdgeRouter X, I get this in a loop.

I'm using PPPoE on wan1 and a static IP on wan2 with mwan3.
The crashes start after I enable and disable flow_offloading[_hw] in the firewall config.

[  425.886392] ------------[ cut here ]------------
[  425.886402] WARNING: CPU: 2 PID: 0 at net/netfilter/nf_conntrack_rtcache.c:197 0x8f0463d8
[  425.886404] Modules linked in: pppoe ppp_async pppox ppp_generic nf_conntrack_ipv6 iptable_nat ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_NETMAP xt_LOG xt_HL xt_FLOWOFFLOAD xt_DSCP xt_CLASSIFY wireguard slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack iptable_mangle iptable_filter ipt_ECN ip_tables crc_ccitt xt_set ip_set_list_set ip_set_hash_netiface ip_set_hash_netport ip_set_hash_netnet ip_set_hash_net ip_set_hash_netportnet ip_set_hash_mac
[  425.886531]  ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_hash_ipmark ip_set_hash_ip ip_set_bitmap_port ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnetlink ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables ip6_udp_tunnel udp_tunnel leds_gpio gpio_button_hotplug
[  425.886579] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G        W       4.14.34 #0
[  425.886581] Stack : 00000000 00000000 00000000 00000000 805b7ada 00000042 00000000 00000000
[  425.886599]         8fc44834 805508a7 804e1b0c 00000002 00000000 00000001 8fc11c40 2c45ddf5
[  425.886617]         00000000 00000000 805b0000 e3ddf769 00000004 80440fec b2a193a6 00089e70
[  425.886636]         00000000 80550000 0000c195 805e0000 00000000 00000000 80570000 8f0463d8
[  425.886654]         00000009 000000c5 00000001 00000003 00000002 80550000 00000008 805b0008
[  425.886672]         ...
[  425.886677] Call Trace:
[  425.886685] [<80010498>] show_stack+0x58/0x100
[  425.886693] [<8042a1dc>] dump_stack+0x9c/0xe0
[  425.886700] [<8002e1d8>] __warn+0xe0/0x114
[  425.886708] [<8002e29c>] warn_slowpath_null+0x1c/0x30
[  425.886720] [<8f0463d8>] 0x8f0463d8
[  425.886728] ---[ end trace 7c39a3f673569d93 ]---
[  425.886808] ------------[ cut here ]------------

OpenWrt SNAPSHOT, r6705-a18d88e863

EDIT:
I see that at line 194 there is a dst_xfrm check.
I'll try your git commit "kernel: avoid flow offload for connections with xfrm on the dst entry" (should fix IPsec).

EDIT2: I didn't pay attention; the problem is actually at line 197.

EDIT3: mwan3 was using br-lan as its Local source interface, which made the networking really unstable. After applying the patch and changing the Local source interface to none, there has been no kernel oops so far (see the sketch below).
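For reference, roughly how that mwan3 change looks from the shell; the local_source option name is my assumption of what the LuCI "Local source interface" field maps to, and may differ between mwan3 versions:

# set the mwan3 "Local source interface" to none (option name assumed: local_source)
uci set mwan3.globals.local_source='none'
uci commit mwan3
mwan3 restart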

regards

Bug report here?
On x86:

[   82.344428] br-lan: port 1(eth0) entered disabled state
[   82.359577] device eth0 left promiscuous mode
[   82.405571] br-lan: port 1(eth0) entered disabled state
[   82.625689] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[   83.055146] general protection fault: 0000 [#1] SMP PTI
[   83.064858] Modules linked in: ath9k ath9k_common rt2800usb rt2800lib rt2500usb pppoe ppp_async l2tp_ppp ath9k_hw ath6kl_usb ath6kl_core ath10k_pci ath10k_core ath rtl8187 rt73usb rt2x00usb rt2x00lib pptp pppox ppp_mppe ppp_generic nf_nat_pptp nf_conntrack_pptp mt7601u mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE ebtable_nat ebtable_filter ebtable_broute cfg80211 xt_u32 xt_time xt_tcpudp xt_tcpmss xt_string xt_statistic xt_state xt_recent xt_quota xt_policy xt_pkttype xt_physdev xt_owner xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_iprange xt_ipp2p xt_iface xt_hl xt_helper xt_hashlimit xt_esp xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_addrtype xt_TRACE xt_TPROXY xt_TEE xt_TCPMSS xt_REDIRECT xt_NETMAP xt_LOG xt_IPMARK xt_HL xt_FLOWOFFLOAD xt_DSCP
[   83.351328]  xt_CT xt_CLASSIFY usblp ums_usbat ums_sddr55 ums_sddr09 ums_karma ums_jumpshot ums_isd200 ums_freecom ums_datafab ums_cypress ums_alauda ts_fsm ts_bm slhc r8169 pcnet32 nf_reject_ipv4 nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_redirect nf_nat_proto_gre nf_nat_masquerade_ipv4 nf_nat_irc nf_conntrack_ipv4 nf_nat_ipv4 nf_nat_h323 nf_nat_ftp nf_nat_amanda nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_dup_ipv6 nf_dup_ipv4 nf_defrag_ipv4 nf_conntrack_tftp nf_conntrack_snmp nf_conntrack_sip nf_conntrack_rtcache nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp nf_conntrack_broadcast ts_kmp nf_conntrack_amanda mmc_spi macvlan iptable_raw iptable_mangle iptable_filter ipt_rpfilter ipt_ah ipt_ECN ip6table_raw ip6t_rpfilter ip_tables ebtables ebt_vlan
[   83.549778]  ebt_stp ebt_snat ebt_redirect ebt_pkttype ebt_mark_m ebt_mark ebt_limit ebt_ip6 ebt_ip ebt_dnat ebt_arpreply ebt_arp ebt_among ebt_802_3 e1000e crc7 crc_itu_t crc_ccitt compat_xtables compat br_netfilter bnx2 natcap fuse sch_cake act_connmark act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_tbf sch_htb sch_hfsc sch_ingress evdev i2c_piix4 i2c_i801 i2c_smbus i2c_dev xt_set ip_set_list_set ip_set_hash_netiface ip_set_hash_netport ip_set_hash_netnet ip_set_hash_net ip_set_hash_netportnet ip_set_hash_mac ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_hash_ipmark ip_set_hash_ip ip_set_bitmap_port ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnetlink ip6t_NPT ip6t_MASQUERADE nf_nat_masquerade_ipv6 ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6
[   83.970254]  nf_nat_ipv6 nf_nat nf_conntrack ip6t_rt ip6t_frag ip6t_hbh ip6t_eui64 ip6t_mh ip6t_ah ip6t_ipv6header ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables nfsv4 nfsv3 nfsd nfs msdos bonding ip6_gre ip_gre gre ixgbe igb i2c_algo_bit e1000 ifb l2tp_ip6 l2tp_ip l2tp_eth sit mdio l2tp_netlink l2tp_core udp_tunnel ip6_udp_tunnel ipcomp6 xfrm6_tunnel xfrm6_mode_tunnel xfrm6_mode_transport xfrm6_mode_beet esp6 ah6 ipcomp xfrm4_tunnel xfrm4_mode_tunnel xfrm4_mode_transport xfrm4_mode_beet esp4 ah4 ip6_tunnel tunnel6 tunnel4 ip_tunnel rpcsec_gss_krb5 auth_rpcgss oid_registry tun af_key xfrm_user xfrm_ipcomp xfrm_algo vfat fat lockd sunrpc grace isofs autofs4 dns_resolver nls_utf8 nls_iso8859_1 nls_cp437 eeprom_93cx6 sha256_ssse3 sha256_generic
[   84.099473]  sha1_ssse3 sha1_generic jitterentropy_rng drbg md5 hmac echainiv des_generic deflate zlib_deflate cts cbc authenc crypto_acompress uas sdhci_pltfm xhci_plat_hcd softdog sata_mv ehci_platform exfat tg3 ssb ptp pps_core mii libphy
[   84.150255] CPU: 0 PID: 13 Comm: kworker/u2:1 Not tainted 4.14.36 #0
[   84.169050] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[   84.188081] Workqueue: events_power_efficient xt_flowoffload_hook_work [xt_FLOWOFFLOAD]
[   84.209326] task: ffff88000ecd0c80 task.stack: ffffc90000068000
[   84.224706] RIP: 0010:__nf_unregister_net_hook+0x1/0x90
[   84.242911] RSP: 0018:ffffc9000006be30 EFLAGS: 00010202
[   84.257405] RAX: 0000000000000000 RBX: ffff88000c5b3228 RCX: 0000000100170001
[   84.292175] RDX: ffff88000ecd0c80 RSI: ffff88000c5b3228 RDI: 6b6b6b6b6b6b6b6b
[   84.305095] RBP: ffffc9000006be58 R08: ffff88000c5b3578 R09: ffff88000c5b3538
[   84.325980] R10: ffffc9000006be50 R11: ffff88000fc1f310 R12: ffffffff81e6c580
[   84.396514] R13: ffff88000d1723d0 R14: ffff88000ec0fc00 R15: 0000000000000000
[   84.459500] FS:  0000000000000000(0000) GS:ffff88000fc00000(0000) knlGS:0000000000000000
[   84.525121] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   84.565460] CR2: 0000000000a931d8 CR3: 0000000001e08006 CR4: 00000000000606f0
[   84.638311] Call Trace:
[   84.655229]  ? nf_unregister_net_hook+0x88/0xd0
[   84.706898]  xt_flowoffload_hook_work+0x12a/0x17a [xt_FLOWOFFLOAD]
[   84.765504]  process_one_work+0x1c4/0x310
[   84.799558]  worker_thread+0x20b/0x3c0
[   84.850119]  kthread+0x112/0x120
[   84.884839]  ? process_one_work+0x310/0x310
[   84.923571]  ? kthread_create_on_node+0x40/0x40
[   84.966100]  ret_from_fork+0x35/0x40
[   84.981738] Code: 41 5c 41 5d 41 5e 41 5f 5d c3 48 8b 05 c1 f1 99 00 55 48 89 e5 48 85 c0 75 02 0f 0b e8 b9 f6 30 00 5d c3 0f 1f 80 00 00 00 00 55 <0f> b7 0f 48 89 e5 48 89 c8 48 c1 e0 04 48 8d 54 07 08 31 c0 eb 
[   85.100453] RIP: __nf_unregister_net_hook+0x1/0x90 RSP: ffffc9000006be30
[   85.111658] ---[ end trace 5c25a390045cac75 ]---
[   85.124535] Kernel panic - not syncing: Fatal exception
[   85.158899] Kernel Offset: disabled
[   85.164077] Rebooting in 3 seconds..

This happened on poweroff.

I posted a fixup patch for this issue:

https://github.com/ptpt52/openwrt-openwrt/blob/master/target/linux/generic/hack-4.14/940-cleanup-offload-hooks-on-netdev-unregister.patch


You should probably send your patch to the mailing list for better visibility, or submit it as a pull request. Or perhaps @nbd might be willing to have a look here :slight_smile: Thank you very much for the patch, by the way!

Hi, we have a problem with flow offload enabled together with WireGuard on mt7621. I hope you can solve this issue and that it's not a bug in WireGuard.

To reproduce the bug, enable flow offload and then try to transfer data through WireGuard; the router will reboot instantly. I couldn't get a log from this, but somebody did some debugging and shared a call trace here.

One interesting thing I have noticed - I can add the iptables rule with:

iptables -I FORWARD 1 -m conntrack --ctstate RELATED,ESTABLISHED -j FLOWOFFLOAD
iptables-save

However, when I restart my device, the rule no longer appears and I have to re-add it. I am new to iptables so I may be doing something wrong, but has anyone else seen/solved this?

I could add it to my startup scripts, but I would think saving it should keep it across reboots.

Device: WRT3200ACM (mvebu cortex a9)
Snapshot: Around 4/26 (not near my router right now so I can't pull the exact commit/build), from trunk

It is working great though; CPU usage is around 10% when maxing out my connection at 130 Mbps down.


You can add that to:

/etc/firewall.user

or, more easily, do it as described here if you are on an image with that commit (see the example below).
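For example, a minimal /etc/firewall.user sketch; the rule itself is the one quoted above, the rest is just the standard firewall.user mechanism:

# /etc/firewall.user is included by the default firewall config, so anything
# placed here is re-applied on every firewall reload and survives reboots.
iptables -I FORWARD 1 -m conntrack --ctstate RELATED,ESTABLISHED -j FLOWOFFLOAD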

Since I am not too familiar with UCI, would it be something like the following based on the commit you linked?:

uci add firewall.defaults.flow_offloading='1'
uci commit firewall
/etc/init.d/firewall restart

Thanks for the help.

Almost right; use uci set firewall.@defaults[0].flow_offloading=1; uci commit firewall
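Putting that together with the restart step from the post above, the whole sequence would be roughly:

# corrected option path, then apply the new firewall config
uci set firewall.@defaults[0].flow_offloading=1
uci commit firewall
/etc/init.d/firewall restart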


That worked and I verified that after restarting it is still active. Thanks!

So I have been thinking. When benchmarking my DIR-860L by connecting PCs to the WAN and LAN ports and running an iperf3 test, I see the following results:

- WAN <-> LAN: 940 Mbit/s (the same speed as two devices on the same switch, i.e. hitting the gigabit Ethernet limit)
- WAN <-> LAN with SQM: 650-700 Mbit/s

However, my real connection uses PPPoE on the WAN side instead of IPoE, which gives me the following speeds:

- WAN <-> LAN: 500 Mbit/s down (my connection speed), 400 Mbit/s up (100 Mbit/s below my connection speed) with a very low CPU idle percentage, showing I am CPU limited.
- WAN <-> LAN with SQM enabled: 350-400 Mbit/s in either direction.

However, when enabling hw flow offload I am seeing:

- WAN <-> LAN: 500 Mbit/s in either direction (99% CPU idle)
- WAN <-> LAN with SQM: N/A; it is not possible to use SQM together with hw flow offloading.

Conclusions:

  1. My DIR-860L is able to shape 650-700 Mbit/s.
  2. PPPoE is a severe bottleneck, unless hw flow offload is enabled.
  3. But hw flow offload doesn't work with SQM.

Would it be possible to:

  1. Apply hw flow offload on WAN <-> dummy interface. This makes sure the CPU-intensive PPPoE processing is fully offloaded.
  2. Process dummy interface <-> LAN without hw flow offload, but keep software flow offload enabled. This should be fine to do in software, given my synthetic benchmarks and given that step 1 hardly costs any CPU cycles.
  3. Apply SQM on the dummy interface.

So basically, I have 3 questions:

  1. Would this be possible?
  2. If so, how do I apply hw flow offloading to some parts but not all, and use software flow offloading for another part?
  3. How do I make the traffic follow WAN <-> dummy interface <-> LAN? (A rough sketch of the idea is below.)
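To make question 3 a bit more concrete, here is a purely illustrative sketch of creating the intermediate interface and pointing SQM at it; the interface names are made up, the routing/bridging needed to actually force traffic through it is not shown, and whether offload can really be split per segment like this is exactly what is being asked:

# create a veth pair to act as the intermediate hop (hypothetical names)
ip link add veth-sqm type veth peer name veth-wan
ip link set veth-sqm up
ip link set veth-wan up
# point an existing SQM queue at the intermediate interface instead of the wan device
uci set sqm.@queue[0].interface='veth-sqm'
uci commit sqm
/etc/init.d/sqm restart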

Sorry for my ramblings :smiley:


Are more people still seeing this issue? Unfortunately, I can only use WireGuard or hw flow offloading, not both.

I opened a ticket on Flyspray for that issue. If you have the same problem, vote for it so it will be reviewed faster: https://bugs.openwrt.org/index.php?do=details&task_id=1539

It seems I am running into another bug with hw flow offload. Could anyone please confirm or deny whether they are seeing the same thing?

Usually, when nothing much is happening on my home network, there are around 100-200 active connections, as shown on the LuCI overview page. However, with hw flow offload enabled I started seeing thousands of active connections. Diving into the Connections tab in LuCI's Realtime Graphs, I could see hundreds of connections made by a computer that shut down over 12 hours ago. For some reason, inactive connections are not timing out and are left in the conntrack table. Disabling hw flow offload fixes the issue.

Does this sound familiar to anyone else? @nbd, is there any additional information I can provide to help debug the issue?

Edit: Running the 18.06 branch from a few days ago: OpenWrt 18.06-SNAPSHOT r6917-8948a78
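For anyone who wants to check this on their own device, the conntrack table can be inspected directly; these are standard Linux/OpenWrt paths, nothing specific to flow offload:

# current number of tracked connections vs. the configured limit
cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max
# list the tracked entries themselves (the same data LuCI displays)
head /proc/net/nf_conntrack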

I wrote a short addition to LuCI for controlling the offloading options in the firewall config page:

In case somebody wants to patch a live router, the file is:
/usr/lib/lua/luci/model/cbi/firewall/zones.lua
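The checkboxes presumably write the same options discussed earlier in this thread into the firewall defaults section, so after toggling them you can verify the result from the shell (assuming both options were enabled):

# check what the new LuCI options wrote to the firewall defaults section
uci get firewall.@defaults[0].flow_offloading
uci get firewall.@defaults[0].flow_offloading_hw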

Screenshot: (image of the offloading options in the firewall config page, omitted here)


Neat! Will this end up in the repository as well? :slight_smile:

As you can see above, it is currently a pull request in LuCI, as I want some feedback first before committing it.

But it should end up in LuCI master (and 18.06) if no negative feedback comes.

I'm seeing this on master 7044: 7500 connections open after a few hours.

Same problem: flow offload can't work together with WireGuard. My target is bcm53xx.
