I can confirm: both software and hardware flow offloading (flow_offloading and flow_offloading_hw) need to be enabled. I now reach 910-920 Mbit/s down and 213 Mbit/s up on a PPPoE connection, with almost no CPU usage at all.
ndb did an extremely good job here.
So the commands that make this work are:
uci set firewall.@defaults[0].flow_offloading=1
uci set firewall.@defaults[0].flow_offloading_hw=1
uci commit
/etc/init.d/firewall restart
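
For reference, after the commit the defaults section of /etc/config/firewall should contain roughly the two options below. This is only a sketch of the relevant lines; whatever other options your defaults section already has are left out here and will differ per device. Running `uci show firewall.@defaults[0]` should list them as well.

```
config defaults
	# existing options (input, output, forward, ...) omitted
	option flow_offloading '1'
	option flow_offloading_hw '1'
```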