Hardware crypto for Mediatek missing?

Does the commit that added the patch mention anything about why the HW-Engine was disabled?

Doing a simple benchmark on the router itself:
root@LEDE:~# openssl speed -elapsed -evp aes-256-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 954011 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 267999 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 69591 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 17557 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 1980 aes-256-cbc's in 3.01s
OpenSSL 1.0.2k 26 Jan 2017
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,2,long) aes(partial) blowfish(ptr)
compiler: mipsel-openwrt-linux-musl-gcc -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -I/home/richard/lede_new/source/staging_dir/target-mipsel_24kc_musl/usr/include -I/home/richard/lede_new/source/staging_dir/target-mipsel_24kc_musl/include -I/home/richard/lede_new/source/staging_dir/toolchain-mipsel_24kc_gcc-6.3.0_musl/usr/include -I/home/richard/lede_new/source/staging_dir/toolchain-mipsel_24kc_gcc-6.3.0_musl/include/fortify -I/home/richard/lede_new/source/staging_dir/toolchain-mipsel_24kc_gcc-6.3.0_musl/include -znow -zrelro -DOPENSSL_SMALL_FOOTPRINT -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DOPENSSL_NO_ERR -DTERMIOS -Os -pipe -mno-branch-likely -mips32r2 -mtune=24kc -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -msoft-float -iremap/home/richard/lede_new/source/build_dir/target-mipsel_24kc_musl/openssl-1.0.2k:openssl-1.0.2k -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -fpic -I/home/richard/lede_new/source/package/libs/openssl/include -ffunction-sections -fdata-sections -fomit-frame-pointer -Wall -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DAES_ASM
The 'numbers' are in 1000s of bytes per second processed.
type            16 bytes    64 bytes   256 bytes  1024 bytes  8192 bytes
aes-256-cbc     5088.06k    5717.31k    5938.43k    5992.79k    5388.76k
root@LEDE:~# insmod cryptodev
root@LEDE:~# openssl speed -elapsed -evp aes-256-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 314592 aes-256-cbc's in 2.98s
Doing aes-256-cbc for 3s on 64 size blocks: 326500 aes-256-cbc's in 2.97s
Doing aes-256-cbc for 3s on 256 size blocks: 290500 aes-256-cbc's in 2.97s
Doing aes-256-cbc for 3s on 1024 size blocks: 153734 aes-256-cbc's in 2.97s
Doing aes-256-cbc for 3s on 8192 size blocks: 38931 aes-256-cbc's in 2.97s
OpenSSL 1.0.2k 26 Jan 2017
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,2,long) aes(partial) blowfish(ptr)
compiler: mipsel-openwrt-linux-musl-gcc -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -I/home/richard/lede_new/source/staging_dir/target-mipsel_24kc_musl/usr/include -I/home/richard/lede_new/source/staging_dir/target-mipsel_24kc_musl/include -I/home/richard/lede_new/source/staging_dir/toolchain-mipsel_24kc_gcc-6.3.0_musl/usr/include -I/home/richard/lede_new/source/staging_dir/toolchain-mipsel_24kc_gcc-6.3.0_musl/include/fortify -I/home/richard/lede_new/source/staging_dir/toolchain-mipsel_24kc_gcc-6.3.0_musl/include -znow -zrelro -DOPENSSL_SMALL_FOOTPRINT -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DOPENSSL_NO_ERR -DTERMIOS -Os -pipe -mno-branch-likely -mips32r2 -mtune=24kc -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -msoft-float -iremap/home/richard/lede_new/source/build_dir/target-mipsel_24kc_musl/openssl-1.0.2k:openssl-1.0.2k -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -fpic -I/home/richard/lede_new/source/package/libs/openssl/include -ffunction-sections -fdata-sections -fomit-frame-pointer -Wall -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DAES_ASM
The 'numbers' are in 1000s of bytes per second processed.
type            16 bytes    64 bytes   256 bytes  1024 bytes  8192 bytes
aes-256-cbc     1689.08k    7035.69k   25039.73k   53004.58k  107381.40k
root@LEDE:~#

Does anyone know a good "real-life" benchmark or test?
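
For anyone reading the tables above: the summary row can be recomputed from the raw "Doing aes-256-cbc for 3s ..." lines, since the figure is just operations times block size divided by elapsed time, in thousands of bytes per second. The small sketch below does that for two of the lines; the function name `throughput_k` is made up for illustration and is not part of OpenSSL.

```python
# Recompute `openssl speed` summary figures from the raw counts.
# The counts and timings are copied from the two runs above.

def throughput_k(operations, block_size, seconds):
    """Return throughput in thousands of bytes per second."""
    return operations * block_size / seconds / 1000.0

# 16-byte blocks, before cryptodev was loaded
software = throughput_k(954011, 16, 3.00)
# 8192-byte blocks, after `insmod cryptodev`
hardware = throughput_k(38931, 8192, 2.97)

print(f"16 bytes, software:    {software:.2f}k")
print(f"8192 bytes, cryptodev: {hardware:.2f}k")
```

Both values match the corresponding table entries to two decimals, which is a quick sanity check that the table and the raw counts belong to the same run.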

Next problem with this driver: it's very old :slight_smile: which means it still uses IRQF_DISABLED. That flag has been deprecated since kernel 3.x and was removed in 4.1. We are on 4.4 now (looking into 4.9).

This part I can replace, but I run into problems when I want to use interrupts. Interrupts should help to bring CPU usage down, but when I enable them the driver generates a lot of errors.

@maurer, I noticed you were doing this for the MT7621 on the mqmaker forum. It seemed like there the interrupt problem was solved, but I couldn't find how. It just said the "board" was now fully supported. Does that mean I need to look into the DTS(I) files for full interrupt support?

Unfortunately, the guy on the mqmaker forum, stas2z, didn't release his source code; he only releases binaries and builder files. But there is hope :slight_smile:
The guy who made the first backports has released his code:



That's about the best chance to have mt7621_hw_ipsec enabled in LEDE.

I'm looking at the Padavan code a lot, but even he didn't activate interrupts. The IRQF reference is still "allowed" in his kernel 3.x versions. I'm comparing with the Wive-NG project as well.

His approach (he has no choice) is to modify some kernel code to intercept the IPsec packets. This was never "allowed" by the OpenWrt community, where it is considered a security issue. The same most likely applies to the engine-disable patch: rumor had it that the NSA had some backdoor in the hardware engines.

Me, I'm not that concerned about this part... I don't think I qualify as someone worth spending resources on :wink: so I'm just trying to get the most out of the hardware, using IPsec and/or OpenVPN with OpenSSL to bypass geolocation restrictions or get past some government firewall so I can access my favorite website.

I didn't find a good real-life benchmark yet. The OpenSSL speed test is not a realistic indication (it looks good, though).

Success :smiley: !!

root@LEDE:~# rmmod mtk_aes
root@LEDE:~# time -v openssl speed -elapsed -evp aes-256-cbc -engine cryptodev
engine "cryptodev" set.
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 276675 aes-256-cbc's in 2.97s
Doing aes-256-cbc for 3s on 64 size blocks: 153719 aes-256-cbc's in 2.97s
Doing aes-256-cbc for 3s on 256 size blocks: 54902 aes-256-cbc's in 2.97s
Doing aes-256-cbc for 3s on 1024 size blocks: 13558 aes-256-cbc's in 2.97s
Doing aes-256-cbc for 3s on 8192 size blocks: 2002 aes-256-cbc's in 2.97s
OpenSSL 1.0.2k 26 Jan 2017
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,2,long) aes(partial) blowfish(ptr)
compiler: mipsel-openwrt-linux-musl-gcc -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -I/home/richard/lede_new/source/staging_dir/target-mipsel_24kc_musl/usr/include -I/home/richard/lede_new/source/staging_dir/target-mipsel_24kc_musl/include -I/home/richard/lede_new/source/staging_dir/toolchain-mipsel_24kc_gcc-6.3.0_musl/usr/include -I/home/richard/lede_new/source/staging_dir/toolchain-mipsel_24kc_gcc-6.3.0_musl/include/fortify -I/home/richard/lede_new/source/staging_dir/toolchain-mipsel_24kc_gcc-6.3.0_musl/include -znow -zrelro -DOPENSSL_SMALL_FOOTPRINT -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DOPENSSL_NO_ERR -DTERMIOS -Os -pipe -mno-branch-likely -mips32r2 -mtune=24kc -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -msoft-float -iremap/home/richard/lede_new/source/build_dir/target-mipsel_24kc_musl/openssl-1.0.2k:openssl-1.0.2k -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -fpic -I/home/richard/lede_new/source/package/libs/openssl/include -ffunction-sections -fdata-sections -fomit-frame-pointer -Wall -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DAES_ASM
The 'numbers' are in 1000s of bytes per second processed.
type            16 bytes    64 bytes   256 bytes  1024 bytes  8192 bytes
aes-256-cbc     1490.51k    3312.46k    4732.29k    4674.54k    5522.01k
Command being timed: "openssl speed -elapsed -evp aes-256-cbc -engine cryptodev"
User time (seconds): 0.49
System time (seconds): 14.02
Percent of CPU this job got: 90%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0m 15.96s
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 10400
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 111
Voluntary context switches: 83
Involuntary context switches: 454
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
root@LEDE:~# modprobe mtk_aes b=16
root@LEDE:~# time -v openssl speed -elapsed -evp aes-256-cbc -engine cryptodev
engine "cryptodev" set.
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 116585 aes-256-cbc's in 2.97s
Doing aes-256-cbc for 3s on 64 size blocks: 116865 aes-256-cbc's in 2.97s
Doing aes-256-cbc for 3s on 256 size blocks: 104425 aes-256-cbc's in 2.97s
Doing aes-256-cbc for 3s on 1024 size blocks: 95388 aes-256-cbc's in 2.97s
Doing aes-256-cbc for 3s on 8192 size blocks: 34593 aes-256-cbc's in 2.97s
OpenSSL 1.0.2k 26 Jan 2017
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,2,long) aes(partial) blowfish(ptr)
compiler: mipsel-openwrt-linux-musl-gcc -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -I/home/richard/lede_new/source/staging_dir/target-mipsel_24kc_musl/usr/include -I/home/richard/lede_new/source/staging_dir/target-mipsel_24kc_musl/include -I/home/richard/lede_new/source/staging_dir/toolchain-mipsel_24kc_gcc-6.3.0_musl/usr/include -I/home/richard/lede_new/source/staging_dir/toolchain-mipsel_24kc_gcc-6.3.0_musl/include/fortify -I/home/richard/lede_new/source/staging_dir/toolchain-mipsel_24kc_gcc-6.3.0_musl/include -znow -zrelro -DOPENSSL_SMALL_FOOTPRINT -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DOPENSSL_NO_ERR -DTERMIOS -Os -pipe -mno-branch-likely -mips32r2 -mtune=24kc -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -msoft-float -iremap/home/richard/lede_new/source/build_dir/target-mipsel_24kc_musl/openssl-1.0.2k:openssl-1.0.2k -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -fpic -I/home/richard/lede_new/source/package/libs/openssl/include -ffunction-sections -fdata-sections -fomit-frame-pointer -Wall -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DAES_ASM
The 'numbers' are in 1000s of bytes per second processed.
type            16 bytes    64 bytes   256 bytes  1024 bytes  8192 bytes
aes-256-cbc      628.07k    2518.30k    9000.94k   32887.98k   95416.11k
Command being timed: "openssl speed -elapsed -evp aes-256-cbc -engine cryptodev"
User time (seconds): 1.26
System time (seconds): 8.41
Percent of CPU this job got: 60%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0m 15.92s
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 10400
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 112
Voluntary context switches: 466871
Involuntary context switches: 284
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
root@LEDE:~# ^C
root@LEDE:~# cat /proc/interrupts
           CPU0
  4:    1232369      MIPS   4  mt76x2e
  5:     627685      MIPS   5  10100000.ethernet
  6:         21      MIPS   6  mt7603e
  7:     144167      MIPS   7  timer
 21:    3936726      INTC  13  aes_engine
 25:          9      INTC  17  esw
 28:         12      INTC  20  serial
 40:          0      GPIO  38  gpio-keys
ERR:     151694
root@LEDE:~#

So you got mtk_aes working on the MT7628. Good job!
How will you profit from it?

Profit :wink: ... haha. For now I'm happy with the result of porting / patching a driver from an older kernel / different project to LEDE. Now I'm trying to apply what I learned and port the proprietary drivers for WiFi and Ethernet (hopefully with HW-NAT). Priority is on the WiFi, since the open source MT76 still gives a lot of problems. Hopefully that will be sorted quickly, but until then we can have the option to use "less" open drivers.

As for the AES, hoping to benefit from it using OpenVPN.

Just wanted to say that that's amazing :slight_smile:
Getting hwnat and the proprietary WiFi drivers up and running would be even more amazing!
Can't wait to test some of your stuff.

Very good job, @drbrains! You're doing amazing work :slight_smile: . Getting HW-NAT and the proprietary WiFi drivers would be amazing. The open source mt76 drivers in their current state are a bit of a letdown, unfortunately :frowning:

Is it possible to enable this by default on stable LEDE builds? I have some Marvell 88F6192 devices (from the Pogoplug Mobile) and I am interested in using them as OpenVPN endpoints too. Do you know if there are similar steps to get it working on this SoC? It would be great to put step-by-step instructions somewhere and keep them for reference. Thanks.

I did a quick Google search on that chip. It should have some "security engine", but I couldn't find any driver or source code. Granted, since I don't have any Marvell-based devices, I didn't search for long. There should be some Linux driver for this engine, but porting it to the LEDE SDK is not a process that is simple to describe. And you need some source code first.

Besides, there is a patch to disable all HW engines, and I haven't figured out yet why someone decided that. For myself, I'm not using it in a high-value environment, so it's for private use only at the moment, until I understand why the HW was disabled. I don't want to open security holes, even though I found someone else doing the same thing (removing the patch) dating back to 2013!

@braian87b Here http://forum.doozan.com/read.php?2,26394,26504#msg-26504 are some hints about Marvell's cryptodev engine.
@drbrains Can you share your code? There are also Broadcom SoCs with crypto HW that I want to play with.

Sure. Keep in mind that I still have to clean it up. For now I just made dirty edits to e.g. the included header files. That's not a big problem, since this driver ONLY works with the MT7628 crypto engine in the SoC.

Don't forget to enable hardware support in the OpenSSL lib AND remove the 150-no-engine patch from package/libs/openssl/patches.

I still haven't figured out why this is disabled via a patch, other than that most systems don't have an engine, so why enable it.

Strange numbers.

The 'numbers' are in 1000s of bytes per second processed.
type            16 bytes    64 bytes   256 bytes  1024 bytes  8192 bytes
aes-256-cbc     1490.51k    3312.46k    4732.29k    4674.54k    5522.01k

without HWE

The 'numbers' are in 1000s of bytes per second processed.
type            16 bytes    64 bytes   256 bytes  1024 bytes  8192 bytes
aes-256-cbc      628.07k    2518.30k    9000.94k   32887.98k   95416.11k

with HWE.

On small blocks the HW engine is more than 2x slower than the CPU.
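
To put a number on that, here is a rough sketch that divides the "with HWE" row by the "without HWE" row for each block size and reports the first size at which the engine comes out ahead. The figures are copied from the two tables quoted in this post; the variable names are only illustrative.

```python
# Per-block-size speedup of the HW engine over the CPU-only run.
# Throughput figures (in 1000s of bytes/s) are copied from the tables above.

BLOCK_SIZES = [16, 64, 256, 1024, 8192]
cpu_only = [1490.51, 3312.46, 4732.29, 4674.54, 5522.01]      # without HWE
hw_engine = [628.07, 2518.30, 9000.94, 32887.98, 95416.11]    # with HWE

# Ratio > 1.0 means the hardware engine is faster than the CPU.
speedup = {
    size: hw / sw
    for size, sw, hw in zip(BLOCK_SIZES, cpu_only, hw_engine)
}

# Smallest block size at which the engine beats the CPU.
crossover = min(size for size, ratio in speedup.items() if ratio > 1.0)

for size, ratio in speedup.items():
    print(f"{size:>5} bytes: {ratio:.2f}x")
print(f"engine is faster from {crossover}-byte blocks upward")
```

On these numbers the engine loses at 16 and 64 bytes and wins from 256 bytes onward, so whether it helps in practice depends on how large the buffers handed to the engine are.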

This is normal. For small blocks, the time spent moving the data into and out of the crypto module dominates. In the numbers above the engine is already ahead at 256-byte blocks, and the gain becomes substantial at about 1024 bytes and up. I put "time -v" in front of the OpenSSL speed test to show how much CPU time is freed: 90% usage without the HW engine vs 60% with it. Even if the speed were the same, the "free" CPU cycles would be the benefit. With us doing more and more tasks on a router and getting faster and faster internet speeds, every cycle starts to count on limited devices.
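
The CPU percentages can be checked against the "time -v" output posted earlier: the share is (user time + system time) divided by the elapsed wall-clock time. A minimal sketch, with the timings copied from those two runs and a helper name (`cpu_share`) chosen just for this example:

```python
# Derive "Percent of CPU this job got" from the time -v figures above.

def cpu_share(user_s, system_s, elapsed_s):
    """Fraction of wall-clock time the process spent on the CPU."""
    return (user_s + system_s) / elapsed_s

# After `rmmod mtk_aes` (no hardware engine)
without_engine = cpu_share(0.49, 14.02, 15.96)
# After `modprobe mtk_aes b=16` (hardware engine active)
with_engine = cpu_share(1.26, 8.41, 15.92)

print(f"without engine: {int(without_engine * 100)}%")
print(f"with engine:    {int(with_engine * 100)}%")
print(f"CPU freed:      {(without_engine - with_engine) * 100:.0f} points")
```

The truncated percentages agree with the 90% and 60% that time -v reported, so roughly a third of the CPU time used by the software run becomes available for other work.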

As for real life, I am not sure how much performance is really gained using OpenVPN. I didn't get around to doing proper testing. But even if it's just the 10-15% improvement as mentioned above and we win CPU cycles in the process, then why not? This resource is available on the SoC, so at least I wanted to have the option to use it.

Any update on the MT7621 crypto engine? Without it, a LUKS-encrypted USB3 device is limited to around 12-13 MB/s.

The MT7621 crypto engine works with IPsec only!

Well... I think I will be able to have an “Alpha” release soon (depending on how busy my normal work is). The MT7621 should have a full implementation of the EIP-93. The chip reports its capabilities, so some versions might be more crippled than others.

The IPsec acceleration makes the most sense because it’s all kernel side. I'm not so sure how much improvement it will give. For LUKS the blocks might be big enough to make a significant difference, but I don’t have experience with LUKS.

Any improvement is needed. I run mt7621 with a NAS device (6 SATA ports) and performance is terrible with anything crypto related.