New MT7628 AES crypto driver (beta status)

Okay, so I finally managed to rewrite the MT7628 AES crypto driver. Please realise that I'm not a developer or a programmer; I'm a simple hobbyist trying to get things to work. With the knowledge gained from this project, I'm confident that I will be able to code a completely new EIP-93 driver (for the MT7621) in the near future.

I'm upgrading the status of this work to beta since I am getting good results. However, there are still a few things to work out.

  1. The Makefile. I asked this question before but can't seem to figure it out. This driver needs the crypto-engine that has been available in the kernel since v4.6, but I can't get it to be "auto-selected" and built with the kernel.

  2. For some reason, unloading the driver doesn't call the "exit" function. I have probably been staring at this code too long to see the mistake, and it should be an easy fix (I hope).

  3. I'm sure the real coders out there will spot a lot of mistakes or things to improve. Every little bit of help is welcome, but please don't shoot this down: again, I'm not a coder.

Source code at: https://github.com/vschagen/MT7628-AES

Some benchmark figures:


Software Only, no hardware crypto driver loaded:

root@OpenWrt:~# time -v openssl speed -elapsed -evp aes-256-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 380375 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 193658 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 65250 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 17522 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 2219 aes-256-cbc's in 3.00s
OpenSSL 1.0.2n  7 Dec 2017
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,2,long) aes(partial) blowfish(ptr) 
compiler: mipsel-openwrt-linux-musl-gcc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -I/home/drbrains/openwrt/staging_dir/target-mipsel_24kc_musl/usr/include -I/home/drbrains/openwrt/staging_dir/target-mipsel_24kc_musl/include -I/home/drbrains/openwrt/staging_dir/toolchain-mipsel_24kc_gcc-6.3.0_musl/usr/include -I/home/drbrains/openwrt/staging_dir/toolchain-mipsel_24kc_gcc-6.3.0_musl/include/fortify -I/home/drbrains/openwrt/staging_dir/toolchain-mipsel_24kc_gcc-6.3.0_musl/include -znow -zrelro -DOPENSSL_SMALL_FOOTPRINT -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DOPENSSL_NO_ERR -DTERMIOS -Os -pipe -mno-branch-likely -mips32r2 -mtune=24kc -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -msoft-float -iremap/home/drbrains/openwrt/build_dir/target-mipsel_24kc_musl/openssl-1.0.2n:openssl-1.0.2n -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -fpic -I/home/drbrains/openwrt/package/libs/openssl/include -ffunction-sections -fdata-sections -fomit-frame-pointer -Wall -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DAES_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc       2028.67k     4131.37k     5568.00k     5980.84k     6059.35k
	Command being timed: "openssl speed -elapsed -evp aes-256-cbc"
	User time (seconds): 0.76
	System time (seconds): 14.24
	Percent of CPU this job got: 94%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0m 15.84s
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 9856
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 1
	Minor (reclaiming a frame) page faults: 110
	Voluntary context switches: 23
	Involuntary context switches: 220
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
root@OpenWrt:~# 
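
The throughput row can be cross-checked against the operation counts printed above it: each figure is the number of operations times the block size, divided by the elapsed time, shown in units of 1000 bytes per second. A minimal C sketch of that arithmetic (the function name is made up for illustration, it is not part of openssl):

```c
#include <assert.h>

/* Throughput in units of 1000 bytes per second, the unit used in the
 * "aes-256-cbc" row of the openssl speed output.
 * ops         - operations completed in the measurement window
 * block_bytes - size of each block in bytes
 * seconds     - elapsed time of the window (3.00 in the runs here) */
static double throughput_kbytes(unsigned long ops, unsigned long block_bytes,
                                double seconds)
{
	assert(seconds > 0.0);
	return (double)ops * (double)block_bytes / seconds / 1000.0;
}
```

For the 16-byte line of the software-only run, 380375 operations in 3.00 s gives 380375 * 16 / 3.00 / 1000, which is about 2028.67 and matches the table.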

Hardware Only: Hardware crypto driver loaded (no bypass):

root@OpenWrt:~# time -v openssl speed -elapsed -evp aes-256-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 81684 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 80746 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 78139 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 65406 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 29456 aes-256-cbc's in 3.00s
OpenSSL 1.0.2n  7 Dec 2017
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,2,long) aes(partial) blowfish(ptr) 
compiler: mipsel-openwrt-linux-musl-gcc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -I/home/drbrains/openwrt/staging_dir/target-mipsel_24kc_musl/usr/include -I/home/drbrains/openwrt/staging_dir/target-mipsel_24kc_musl/include -I/home/drbrains/openwrt/staging_dir/toolchain-mipsel_24kc_gcc-6.3.0_musl/usr/include -I/home/drbrains/openwrt/staging_dir/toolchain-mipsel_24kc_gcc-6.3.0_musl/include/fortify -I/home/drbrains/openwrt/staging_dir/toolchain-mipsel_24kc_gcc-6.3.0_musl/include -znow -zrelro -DOPENSSL_SMALL_FOOTPRINT -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DOPENSSL_NO_ERR -DTERMIOS -Os -pipe -mno-branch-likely -mips32r2 -mtune=24kc -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -msoft-float -iremap/home/drbrains/openwrt/build_dir/target-mipsel_24kc_musl/openssl-1.0.2n:openssl-1.0.2n -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -fpic -I/home/drbrains/openwrt/package/libs/openssl/include -ffunction-sections -fdata-sections -fomit-frame-pointer -Wall -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DAES_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc        435.65k     1722.58k     6667.86k    22325.25k    80434.52k
	Command being timed: "openssl speed -elapsed -evp aes-256-cbc"
	User time (seconds): 0.28
	System time (seconds): 3.97
	Percent of CPU this job got: 26%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0m 15.81s
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 9856
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 110
	Voluntary context switches: 335454
	Involuntary context switches: 56
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
root@OpenWrt:~# 

Hybrid: hardware driver loaded with a 200-byte bypass (fallback) to software:

root@OpenWrt:~# time -v openssl speed -elapsed -evp aes-256-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 363820 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 177904 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 77872 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 64473 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 29355 aes-256-cbc's in 3.00s
OpenSSL 1.0.2n  7 Dec 2017
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,2,long) aes(partial) blowfish(ptr) 
compiler: mipsel-openwrt-linux-musl-gcc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -I/home/drbrains/openwrt/staging_dir/target-mipsel_24kc_musl/usr/include -I/home/drbrains/openwrt/staging_dir/target-mipsel_24kc_musl/include -I/home/drbrains/openwrt/staging_dir/toolchain-mipsel_24kc_gcc-6.3.0_musl/usr/include -I/home/drbrains/openwrt/staging_dir/toolchain-mipsel_24kc_gcc-6.3.0_musl/include/fortify -I/home/drbrains/openwrt/staging_dir/toolchain-mipsel_24kc_gcc-6.3.0_musl/include -znow -zrelro -DOPENSSL_SMALL_FOOTPRINT -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DOPENSSL_NO_ERR -DTERMIOS -Os -pipe -mno-branch-likely -mips32r2 -mtune=24kc -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -msoft-float -iremap/home/drbrains/openwrt/build_dir/target-mipsel_24kc_musl/openssl-1.0.2n:openssl-1.0.2n -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -fpic -I/home/drbrains/openwrt/package/libs/openssl/include -ffunction-sections -fdata-sections -fomit-frame-pointer -Wall -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DAES_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc       1940.37k     3795.29k     6645.08k    22006.78k    80158.72k
	Command being timed: "openssl speed -elapsed -evp aes-256-cbc"
	User time (seconds): 0.84
	System time (seconds): 7.92
	Percent of CPU this job got: 52%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0m 16.67s
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 10096
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 6
	Minor (reclaiming a frame) page faults: 108
	Voluntary context switches: 171958
	Involuntary context switches: 124
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
root@OpenWrt:~# 
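
The hybrid figures fit a simple size check: the 16- and 64-byte rows track the software-only numbers, while 256 bytes and up track the hardware-only numbers. A minimal sketch of such a check is below. The names are hypothetical and the exact comparison is an assumption (requests shorter than the threshold go to software); the real driver may decide differently.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed cut-off, based on the 200-byte bypass mentioned in the heading
 * of the hybrid run. The constant name is hypothetical. */
#define AES_BYPASS_THRESHOLD 200

/* Returns true when a request should be sent to the hardware engine.
 * Assumption: anything shorter than the threshold is handled by the
 * software fallback, because the per-request overhead of the hardware
 * path dominates for small buffers. */
static bool aes_use_hardware(size_t request_len)
{
	return request_len >= AES_BYPASS_THRESHOLD;
}
```

With this rule, the 16- and 64-byte tests take the software path and the 256-, 1024- and 8192-byte tests go to the engine, which is the pattern visible in the table above.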

For the driver to load properly via modprobe, I'd like to suggest adding the following to mt7628an.dtsi:

	crypto: crypto@10004000 {
		compatible = "ralink,mt7628an-aes", "mediatek,mtk-aes";
		reg = <0x10004000 0x1000>;

		interrupt-parent = <&intc>;
		interrupts = <13>;

		resets = <&rstctrl 29>;
		reset-names = "cryp";
		clocks = <&clkctrl 29>;
		clock-names = "cryp";
	};

All MT76x8 SoCs should have the AES engine on-chip. What do you think? Alternatively, I could modify the module code to load regardless of the DTS, but this way seems more elegant.

Currently just some nitpicks from my side, because GitHub is bad for a review.

The macros
sysRegRead()
sysRegWrite()
are bad; use readl()/writel() instead.

This will also remove the need for the explicit memory barriers rmb()/wmb().
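
A rough sketch of what the readl()/writel() suggestion could look like. The register offsets, the struct and the helper names are hypothetical (not taken from the driver), and the two stand-in functions at the top exist only so the snippet compiles outside the kernel. In the driver you would delete them and include <linux/io.h> instead.

```c
#include <stdint.h>

/* Userspace stand-ins so this sketch compiles outside the kernel.
 * They are NOT the kernel implementations: they only do a volatile
 * 32-bit load/store and provide no bus ordering guarantees. */
static inline uint32_t readl(const volatile void *addr)
{
	return *(const volatile uint32_t *)addr;
}

static inline void writel(uint32_t value, volatile void *addr)
{
	*(volatile uint32_t *)addr = value;
}

/* Hypothetical register offsets, for illustration only. */
#define AES_REG_CTRL   0x00
#define AES_REG_STATUS 0x04

/* Hypothetical per-device state. In the driver, base would be the
 * void __iomem * pointer obtained when mapping the "reg" range. */
struct mtk_aes_dev {
	void *base;
};

/* Small typed accessors in place of sysRegRead()/sysRegWrite(). */
static inline uint32_t aes_read(struct mtk_aes_dev *dev, uint32_t offset)
{
	return readl((uint8_t *)dev->base + offset);
}

static inline void aes_write(struct mtk_aes_dev *dev, uint32_t offset,
                             uint32_t value)
{
	writel(value, (uint8_t *)dev->base + offset);
}
```

A call that currently goes through sysRegWrite() would then become something like aes_write(dev, AES_REG_CTRL, value). Whether every explicit rmb()/wmb() can be dropped afterwards depends on the ordering the code relies on, so it is worth checking each one against the kernel's memory-barrier documentation.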

Also run a checkpatch code-style check over all source files.
If your kernel source tree is in
~/linux
do
~/linux/scripts/checkpatch.pl -f $file

And rewrite your driver's Makefile to build an external kernel module, so that all module dependencies will be resolved.

Macros, I can do. I didn’t realize that would remove the need for the wmb(). Thanks.

Rewriting the Makefile is one of my problems. I don’t understand how to get the kernel dependencies selected properly.

I’m leaning towards doing the queue handling myself in code, to get rid of the kernel dependency on crypto-engine anyway. I should be able to improve throughput and/or reduce the number of interrupts.

I’m rearranging the code in a more logical way now...

For readl()/writel() and/or rmb()/wmb(), see:

Documentation/memory-barriers.txt
Documentation/process/volatile-considered-harmful.rst

For external module look here in one of my repositories


The "trick" is in the Makefile there:
make jumps into your kernel source tree, does some magic there and jumps back into your code,
so all dependencies are resolved magically.

And if you want help please contact me via mail
ulli.kroll@googlemail.com
and read
Documentation/process/email-clients.rst
this is very important for a successful review

All the documentation is in the Linux source tree.

Small update:

Point "2" on my to-do list is fixed. It seems I had been looking at the same code for too long to spot the mistake. The module loads and unloads properly now.

Performance is improved: Latest benchmark (without bypassing the driver):

root@OpenWrt:~# time -v openssl speed -elapsed -evp aes-256-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 99879 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 101346 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 98912 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 79248 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 32288 aes-256-cbc's in 3.00s
OpenSSL 1.0.2n  7 Dec 2017
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,2,long) aes(partial) blowfish(ptr) 
compiler: mipsel-openwrt-linux-musl-gcc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -I/home/drbrains/openwrt/staging_dir/target-mipsel_24kc_musl/usr/include -I/home/drbrains/openwrt/staging_dir/target-mipsel_24kc_musl/include -I/home/drbrains/openwrt/staging_dir/toolchain-mipsel_24kc_gcc-6.3.0_musl/usr/include -I/home/drbrains/openwrt/staging_dir/toolchain-mipsel_24kc_gcc-6.3.0_musl/include/fortify -I/home/drbrains/openwrt/staging_dir/toolchain-mipsel_24kc_gcc-6.3.0_musl/include -znow -zrelro -DOPENSSL_SMALL_FOOTPRINT -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DOPENSSL_NO_ERR -DTERMIOS -Os -pipe -mno-branch-likely -mips32r2 -mtune=24kc -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -msoft-float -iremap/home/drbrains/openwrt/build_dir/target-mipsel_24kc_musl/openssl-1.0.2n:openssl-1.0.2n -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -fpic -I/home/drbrains/openwrt/package/libs/openssl/include -ffunction-sections -fdata-sections -fomit-frame-pointer -Wall -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DAES_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc        532.69k     2162.05k     8440.49k    27049.98k    88167.77k
	Command being timed: "openssl speed -elapsed -evp aes-256-cbc"
	User time (seconds): 0.61
	System time (seconds): 4.43
	Percent of CPU this job got: 31%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0m 15.82s
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 9664
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 108
	Voluntary context switches: 411696
	Involuntary context switches: 86
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
root@OpenWrt:~# 

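The most important figures to compare are the 256-byte and 1024-byte blocks: compared with the earlier hardware-only run, throughput improved by about 27% (6667.86k to 8440.49k) and about 21% (22325.25k to 27049.98k) respectively.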
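
For anyone who wants to redo the comparison, the relative change can be recomputed directly from the two hardware-only tables (the earlier run and this latest one). A small C sketch, with a made-up helper name:

```c
#include <assert.h>

/* Relative improvement of "after" over "before", in percent. */
static double percent_gain(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}
```

Fed with the table values, percent_gain(6667.86, 8440.49) comes out at roughly 26.6 for the 256-byte blocks and percent_gain(22325.25, 27049.98) at roughly 21.2 for the 1024-byte blocks.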

Code checked and corrected according to the suggestions by @ElektromAn.
