Hello. It seems security/gnutls, whether built from ports or installed from pkg, is not making use of the AES acceleration provided by the ARMv8 Cryptographic Extensions. This makes Samba server encryption incredibly slow.
I've observed this on a FreeBSD 13-RELEASE virtual machine running on an M1 Mac and also on a RockPro64 running 13-RELEASE.
All the examples below are running the same software versions:
- FreeBSD 13-RELEASE
- security/gnutls 3.6.16
- security/nettle 3.7.3
The following output suggests the armv8crypto module is already compiled into the kernel:
Code:
kldload armv8crypto
kldload: can't load armv8crypto: module already loaded or in kernel
OpenSSL from pkg clearly benefits from this, producing test results up to ~20x faster:
With base OpenSSL on RockPro64:
Code:
for ALG in aes-128-ccm aes-128-gcm; do
    openssl speed -evp "${ALG}" -bytes 1500 2> /dev/null | grep "^${ALG}"
done
aes-128-gcm 15808.85k
aes-128-ccm 13117.70k
With OpenSSL from pkg on RockPro64:
Code:
for ALG in aes-128-ccm aes-128-gcm; do
    /usr/local/bin/openssl speed -evp "${ALG}" -bytes 1500 2> /dev/null | grep "^${ALG}"
done
aes-128-ccm 108811.86k
aes-128-gcm 246024.50k
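To put a number on the difference between the two runs quoted above, here is a minimal sketch that divides the pkg figure by the base figure for each cipher. The `speedup` helper is a hypothetical name introduced only for this illustration, and the inputs are the values copied from the two outputs above.

```shell
#!/bin/sh
# speedup BASE NEW: print NEW / BASE rounded to one decimal place.
# "speedup" is a hypothetical helper name, not part of any tool.
speedup() {
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%.1f\n", new / base }'
}

# Values are the k/sec figures from the RockPro64 runs above.
speedup 13117.70 108811.86   # aes-128-ccm, base vs pkg: prints 8.3
speedup 15808.85 246024.50   # aes-128-gcm, base vs pkg: prints 15.6
```

For these two specific runs the pkg build comes out roughly 8x (CCM) and 16x (GCM) faster than base.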
Here are some comparisons:
RockPro64:
Code:
gnutls-cli --benchmark-tls-ciphers
Testing throughput in cipher/MAC combinations (payload: 1400 bytes)
AES-128-GCM - TLS1.2 6.67 MB/sec
AES-128-GCM - TLS1.3 7.91 MB/sec
AES-128-CCM - TLS1.2 6.12 MB/sec
AES-128-CCM - TLS1.3 5.77 MB/sec
CHACHA20-POLY1305 - TLS1.2 14.24 MB/sec
CHACHA20-POLY1305 - TLS1.3 14.29 MB/sec
AES-128-CBC - TLS1.0 7.76 MB/sec
CAMELLIA-128-CBC - TLS1.0 6.59 MB/sec
GOST28147-TC26Z-CNT - TLS1.2 3.18 MB/sec
Testing throughput in cipher/MAC combinations (payload: 16384 bytes)
AES-128-GCM - TLS1.2 7.08 MB/sec
AES-128-GCM - TLS1.3 8.36 MB/sec
AES-128-CCM - TLS1.2 5.64 MB/sec
AES-128-CCM - TLS1.3 5.98 MB/sec
CHACHA20-POLY1305 - TLS1.2 15.79 MB/sec
CHACHA20-POLY1305 - TLS1.3 15.59 MB/sec
AES-128-CBC - TLS1.0 8.30 MB/sec
CAMELLIA-128-CBC - TLS1.0 6.95 MB/sec
GOST28147-TC26Z-CNT - TLS1.2 3.28 MB/sec
FreeBSD on M1:
Code:
gnutls-cli --benchmark-tls-ciphers
Testing throughput in cipher/MAC combinations (payload: 1400 bytes)
AES-128-GCM - TLS1.2 87.56 MB/sec
AES-128-GCM - TLS1.3 87.37 MB/sec
AES-128-CCM - TLS1.2 75.69 MB/sec
AES-128-CCM - TLS1.3 75.71 MB/sec
CHACHA20-POLY1305 - TLS1.2 172.27 MB/sec
CHACHA20-POLY1305 - TLS1.3 171.13 MB/sec
AES-128-CBC - TLS1.0 96.21 MB/sec
CAMELLIA-128-CBC - TLS1.0 59.41 MB/sec
GOST28147-TC26Z-CNT - TLS1.2 23.07 MB/sec
Testing throughput in cipher/MAC combinations (payload: 16384 bytes)
AES-128-GCM - TLS1.2 90.89 MB/sec
AES-128-GCM - TLS1.3 90.77 MB/sec
AES-128-CCM - TLS1.2 78.60 MB/sec
AES-128-CCM - TLS1.3 78.57 MB/sec
CHACHA20-POLY1305 - TLS1.2 186.75 MB/sec
CHACHA20-POLY1305 - TLS1.3 185.87 MB/sec
AES-128-CBC - TLS1.0 103.75 MB/sec
CAMELLIA-128-CBC - TLS1.0 62.15 MB/sec
GOST28147-TC26Z-CNT - TLS1.2 23.45 MB/sec
For comparison, here is the same command running on an old APU2C2 machine from 2012, which should be slower than the RockPro64:
Code:
gnutls-cli --benchmark-tls-ciphers
Testing throughput in cipher/MAC combinations (payload: 1400 bytes)
AES-128-GCM - TLS1.2 74.29 MB/sec
AES-128-GCM - TLS1.3 67.16 MB/sec
AES-128-CCM - TLS1.2 29.43 MB/sec
AES-128-CCM - TLS1.3 28.55 MB/sec
CHACHA20-POLY1305 - TLS1.2 23.21 MB/sec
CHACHA20-POLY1305 - TLS1.3 21.89 MB/sec
AES-128-CBC - TLS1.0 20.25 MB/sec
CAMELLIA-128-CBC - TLS1.0 8.69 MB/sec
GOST28147-TC26Z-CNT - TLS1.2 3.79 MB/sec
Testing throughput in cipher/MAC combinations (payload: 16384 bytes)
AES-128-GCM - TLS1.2 131.10 MB/sec
AES-128-GCM - TLS1.3 127.85 MB/sec
AES-128-CCM - TLS1.2 36.79 MB/sec
AES-128-CCM - TLS1.3 35.60 MB/sec
CHACHA20-POLY1305 - TLS1.2 27.30 MB/sec
CHACHA20-POLY1305 - TLS1.3 26.94 MB/sec
AES-128-CBC - TLS1.0 31.21 MB/sec
CAMELLIA-128-CBC - TLS1.0 11.26 MB/sec
GOST28147-TC26Z-CNT - TLS1.2 4.01 MB/sec
As for the M1, here is the same benchmark in a Linux VM running on the same machine:
Code:
gnutls-cli --benchmark-tls-ciphers
Testing throughput in cipher/MAC combinations (payload: 1400 bytes)
AES-128-GCM - TLS1.2 0.72 GB/sec
AES-128-GCM - TLS1.3 0.70 GB/sec
AES-128-CCM - TLS1.2 0.35 GB/sec
AES-128-CCM - TLS1.3 0.34 GB/sec
CHACHA20-POLY1305 - TLS1.2 178.56 MB/sec
CHACHA20-POLY1305 - TLS1.3 177.00 MB/sec
AES-128-CBC - TLS1.0 0.36 GB/sec
CAMELLIA-128-CBC - TLS1.0 63.72 MB/sec
GOST28147-TC26Z-CNT - TLS1.2 23.59 MB/sec
Testing throughput in cipher/MAC combinations (payload: 16384 bytes)
AES-128-GCM - TLS1.2 0.87 GB/sec
AES-128-GCM - TLS1.3 0.87 GB/sec
AES-128-CCM - TLS1.2 0.38 GB/sec
AES-128-CCM - TLS1.3 0.37 GB/sec
CHACHA20-POLY1305 - TLS1.2 193.52 MB/sec
CHACHA20-POLY1305 - TLS1.3 192.91 MB/sec
AES-128-CBC - TLS1.0 0.56 GB/sec
CAMELLIA-128-CBC - TLS1.0 65.77 MB/sec
GOST28147-TC26Z-CNT - TLS1.2 23.98 MB/sec
[root@alarm ~]# gnutls-cli --version
gnutls-cli 3.6.12
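Since the Linux output mixes GB/sec and MB/sec, comparing it against the FreeBSD runs by eye is error-prone. The following is a small sketch of a filter that converts every result line to MB/sec. The `normalize_gnutls_bench` name is hypothetical, and the conversion assumes 1 GB/sec = 1000 MB/sec, which I have not verified against the gnutls-cli source.

```shell
#!/bin/sh
# Hypothetical helper: read `gnutls-cli --benchmark-tls-ciphers` output
# on stdin and print "cipher protocol rate" with the rate in MB/sec.
# Assumption: 1 GB/sec = 1000 MB/sec.
normalize_gnutls_bench() {
    awk '$NF == "MB/sec" || $NF == "GB/sec" {
        rate = $(NF - 1)
        if ($NF == "GB/sec") rate *= 1000
        printf "%s %s %.2f\n", $1, $3, rate
    }'
}

# Demo with two lines copied from the outputs above
# (Linux on M1 and FreeBSD on M1, 16384-byte payload).
normalize_gnutls_bench <<'EOF'
Testing throughput in cipher/MAC combinations (payload: 16384 bytes)
AES-128-GCM - TLS1.2 0.87 GB/sec
AES-128-GCM - TLS1.2 90.89 MB/sec
EOF
```

On a live system the same function can be fed directly with `gnutls-cli --benchmark-tls-ciphers | normalize_gnutls_bench`. Under the stated unit assumption, AES-128-GCM in the Linux VM is a bit under 10x faster than on FreeBSD on the same M1.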
Base OpenSSL shows the same lack of acceleration. However, if I install OpenSSL from ports or pkg, operations are ~25 times faster. There seems to be no option to enable something similar in GnuTLS or Nettle. Considering how fast an older GnuTLS version (3.6.12) performed on Linux on the same hardware, it also doesn't look like it's related to the version. In any case, this makes securing net/samba413 on ARM-based SBC hardware a bit complicated.
I'd appreciate any insights and discussion.