geli performance still being limited by single core of CPU when Intel Quick Assist installed

Hi All,

Need some help and direction please.

I have a system with an Intel QAT 8955 installed. I am seeing about a 30% throughput increase when qat(4) and geli(8) are enabled, but nowhere near the 5 gigabits per second (671 megabytes per second) that Intel states.

I have put together a simple test script as follows:
sh:
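#!/bin/sh
# Benchmark geli on nda0 for every authentication/encryption/key-length combination.
# WARNING: this destroys the partition table on nda0.
for a in HMAC/SHA1 HMAC/RIPEMD160 HMAC/SHA256 HMAC/SHA384 HMAC/SHA512; do
        for e in AES-XTS AES-CBC Camellia-CBC NULL; do
                for l in 128 256; do
                        echo "Parameters: $a $e $l"
                        # Fresh random key file and a single GPT partition for each run.
                        dd if=/dev/random of=/root/nda0_test.key bs=128k count=1 > /dev/null 2>&1
                        gpart create -s GPT nda0 > /dev/null
                        gpart add -a 4096 -t freebsd-ufs -l nda0_test nda0 > /dev/null
                        geli init -P -a "$a" -e "$e" -l "$l" -s 4096 -K /root/nda0_test.key -B nda0_test.eli gpt/nda0_test > /dev/null
                        # Use the same absolute key path as init so the script works from any directory.
                        geli attach -p -k /root/nda0_test.key gpt/nda0_test
                        # Write, then read, 250 MiB; keep only the bytes/sec figure from dd's summary line.
                        dd if=/dev/zero of=/dev/gpt/nda0_test.eli bs=10m count=25 status=progress 2>&1 | grep sec | awk '{print "write: " $7 " " $8}'
                        dd if=/dev/gpt/nda0_test.eli of=/dev/null bs=10m count=25 status=progress 2>&1   | grep sec | awk '{print "read : " $7 " " $8}'
                        # Tear everything down so the next combination starts clean.
                        geli kill gpt/nda0_test.eli  > /dev/null
                        gpart destroy -F nda0 > /dev/null
                        echo
                done
        done
done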

The results are:
Read (bytes per second):

AESNI only                   AES-XTS-128       AES-XTS-256       AES-CBC-128       AES-CBC-256       Camellia-CBC-128  Camellia-CBC-256  NULL-128          NULL-256
HMAC/SHA1                    30,702,908        27,704,552        32,433,531        29,327,177        28,142,544        24,942,065        54,497,500        54,513,682
HMAC/RIPEMD160               33,000,080        29,583,744        35,001,742        31,410,990        30,043,360        26,430,374        62,091,234        62,108,164
HMAC/SHA256                  27,655,169        25,204,265        29,096,825        26,582,631        25,629,302        22,968,065        45,561,232        45,550,977
HMAC/SHA384                  29,819,811        26,967,032        31,466,261        28,541,287        27,358,907        24,397,319        51,732,107        51,713,366
HMAC/SHA512                  28,361,483        25,713,693        29,757,040        27,130,714        26,178,791        23,392,581        48,074,248        48,076,202

AESNI + QAT(8955: sym;asym)  AES-XTS-128       AES-XTS-256       AES-CBC-128       AES-CBC-256       Camellia-CBC-128  Camellia-CBC-256  NULL-128          NULL-256
HMAC/SHA1                    36,461,684        36,244,038        35,709,570        35,257,384        28,148,318        24,955,089        54,493,776        54,488,583
HMAC/RIPEMD160               32,979,281        29,567,102        35,002,866        31,424,410        30,029,677        26,433,818        62,093,092        62,100,242
HMAC/SHA256                  33,882,860        33,719,146        33,798,358        35,220,940        25,637,079        22,981,363        45,571,151        45,491,652
HMAC/SHA384                  30,826,526        30,204,090        30,421,097        30,692,806        27,361,179        24,387,271        51,740,498        51,729,234
HMAC/SHA512                  28,060,222        27,506,714        27,839,866        28,689,258        26,179,396        23,391,794        48,084,271        48,065,621

Write (bytes per second):

AESNI only                   AES-XTS-128       AES-XTS-256       AES-CBC-128       AES-CBC-256       Camellia-CBC-128  Camellia-CBC-256  NULL-128          NULL-256
HMAC/SHA1                    30,418,357        27,727,278        31,907,911        28,930,783        27,632,754        24,501,878        54,319,135        54,366,034
HMAC/RIPEMD160               33,010,929        29,561,264        34,420,730        30,961,114        29,456,883        25,903,132        61,916,543        61,859,619
HMAC/SHA256                  27,678,885        25,216,331        28,674,284        26,247,519        25,218,913        22,613,495        45,544,125        45,497,415
HMAC/SHA384                  29,880,117        27,090,289        31,042,452        28,240,152        26,970,339        23,982,172        51,816,632        51,842,341
HMAC/SHA512                  28,487,979        25,838,583        29,483,993        26,918,142        25,832,218        23,065,068        48,369,646        48,301,360

AESNI + QAT(8955: sym;asym)  AES-XTS-128       AES-XTS-256       AES-CBC-128       AES-CBC-256       Camellia-CBC-128  Camellia-CBC-256  NULL-128          NULL-256
HMAC/SHA1                    35,207,868        35,656,579        35,061,065        27,653,261        24,506,907        24,506,907        54,438,692        54,385,295
HMAC/RIPEMD160               33,029,922        29,603,298        34,425,763        30,990,814        29,482,257        25,919,554        62,052,646        61,944,810
HMAC/SHA256                  33,253,545        32,882,889        33,244,963        32,874,411        25,236,309        22,608,838        45,569,662        45,610,269
HMAC/SHA384                  29,828,802        29,552,863        29,834,282        29,653,120        26,997,699        24,005,477        52,001,647        51,941,500
HMAC/SHA512                  27,210,910        26,600,296        27,166,874        26,880,947        25,847,093        23,057,189        48,394,109        48,335,688

The maximum throughput I am seeing with the QAT card installed is 35.6 megabytes per second (0.284 gigabits per second), which is only about 5% utilisation of the QAT card.
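
For anyone wanting to check the arithmetic, here is a rough sketch of the conversion. The function name utilisation is made up for illustration, the units are decimal (1 MB = 1,000,000 bytes), and the 671,000,000 bytes per second ceiling is simply the 671 megabytes per second figure quoted above taken at face value.

```sh
#!/bin/sh
# utilisation MEASURED_BYTES_PER_SEC RATED_BYTES_PER_SEC
# Hypothetical helper: prints the measured rate in MB/s and Gbit/s
# (decimal units) and as a percentage of the rated figure.
utilisation() {
    awk -v m="$1" -v r="$2" 'BEGIN {
        printf "%.1f MB/s, %.3f Gbit/s, %.1f%% of rated\n",
               m / 1e6, m * 8 / 1e9, 100 * m / r
    }'
}

# Best QAT write result in the table (HMAC/SHA1, AES-XTS-256) against the
# assumed ceiling of 671,000,000 bytes per second.
utilisation 35656579 671000000
```

Any other cell from the tables can be passed as the first argument to compare it against the same ceiling.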

The underlying disk is a PCIe NVMe drive which, without geli(8), tests at 1,082,423,799 bytes per second (about 1 gigabyte per second) for write and 1,279,609,700 bytes per second (about 1.2 gigabytes per second) for read. Even the runs with no encryption (NULL-128 or NULL-256) are faster than the runs with encryption offloaded.

It does seem that the throughput is being limited by the CPU, which maxes out at 100% on a single core for geli(8) on the host machine (an older Intel(R) Xeon(R) CPU E5-2403 0 @ 1.80GHz).
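
A minimal sketch of how this could be checked, assuming the test provider is still attached as gpt/nda0_test.eli (the script above kills it at the end of each pass, so it would need to be attached by hand first). The helper name crypto_mode is made up for illustration. It only pulls out the Crypto line that geli list prints, which reports whether the provider is using hardware or software crypto; the two sysctls shown are the geli worker-thread and batching knobs.

```sh
#!/bin/sh
# crypto_mode: hypothetical helper. Reads `geli list` output on stdin and
# prints the value of each "Crypto:" line (for example "hardware").
crypto_mode() {
    awk -F': ' '$1 == "Crypto" { print $2 }'
}

# Only query the live system where geli(8) is present (the FreeBSD host).
if command -v geli >/dev/null 2>&1; then
    # Assumes the provider from the test script is currently attached.
    geli list gpt/nda0_test.eli | crypto_mode
    # Number of geli worker threads and whether request batching is on.
    sysctl kern.geom.eli.threads kern.geom.eli.batch
    # For a per-thread, per-CPU view during a dd run: top -SHP
fi
```

If the Crypto line says software while qat(4) is loaded, the requests are presumably not reaching the card at all, which would be one possible explanation for a single busy core.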

If it is offloading the crypto functions to the QAT, why is the CPU maxed out?

Any ideas or signposting please?
 
OK, running openssl speed -elapsed -evp aes-256-cbc shows an algorithm speed of 253,370,200 bytes per second for 1k blocks, yet geli is only giving 32,874,411 bytes per second on the same host.

Something is definitely not right...

Help, please?
 