Solved GELI performance between AESNI and QAT

Hi All,

Is it expected that AESNI would outperform an Intel QuickAssist card (8950)? I'm seeing poor throughput when only QAT is loaded (AESNI not loaded), even though the QAT driver itself appears to be running OK.

Could it be that the default zfs() encryption (AES-XTS) is supported by AESNI but not by the Intel QAT 8950?

geli list shows "Crypto: hardware" with just the qat() kernel module loaded.

Any ideas?
 
Code:
# sysctl -a | grep qat
qat0: <Intel dh895xcc QuickAssist> mem 0xd5080000-0xd50fffff,0xdc840000-0xdc87ffff,0xdc880000-0xdc8bffff irq 48 at device 0.0 numa-domain 0 on pci8
qat0: qat_dev0 started 12 acceleration engines
qat0: FW version: 4.18.0
qat_ocf0: <QAT engine>
        value:  /boot/kernel/qat.ko
        value:  /boot/kernel/qat_api.ko
        value:  /boot/kernel/qat_common.ko
        value:  /boot/kernel/qat_hw.ko
        value:  /boot/kernel/qat_dh895xcc_fw.ko
irq85: qat0:b0:157 @cpu0(domain0): 0
irq86: qat0:b1:159 @cpu0(domain0): 1238176
irq87: qat0:b2:161 @cpu0(domain0): 35
irq88: qat0:b3:163 @cpu0(domain0): 0
irq89: qat0:b4:165 @cpu0(domain0): 0
irq90: qat0:b5:167 @cpu0(domain0): 0
irq91: qat0:b6:169 @cpu0(domain0): 0
irq92: qat0:b7:171 @cpu0(domain0): 0
irq93: qat0:b8:173 @cpu0(domain0): 0
irq94: qat0:b9:175 @cpu0(domain0): 0
irq95: qat0:b10:177 @cpu0(domain0): 0
irq96: qat0:b11:179 @cpu0(domain0): 0
irq97: qat0:b12:181 @cpu0(domain0): 0
irq98: qat0:b13:183 @cpu0(domain0): 0
irq99: qat0:b14:185 @cpu0(domain0): 0
irq100: qat0:b15:187 @cpu0(domain0): 0
irq101: qat0:b16:189 @cpu0(domain0): 0
irq102: qat0:b17:191 @cpu0(domain0): 0
irq103: qat0:b18:193 @cpu0(domain0): 0
irq104: qat0:b19:195 @cpu0(domain0): 0
irq105: qat0:b20:197 @cpu0(domain0): 0
irq106: qat0:b21:199 @cpu0(domain0): 0
irq107: qat0:b22:201 @cpu0(domain0): 0
irq108: qat0:b23:203 @cpu0(domain0): 0
irq109: qat0:b24:205 @cpu0(domain0): 0
irq110: qat0:b25:207 @cpu0(domain0): 0
irq111: qat0:b26:209 @cpu0(domain0): 0
irq112: qat0:b27:211 @cpu0(domain0): 0
irq113: qat0:b28:213 @cpu0(domain0): 0
irq114: qat0:b29:215 @cpu0(domain0): 0
irq115: qat0:b30:217 @cpu0(domain0): 0
irq116: qat0:b31:219 @cpu0(domain0): 0
irq117: qat0:ae:221 @cpu0(domain0): 0
dev.qat_ocf.0.enable: 1
dev.qat_ocf.0.%parent: nexus0
dev.qat_ocf.0.%pnpinfo:
dev.qat_ocf.0.%location:
dev.qat_ocf.0.%driver: qat_ocf
dev.qat_ocf.0.%desc: QAT engine
dev.qat_ocf.%parent:
dev.qat.0.cnv_error:
dev.qat.0.fw_counters:
dev.qat.0.mmp_version: 0.0.1
dev.qat.0.hw_version: 0
dev.qat.0.fw_version: 4.18.0
dev.qat.0.heartbeat: 1
dev.qat.0.heartbeat_failed: 0
dev.qat.0.heartbeat_sent: 4
dev.qat.0.dev_cfg: [GENERAL]
dev.qat.0.num_user_processes: 0
dev.qat.0.cfg_mode: ks
dev.qat.0.cfg_services: sym;dc
dev.qat.0.state: up
dev.qat.0.%domain: 0
dev.qat.0.%parent: pci8
dev.qat.0.%pnpinfo: vendor=0x8086 device=0x0435 subvendor=0x8086 subdevice=0x0000 class=0x0b4000
dev.qat.0.%location: slot=0 function=0 dbsf=pci0:15:0:0
dev.qat.0.%driver: qat
dev.qat.0.%desc: Intel dh895xcc QuickAssist
dev.qat.%parent:

vmstat before and after a file copy on a ZFS encrypted root
Code:
# vmstat -i | grep qat
irq86: qat0:b1    1238330    363
irq87: qat0:b2    35              0

# cp /test2 /test4

# vmstat -i | grep qat
irq86: qat0:b1    1371801    400
irq87: qat0:b2    35              0
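
To put a number on the counters above, a small helper can pull the total for one interrupt out of `vmstat -i` output and subtract the two readings. This is only a sketch: `irq_total` is a hypothetical helper name, and the sample lines are the two irq86 readings copied from this post.

```sh
#!/bin/sh
# irq_total NAME: read `vmstat -i` output on stdin and print the
# total count (third column) for the interrupt called NAME.
# Hypothetical helper, not part of FreeBSD.
irq_total() {
    awk -v irq="$1" '$1 == irq { print $3 }'
}

# The two irq86 readings from the post, before and after the cp.
before=$(printf 'irq86: qat0:b1    1238330    363\n' | irq_total irq86:)
after=$(printf 'irq86: qat0:b1    1371801    400\n' | irq_total irq86:)

echo "qat0:b1 interrupts during copy: $((after - before))"
```

With the figures above this prints 133471, so the card is clearly being exercised during the copy. On a live system the readings would come from `vmstat -i | irq_total irq86:` instead of the embedded sample lines.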
 
RAW
Transfer rates:
outside: 102400 kbytes in 0.528817 sec = 193640 kbytes/sec
middle: 102400 kbytes in 0.236516 sec = 432952 kbytes/sec
inside: 102400 kbytes in 0.234807 sec = 436103 kbytes/sec

GELI with AESNI only loaded
Transfer rates:
outside: 102400 kbytes in 0.388920 sec = 263293 kbytes/sec
middle: 102400 kbytes in 0.372202 sec = 275119 kbytes/sec
inside: 102400 kbytes in 0.355650 sec = 287924 kbytes/sec

GELI with QAT and AESNI both loaded
Transfer rates:
outside: 102400 kbytes in 0.611140 sec = 167556 kbytes/sec
middle: 102400 kbytes in 0.578597 sec = 176980 kbytes/sec
inside: 102400 kbytes in 0.577639 sec = 177273 kbytes/sec

GELI with QAT only loaded
Transfer rates:
outside: 102400 kbytes in 0.592254 sec = 172899 kbytes/sec
middle: 102400 kbytes in 0.570098 sec = 179618 kbytes/sec
inside: 102400 kbytes in 0.568700 sec = 180060 kbytes/sec
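
For comparing the runs, the relative difference is easier to read than the raw rates. A minimal sketch, assuming the "middle" figures are representative; `pct_change` is a hypothetical helper and the three numbers are copied from the results above.

```sh
#!/bin/sh
# pct_change BASE NEW: print the change from BASE to NEW as a
# percentage of BASE, to one decimal place. Hypothetical helper.
pct_change() {
    awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (new - base) * 100 / base }'
}

# "middle" transfer rates in kbytes/sec from the runs above.
raw=432952
aesni=275119
qat=179618

echo "AESNI vs raw:   $(pct_change "$raw" "$aesni")%"
echo "QAT   vs raw:   $(pct_change "$raw" "$qat")%"
echo "QAT   vs AESNI: $(pct_change "$aesni" "$qat")%"
```

On these numbers the QAT-only run comes out roughly a third slower than the AESNI-only run.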
 
I took zfs() out of the equation, as it seemed to be using geli() under the hood, and focused on geli() performance alone. After some extensive testing I found the issue.

The QAT 8950 card has three device services available:
  1. Symmetric encryption (sym)
  2. Asymmetric encryption (asym)
  3. Compression (dc)
But the Intel QAT 8955 (dh895xcc) card I have can only run two engines at once, so only two of the three services can be used at any one time: three services are loadable from the firmware, but the 8955 has only two device engines.

FreeBSD's qat() kernel module defaults to "sym;dc" when one card is installed (as seen in sysctl dev.qat.0.cfg_services). I guess zero is considered both odd and even in this context, so with a single card the "dc" service takes the slot that "asym" would otherwise occupy.

With the QAT configured for both sym and asym, I am seeing a 34% read and 30% write performance increase over AES-NI with geli for an HMAC/SHA256, AES-XTS-256 workload, which seems to be the card's sweet spot.

/boot/loader.conf
Code:
#default with only one Intel QAT 8950: dev.qat.0.cfg_services = "sym;dc"
dev.qat.0.cfg_services = "sym;asym"
qat_dh895xcc_fw_load="YES"
qat_load="YES"
cryptodev_load="YES"

geli init -P -a "HMAC/SHA256" -e "AES-XTS" -l 256 -s 4096 -K ~/throwaway.key -B plaintext.eli gpt/ciphertext
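
Once the provider is attached, geli list reports which crypto backend is in use (the "Crypto: hardware" line mentioned earlier). A minimal sketch for pulling that field out, assuming the line has the form "Crypto: <mode>"; `geli_crypto` is a hypothetical helper name.

```sh
#!/bin/sh
# geli_crypto: read `geli list` output on stdin and print the value of
# each "Crypto:" line, e.g. "hardware". Hypothetical helper.
geli_crypto() {
    sed -n 's/^[[:space:]]*Crypto:[[:space:]]*//p'
}

# Usage on a live system (provider and key file from the geli init above):
#   geli attach -p -k ~/throwaway.key gpt/ciphertext
#   geli list gpt/ciphertext.eli | geli_crypto
```

Checking this before each benchmark run helps confirm that the result is for the module combination you think you loaded.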

Note:
It seems that the exposed qat services can only be set before the qat kernel module loads, so they have to be in /boot/loader.conf and cannot be altered at runtime with sysctl dev.qat.0.cfg_services="sym;asym".
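
Since the setting only takes effect at module load, it is worth confirming after a reboot that the driver actually picked it up. A small sketch of such a check; `check_qat_services` is a hypothetical helper, and the optional second argument exists only so the comparison can be tried without a QAT card present.

```sh
#!/bin/sh
# check_qat_services WANT [HAVE]: compare the wanted service string with
# the one the driver reports. HAVE defaults to the live sysctl value.
# Hypothetical helper.
check_qat_services() {
    want=$1
    have=${2-$(sysctl -n dev.qat.0.cfg_services 2>/dev/null)}
    if [ "$have" = "$want" ]; then
        echo "ok: cfg_services is $have"
    else
        echo "mismatch: wanted $want, driver reports ${have:-nothing}"
        return 1
    fi
}

# After rebooting with the loader.conf above:
#   check_qat_services "sym;asym"
```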

Note 2:
The dh895xcc (8955) card performs best using HMAC/SHA2 and either AES-XTS-256 or AES-CBC-256.

Warning:
It does not help at all with Camellia-CBC-128/256 compared to AES-NI alone, and even reduces write performance: by 4% when geli() is running an HMAC/SHA1 with AES-CBC-256 workload, and by 11% with an HMAC/SHA1 with Camellia-CBC-128 workload.
 