QAT engines and cryptodev usage - Underutilised card

I have a qat()-compatible card that has 12 engines/instances. This is shown in the running profile and in the IRQs that are enumerated when the card is in use.

In FreeBSD only 2 of the Intel QAT engines are used, each locked to one of the three device acceleration profiles (sym, asym, dc). The remaining 10 engines stay idle: they are never reported as having an active profile loaded, and no IRQs are raised for them. To me this looks like the card is being underused.

I want to know why FreeBSD is locked to using only two (2) engines/instances when the hardware I have can support up to 12.

E.g. only 2 of the 12 engines are being used, while the remaining 10 sit idle and raise no interrupts.

Can anyone let me know if this is a limitation within the Intel QAT driver or is a limitation in the way FreeBSD implement the QAT driver into the cryptodev() framework please?

James.

(Crossposted on Intel's QAT forum).
 
Can anyone let me know if this is a limitation within the Intel QAT driver or is a limitation in the way FreeBSD implement the QAT driver into the cryptodev() framework please?
You should ask on the mailing lists; there are very few developers on this board. I'm not sure which one would be best though, as there doesn't seem to be a list specifically for crypto. Maybe start with freebsd-drivers@ (because of the qat(4) driver), or just freebsd-current@ (as all development starts there).
 
Before anybody dives into anything:
  • Which QAT card do you have specifically?
  • What does sysctl dev.qat look like?
  • How are you monitoring the card? Are you deducing this simply by looking at IRQs?
  • How are you exciting/using the card?
 
  1. Intel QAT card: dh8950cc (8955)

  2. Yes, sysctl dev.qat:
    Code:
    dev.qat.0.cnv_error:
    +-----------------------------------------------------------------+
    |             CNV Error Freq Statistics for Qat Device            |
    +-----------------------------------------------------------------+
    |[AE  0]: TotalErrors:     0 : LastError: No Error       [    0]  |
    |[AE  1]: TotalErrors:     0 : LastError: No Error       [    0]  |
    |[AE  2]: TotalErrors:     0 : LastError: No Error       [    0]  |
    |[AE  3]: TotalErrors:     0 : LastError: No Error       [    0]  |
    |[AE  4]: TotalErrors:     0 : LastError: No Error       [    0]  |
    |[AE  5]: TotalErrors:     0 : LastError: No Error       [    0]  |
    |[AE  6]: TotalErrors:     0 : LastError: No Error       [    0]  |
    |[AE  7]: TotalErrors:     0 : LastError: No Error       [    0]  |
    |[AE  8]: TotalErrors:     0 : LastError: No Error       [    0]  |
    |[AE  9]: TotalErrors:     0 : LastError: No Error       [    0]  |
    |[AE 10]: TotalErrors:     0 : LastError: No Error       [    0]  |
    |[AE 11]: TotalErrors:     0 : LastError: No Error       [    0]  |
    
    dev.qat.0.fw_counters:
    +------------------------------------------------+
    | FW Statistics for Qat Device                                     |
    +------------------------------------------------+
    AE 11
    Firmware Responses:7761
    Firmware Requests:7761
    AE 10
    Firmware Responses:5190
    Firmware Requests:5190
    AE  9
    Firmware Responses:8744
    Firmware Requests:8744
    AE  8
    Firmware Responses:7760
    Firmware Requests:7760
    AE  7
    Firmware Responses:5191
    Firmware Requests:5191
    AE  6
    Firmware Responses:8743
    Firmware Requests:8743
    AE  5
    Firmware Responses:7759
    Firmware Requests:7759
    AE  4
    Firmware Responses:5194
    Firmware Requests:5194
    AE  3
    Firmware Responses:8743
    Firmware Requests:8743
    AE  2
    Firmware Responses:7759
    Firmware Requests:7759
    AE  1
    Firmware Responses:5191
    Firmware Requests:5191
    AE  0
    Firmware Responses:8741
    Firmware Requests:8741
    
    dev.qat.0.mmp_version: 0.0.1
    dev.qat.0.hw_version: 0
    dev.qat.0.fw_version: 4.18.0
    dev.qat.0.heartbeat: 1
    dev.qat.0.heartbeat_failed: 0
    dev.qat.0.heartbeat_sent: 1
    dev.qat.0.dev_cfg: [GENERAL]
    ServicesEnabled = cy
    AutoResetOnError = 0
    DcIntermediateBufferSizeInKB = 64
    statsDc = 1
    statsDh = 1
    statsDrbg = 1
    statsDsa = 1
    statsEcc = 1
    statsGeneral = 1
    statsKeyGen = 1
    statsLn = 1
    statsPrime = 1
    statsRsa = 1
    statsSym = 1
    FirstUserBundle = 0
    Device_Max_Banks = 32
    Device_Capabilities_Mask = 0x788f
    Device_PkgId = 0
    Device_NodeId = 0
    Device_Max_Rings_Per_Bank = 16
    HW_RevId = 0
    Firmware_MmpVer = 0.0.1
    Firmware_UofVer = 4.18.0
    Device_DcExtendedFeatures = 0x105
    [KERNEL_QAT]
    NumberCyInstances = 8
    NumberDcInstances = 0
    Cy0CoreAffinity = 0
    Cy0IsPolled = 0
    Cy0Name = Cy0
    Cy1CoreAffinity = 1
    Cy1IsPolled = 0
    Cy1Name = Cy1
    Cy2CoreAffinity = 2
    Cy2IsPolled = 1
    Cy2Name = Cy2
    Cy3CoreAffinity = 3
    Cy3IsPolled = 1
    Cy3Name = Cy3
    Cy4CoreAffinity = 0
    Cy4IsPolled = 1
    Cy4Name = Cy4
    Cy5CoreAffinity = 1
    Cy5IsPolled = 1
    Cy5Name = Cy5
    Cy6CoreAffinity = 2
    Cy6IsPolled = 1
    Cy6Name = Cy6
    Cy7CoreAffinity = 3
    Cy7IsPolled = 1
    Cy7Name = Cy7
    Cy0BankNumber = 0
    Cy0RingAsymTx = 0
    Cy0RingSymTx = 2
    Cy0RingAsymRx = 8
    Cy0RingSymRx = 10
    Cy0NumConcurrentAsymRequests = 64
    Cy0NumConcurrentSymRequests = 512
    Cy1BankNumber = 1
    Cy1RingAsymTx = 0
    Cy1RingSymTx = 2
    Cy1RingAsymRx = 8
    Cy1RingSymRx = 10
    Cy1NumConcurrentAsymRequests = 64
    Cy1NumConcurrentSymRequests = 512
    Cy2BankNumber = 2
    Cy2RingAsymTx = 0
    Cy2RingSymTx = 2
    Cy2RingAsymRx = 8
    Cy2RingSymRx = 10
    Cy2NumConcurrentAsymRequests = 64
    Cy2NumConcurrentSymRequests = 512
    Cy3BankNumber = 2
    Cy3RingAsymTx = 1
    Cy3RingSymTx = 3
    Cy3RingAsymRx = 9
    Cy3RingSymRx = 11
    Cy3NumConcurrentAsymRequests = 64
    Cy3NumConcurrentSymRequests = 512
    Cy4BankNumber = 2
    Cy4RingAsymTx = 4
    Cy4RingSymTx = 6
    Cy4RingAsymRx = 12
    Cy4RingSymRx = 14
    Cy4NumConcurrentAsymRequests = 64
    Cy4NumConcurrentSymRequests = 512
    Cy5BankNumber = 2
    Cy5RingAsymTx = 5
    Cy5RingSymTx = 7
    Cy5RingAsymRx = 13
    Cy5RingSymRx = 15
    Cy5NumConcurrentAsymRequests = 64
    Cy5NumConcurrentSymRequests = 512
    Cy6BankNumber = 3
    Cy6RingAsymTx = 0
    Cy6RingSymTx = 2
    Cy6RingAsymRx = 8
    Cy6RingSymRx = 10
    Cy6NumConcurrentAsymRequests = 64
    Cy6NumConcurrentSymRequests = 512
    Cy7BankNumber = 3
    Cy7RingAsymTx = 1
    Cy7RingSymTx = 3
    Cy7RingAsymRx = 9
    Cy7RingSymRx = 11
    Cy7NumConcurrentAsymRequests = 64
    Cy7NumConcurrentSymRequests = 512
    [Accelerator0]
    Bank0InterruptCoalescingEnabled = 1
    Bank0InterruptCoalescingTimerNs = 10000
    Bank0InterruptCoalescingNumResponses = 0
    Bank0CoreAffinity = 255
    Bank1InterruptCoalescingEnabled = 1
    Bank1InterruptCoalescingTimerNs = 10000
    Bank1InterruptCoalescingNumResponses = 0
    Bank1CoreAffinity = 255
    Bank2InterruptCoalescingEnabled = 1
    Bank2InterruptCoalescingTimerNs = 10000
    Bank2InterruptCoalescingNumResponses = 0
    Bank2CoreAffinity = 255
    Bank3InterruptCoalescingEnabled = 1
    Bank3InterruptCoalescingTimerNs = 10000
    Bank3InterruptCoalescingNumResponses = 0
    Bank3CoreAffinity = 255
    Bank4InterruptCoalescingEnabled = 1
    Bank4InterruptCoalescingTimerNs = 10000
    Bank4InterruptCoalescingNumResponses = 0
    Bank4CoreAffinity = 255
    Bank5InterruptCoalescingEnabled = 1
    Bank5InterruptCoalescingTimerNs = 10000
    Bank5InterruptCoalescingNumResponses = 0
    Bank5CoreAffinity = 255
    Bank6InterruptCoalescingEnabled = 1
    Bank6InterruptCoalescingTimerNs = 10000
    Bank6InterruptCoalescingNumResponses = 0
    Bank6CoreAffinity = 255
    Bank7InterruptCoalescingEnabled = 1
    Bank7InterruptCoalescingTimerNs = 10000
    Bank7InterruptCoalescingNumResponses = 0
    Bank7CoreAffinity = 255
    Bank8InterruptCoalescingEnabled = 1
    Bank8InterruptCoalescingTimerNs = 10000
    Bank8InterruptCoalescingNumResponses = 0
    Bank8CoreAffinity = 255
    Bank9InterruptCoalescingEnabled = 1
    Bank9InterruptCoalescingTimerNs = 10000
    Bank9InterruptCoalescingNumResponses = 0
    Bank9CoreAffinity = 255
    Bank10InterruptCoalescingEnabled = 1
    Bank10InterruptCoalescingTimerNs = 10000
    Bank10InterruptCoalescingNumResponses = 0
    Bank10CoreAffinity = 255
    Bank11InterruptCoalescingEnabled = 1
    Bank11InterruptCoalescingTimerNs = 10000
    Bank11InterruptCoalescingNumResponses = 0
    Bank11CoreAffinity = 255
    Bank12InterruptCoalescingEnabled = 1
    Bank12InterruptCoalescingTimerNs = 10000
    Bank12InterruptCoalescingNumResponses = 0
    Bank12CoreAffinity = 255
    Bank13InterruptCoalescingEnabled = 1
    Bank13InterruptCoalescingTimerNs = 10000
    Bank13InterruptCoalescingNumResponses = 0
    Bank13CoreAffinity = 255
    Bank14InterruptCoalescingEnabled = 1
    Bank14InterruptCoalescingTimerNs = 10000
    Bank14InterruptCoalescingNumResponses = 0
    Bank14CoreAffinity = 255
    Bank15InterruptCoalescingEnabled = 1
    Bank15InterruptCoalescingTimerNs = 10000
    Bank15InterruptCoalescingNumResponses = 0
    Bank15CoreAffinity = 255
    Bank16InterruptCoalescingEnabled = 1
    Bank16InterruptCoalescingTimerNs = 10000
    Bank16InterruptCoalescingNumResponses = 0
    Bank16CoreAffinity = 255
    Bank17InterruptCoalescingEnabled = 1
    Bank17InterruptCoalescingTimerNs = 10000
    Bank17InterruptCoalescingNumResponses = 0
    Bank17CoreAffinity = 255
    Bank18InterruptCoalescingEnabled = 1
    Bank18InterruptCoalescingTimerNs = 10000
    Bank18InterruptCoalescingNumResponses = 0
    Bank18CoreAffinity = 255
    Bank19InterruptCoalescingEnabled = 1
    Bank19InterruptCoalescingTimerNs = 10000
    Bank19InterruptCoalescingNumResponses = 0
    Bank19CoreAffinity = 255
    Bank20InterruptCoalescingEnabled = 1
    Bank20InterruptCoalescingTimerNs = 10000
    Bank20InterruptCoalescingNumResponses = 0
    Bank20CoreAffinity = 255
    Bank21InterruptCoalescingEnabled = 1
    Bank21InterruptCoalescingTimerNs = 10000
    Bank21InterruptCoalescingNumResponses = 0
    Bank21CoreAffinity = 255
    Bank22InterruptCoalescingEnabled = 1
    Bank22InterruptCoalescingTimerNs = 10000
    Bank22InterruptCoalescingNumResponses = 0
    Bank22CoreAffinity = 255
    Bank23InterruptCoalescingEnabled = 1
    Bank23InterruptCoalescingTimerNs = 10000
    Bank23InterruptCoalescingNumResponses = 0
    Bank23CoreAffinity = 255
    Bank24InterruptCoalescingEnabled = 1
    Bank24InterruptCoalescingTimerNs = 10000
    Bank24InterruptCoalescingNumResponses = 0
    Bank24CoreAffinity = 255
    Bank25InterruptCoalescingEnabled = 1
    Bank25InterruptCoalescingTimerNs = 10000
    Bank25InterruptCoalescingNumResponses = 0
    Bank25CoreAffinity = 255
    Bank26InterruptCoalescingEnabled = 1
    Bank26InterruptCoalescingTimerNs = 10000
    Bank26InterruptCoalescingNumResponses = 0
    Bank26CoreAffinity = 255
    Bank27InterruptCoalescingEnabled = 1
    Bank27InterruptCoalescingTimerNs = 10000
    Bank27InterruptCoalescingNumResponses = 0
    Bank27CoreAffinity = 255
    Bank28InterruptCoalescingEnabled = 1
    Bank28InterruptCoalescingTimerNs = 10000
    Bank28InterruptCoalescingNumResponses = 0
    Bank28CoreAffinity = 255
    Bank29InterruptCoalescingEnabled = 1
    Bank29InterruptCoalescingTimerNs = 10000
    Bank29InterruptCoalescingNumResponses = 0
    Bank29CoreAffinity = 255
    Bank30InterruptCoalescingEnabled = 1
    Bank30InterruptCoalescingTimerNs = 10000
    Bank30InterruptCoalescingNumResponses = 0
    Bank30CoreAffinity = 255
    Bank31InterruptCoalescingEnabled = 1
    Bank31InterruptCoalescingTimerNs = 10000
    Bank31InterruptCoalescingNumResponses = 0
    Bank31CoreAffinity = 255
    
    dev.qat.0.num_user_processes: 0
    dev.qat.0.cfg_mode: ks
    dev.qat.0.cfg_services: sym;asym
    dev.qat.0.state: up
    dev.qat.0.%domain: 0
    dev.qat.0.%parent: pci8
    dev.qat.0.%pnpinfo: vendor=0x8086 device=0x0435 subvendor=0x8086 subdevice=0x0000 class=0x0b4000
    dev.qat.0.%location: slot=0 function=0 dbsf=pci0:15:0:0
    dev.qat.0.%driver: qat
    dev.qat.0.%desc: Intel dh895xcc QuickAssist
    dev.qat.%parent:
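The dev_cfg dump above is long, so a small parser can make the per-instance settings easier to compare. The following is a minimal Python sketch, not part of the driver or any FreeBSD tool: the function names `summarize_cy_instances` and `interrupt_driven` are made up for illustration, and the sample text is an excerpt copied from the dump. In the full dump, Cy0 and Cy1 have IsPolled = 0 while Cy2 to Cy7 have IsPolled = 1; if the flag means what its name suggests, that may be worth comparing against the interrupt counts.

```python
import re

# Matches per-instance keys such as "Cy3BankNumber = 2" from dev.qat.N.dev_cfg.
CY_KEY = re.compile(r"^\s*Cy(\d+)(BankNumber|IsPolled|CoreAffinity)\s*=\s*(\d+)\s*$")


def summarize_cy_instances(dev_cfg_text):
    """Return {instance: {"bank": int, "polled": bool, "core": int}}."""
    instances = {}
    for line in dev_cfg_text.splitlines():
        match = CY_KEY.match(line)
        if match is None:
            continue  # section headers, Bank* keys, Cy*Name, ring keys, etc.
        index, key, value = int(match.group(1)), match.group(2), int(match.group(3))
        entry = instances.setdefault(index, {})
        if key == "BankNumber":
            entry["bank"] = value
        elif key == "IsPolled":
            entry["polled"] = bool(value)
        else:  # CoreAffinity
            entry["core"] = value
    return instances


def interrupt_driven(instances):
    """Indices of instances whose IsPolled flag is 0."""
    return sorted(i for i, e in instances.items() if e.get("polled") is False)


# Excerpt copied from the dump above (Cy0 to Cy2 only).
SAMPLE = """\
[KERNEL_QAT]
NumberCyInstances = 8
Cy0CoreAffinity = 0
Cy0IsPolled = 0
Cy0Name = Cy0
Cy1CoreAffinity = 1
Cy1IsPolled = 0
Cy1Name = Cy1
Cy2CoreAffinity = 2
Cy2IsPolled = 1
Cy2Name = Cy2
Cy0BankNumber = 0
Cy1BankNumber = 1
Cy2BankNumber = 2
"""

if __name__ == "__main__":
    table = summarize_cy_instances(SAMPLE)
    for index in sorted(table):
        print(index, table[index])
    print("interrupt-driven:", interrupt_driven(table))
```

On a live system the text could come from `sysctl -n dev.qat.0.dev_cfg` instead of the embedded excerpt.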



  3. By looking at the QAT device counters in sysctl dev.qat.0.fw_counters and at vmstat -i | grep qat. I then contacted the Intel contact named in the QAT driver header (KrzysztofX Zdziarski), who confirmed that only two engines are ever used on FreeBSD.

    For example, here is sysctl dev.qat.0.fw_counters before and after running four (4) simultaneous dd processes from different users on a ZFS or UFS file system with disk-level geli() underneath:

    Before:
    Code:
    dev.qat.0.fw_counters:
    +------------------------------------------------+
    | FW Statistics for Qat Device                                     |
    +------------------------------------------------+
    AE 11
    Firmware Responses:7761
    Firmware Requests:7761
    AE 10
    Firmware Responses:5190
    Firmware Requests:5190
    AE  9
    Firmware Responses:60399
    Firmware Requests:60399
    AE  8
    Firmware Responses:7760
    Firmware Requests:7760
    AE  7
    Firmware Responses:5191
    Firmware Requests:5191
    AE  6
    Firmware Responses:60398
    Firmware Requests:60398
    AE  5
    Firmware Responses:7759
    Firmware Requests:7759
    AE  4
    Firmware Responses:5194
    Firmware Requests:5194
    AE  3
    Firmware Responses:60398
    Firmware Requests:60398
    AE  2
    Firmware Responses:7759
    Firmware Requests:7759
    AE  1
    Firmware Responses:5191
    Firmware Requests:5191
    AE  0
    Firmware Responses:60397
    Firmware Requests:60397

    After:
    Code:
    dev.qat.0.fw_counters:
    +------------------------------------------------+
    | FW Statistics for Qat Device                                     |
    +------------------------------------------------+
    AE 11
    Firmware Responses:7761
    Firmware Requests:7761
    AE 10
    Firmware Responses:5190
    Firmware Requests:5190
    AE  9
    Firmware Responses:250566
    Firmware Requests:250567
    AE  8
    Firmware Responses:7760
    Firmware Requests:7760
    AE  7
    Firmware Responses:5191
    Firmware Requests:5191
    AE  6
    Firmware Responses:250466
    Firmware Requests:250466
    AE  5
    Firmware Responses:7759
    Firmware Requests:7759
    AE  4
    Firmware Responses:5194
    Firmware Requests:5194
    AE  3
    Firmware Responses:250361
    Firmware Requests:250361
    AE  2
    Firmware Responses:7759
    Firmware Requests:7759
    AE  1
    Firmware Responses:5191
    Firmware Requests:5191
    AE  0
    Firmware Responses:250258
    Firmware Requests:250259
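Comparing two snapshots by eye is error-prone, so the difference can be computed instead. This is a minimal Python sketch, assuming the fw_counters text format shown above; the function names `parse_fw_counters` and `busy_engines` are hypothetical, and the embedded samples are excerpts (AE 10 and AE 9 only) copied from the two snapshots. In the full snapshots above, the counters that change are those for AE 0, 3, 6 and 9.

```python
import re

AE_LINE = re.compile(r"^AE\s*(\d+)$")
COUNTER_LINE = re.compile(r"^Firmware (Responses|Requests):\s*(\d+)$")


def parse_fw_counters(text):
    """Map AE index -> {"requests": int, "responses": int}."""
    counters = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        ae = AE_LINE.match(line)
        if ae:
            current = int(ae.group(1))
            counters[current] = {"requests": 0, "responses": 0}
            continue
        counter = COUNTER_LINE.match(line)
        if counter and current is not None:
            counters[current][counter.group(1).lower()] = int(counter.group(2))
    return counters


def busy_engines(before, after):
    """Return {ae: request_delta} for engines whose request count grew."""
    deltas = {}
    for ae, now in after.items():
        previous = before.get(ae, {"requests": 0})
        delta = now["requests"] - previous["requests"]
        if delta > 0:
            deltas[ae] = delta
    return deltas


# Excerpts copied from the "Before" and "After" snapshots above.
BEFORE = """\
AE 10
Firmware Responses:5190
Firmware Requests:5190
AE  9
Firmware Responses:60399
Firmware Requests:60399
"""

AFTER = """\
AE 10
Firmware Responses:5190
Firmware Requests:5190
AE  9
Firmware Responses:250566
Firmware Requests:250567
"""

if __name__ == "__main__":
    changed = busy_engines(parse_fw_counters(BEFORE), parse_fw_counters(AFTER))
    for ae in sorted(changed):
        print(f"AE {ae}: +{changed[ae]} requests")
```

Feeding it the complete output of `sysctl dev.qat.0.fw_counters` taken before and after a test run would list every engine that handled requests during the run.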




  4. Four (4), eight (8) and also sixteen (16) simultaneous writes to files on the filesystem (not the raw device), launched via multiple scripted concurrent ssh connections, each running dd to a file on either a 4-disk RAID-10 ZFS setup or a geom() gstripe() pair of geom() gmirror() devices using UFS.

    Both configurations used four PCI-based nda()/nvme() devices with underlying geli()-encrypted partitions, tested with HMAC/SHA2-AES-XTS-256 (which uses the asym engine) and also with HMAC/SHA2-AES-CBC-256 (which uses the sym engine), on a GPT partitioning scheme with 4k alignment and a 1M partition offset.

    Finally, if dev.qat.0.cfg_services="sym;asym;sym;asym;sym;asym;sym;asym;" is set in /boot/loader.conf, it reverts to dev.qat.0.cfg_services="sym;asym" on load, as per qat().
So I am reasonably confident that only two engines are used at once...
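For anyone wanting to reproduce a similar load without the ssh scripting, here is a rough Python sketch of N concurrent file writers. It is not the harness used for the tests above: it swaps the scripted ssh + dd sessions for a local thread pool, and the file names (`qat-load-N.bin`) and function names are made up for illustration. Point `directory` at a path on the geli()-backed filesystem to exercise the encryption path.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_file(path, total_bytes, block_size=1024 * 1024):
    """Write total_bytes of zeros to path in block_size chunks, then fsync."""
    block = bytes(block_size)
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            chunk = block[: min(block_size, total_bytes - written)]
            f.write(chunk)
            written += len(chunk)
        f.flush()
        os.fsync(f.fileno())  # push the data to disk, not just the page cache
    return written


def run_writers(directory, writers, bytes_per_writer):
    """Start `writers` concurrent writes; return {path: bytes_written}."""
    paths = [os.path.join(directory, f"qat-load-{i}.bin") for i in range(writers)]
    with ThreadPoolExecutor(max_workers=writers) as pool:
        sizes = list(pool.map(lambda p: write_file(p, bytes_per_writer), paths))
    return dict(zip(paths, sizes))


if __name__ == "__main__":
    # Small demo in a temporary directory; use a real mount point and larger
    # sizes for an actual test run.
    with tempfile.TemporaryDirectory() as scratch:
        for path, size in run_writers(scratch, 4, 4 * 1024 * 1024).items():
            print(os.path.basename(path), size)
```

Running it with 4, 8 and 16 writers mirrors the writer counts described above.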
 
Before anybody dives into anything:
  • Which QAT card do you have specifically?
  • What does sysctl dev.qat look like?
  • How are you monitoring the card? Are you deducing this simply by looking at IRQs?
  • How are you exciting/using the card?

What would you recommend diving into first? :)
 