Poor performance with AESNI GELI

Hello

I've got an E3-1240, a 3.4 GHz quad-core Xeon with the AES-NI instruction set.
I'm getting no more than 180 MB/s of encryption throughput with GELI, with no filesystem involved.

In this test I enable the geom_zero device, encrypt it, read from it and dump the encrypted data to /dev/null.

Code:
# kldload aesni
# kldload geom_eli
# kldload geom_zero
# geli onetime -s 4096 gzero
# sysctl kern.geom.zero.clear=0

# geli list gzero.eli
Geom name: gzero.eli
State: ACTIVE
EncryptionAlgorithm: AES-XTS
KeyLength: 128
Crypto: hardware
Flags: ONETIME
KeysAllocated: 2
KeysTotal: 268435456
Providers:
1. Name: gzero.eli
   Mediasize: 1152921504606846976 (1.0E)
   Sectorsize: 4096
   Mode: r0w0e0
Consumers:
1. Name: gzero
   Mediasize: 1152921504606846976 (1.0E)
   Sectorsize: 512
   Mode: r1w1e1

# dd if=/dev/gzero.eli of=/dev/null bs=1m count=4096
4096+0 records in
4096+0 records out
4294967296 bytes transferred in 24.186710 secs (177575508 bytes/sec)

I also tried encrypting an SSD with ZFS on it and got about the same speed as above.

For comparison's sake, using DiskCryptor on the same machine in Windows I get up to 5.3 gigabytes/second in benchmarking tests using AES-XTS, and the absolute slowest speed is

I'm obviously missing something here... guessing that DiskCryptor is misrepresenting the encryption speed somewhat, but... looking around on Google I see 2.4 GB/s in TrueCrypt.

So... am I benchmarking the wrong way in FreeBSD? Any ideas on how to speed this up?
170mb/sec isn't going to cut it when split over 20 disks... :(
 
Not sure if it's the issue but I noticed gzero has a sectorsize of 512 while gzero.eli has 4096. What happens if both have the same sectorsize?
 
It makes no difference with 512 byte sector size unfortunately :(

This is a freshly installed system (first time with FreeBSD ever), and I realized that I didn't have the /dev/crypto device in the kernel (although I guess GELI doesn't use it?).

I recompiled the kernel and ran the following:
Code:
# cd /usr/src/tools/tools/crypto/
# make clean install
# ./cryptotest -a aes256 4096 100000
   1.170 sec,    8192 aes256 crypts,  100000 bytes, 700293043 byte/sec,  5342.8 Mb/sec

That is 5.3 Gbit/s, i.e. about 700 MB/s (cryptotest's last column is in megabits), so it only looks like DiskCryptor's 5.3 GB/s in Windows.
But I'm still getting the same poor performance from GELI.
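
A quick unit check on the cryptotest line above, as a minimal sketch. The numbers are copied from that output; the 2**20 divisor is an assumption, chosen only because it reproduces the tool's own Mb/sec figure.

```python
# Figures copied from the cryptotest output line.
seconds = 1.170
crypts = 8192
bytes_per_crypt = 100000
reported_bytes_per_sec = 700293043

# Throughput recomputed from the counts (the tool rounds the time, so this
# differs slightly from its own byte/sec figure).
bytes_per_sec = crypts * bytes_per_crypt / seconds

# Assumption: "Mb" means 2**20 bits. With that divisor the result matches
# the 5342.8 printed by the tool.
megabits_per_sec = reported_bytes_per_sec * 8 / 2**20

print(f"{bytes_per_sec / 1e6:.0f} MB/s")
print(f"{megabits_per_sec:.1f} Mbit/s")
```

So the 5342.8 figure is megabits per second, which works out to roughly 700 megabytes per second.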

I noticed that GELI only uses one thread for the encryption per device.
The machine has 4 cores with two parallel pipelines each, for a total of 8.

Assuming that in a real world usage scenario GELI will use all 8 pipelines (will it?) then 175 * 8 = 1400, which is better, but still a far cry from 5.3gb...
 
By default GELI spawns one thread for every available core. For example, I have a dual-core box, so by default it uses 2 threads:

Code:
g_eli[1] ada0p3
g_eli[0] ada0p3

But you can control that with this OID:
Code:
% sysctl -d kern.geom.eli.threads
kern.geom.eli.threads: Number of threads doing crypto work

A value of 0 means one thread per core.
 
Isn't it AES-CBC that's supposed to be able to achieve wire speed with AES-NI? There was a commit recently-ish by phk (I think) that had some notes about it. Which I can't find now, of course.
 
Interesting, with aes-cbc and 128 keylength I get ~300mb/sec
4294967296 bytes transferred in 14.069708 secs (305263426 bytes/sec)

That's double... any other suggestions for making it a bit faster? :D
It is a bit odd, since DiskCryptor on windows was using AES-XTS..
 
Faldaani said:
Interesting, with aes-cbc and 128 keylength I get ~300mb/sec
4294967296 bytes transferred in 14.069708 secs (305263426 bytes/sec)

That's double... any other suggestions for making it a bit faster? :D
It is a bit odd, since DiskCryptor on windows was using AES-XTS..

No experience with DiskCryptor; could it have been delayed encryption? (Write the raw data, encrypt while the processor is idle.)
 
Those fixes seem interesting... will have to get those... somehow...
Just out of curiosity, which one is "better", AES-CBC or AES-XTS? My googling says XTS should be used for random IO?

I need to pick an algorithm soon.. it's a bit of a pain to change it later.
Figure I'll pick the one that is the most likely to have performance improvements later.

I did some additional benchmarks with Windows and it turns out that the performance isn't that good when doing actual encryption (compared to its benchmarks), basically equivalent to GELI. Oops.

Just wish I understood why encryption of IO isn't as fast as the benchmarks. Due to a round trip through the CPU? But those buses are crazy fast... so I doubt it?
 
Hmm.. I suddenly get 750mb/s with AES-CBC 128 and 550mb/s with AES-CBC 256... for some reason. I haven't changed anything except my network adapter and performed a reboot, which should be totally unrelated.

AES-CBC, 128 key length:
4294967296 bytes transferred in 5.770889 secs (744247107 bytes/sec)

AES-CBC, 256 key length:
4294967296 bytes transferred in 7.311982 secs (587387571 bytes/sec)

AES-XTS, 128 key length:
4294967296 bytes transferred in 24.125382 secs (178026914 bytes/sec)

AES-XTS, 256 key length:
4294967296 bytes transferred in 26.799398 secs (160263574 bytes/sec)
 
In the commit he also says
As a side-note, GELI with AES-NI using AES-CBC can achive native disk speed.

MFC after: 3 days

This seems to match your benchmark.

I wonder why there is such a big difference between XTS and CBC. Anyone know the security difference between both of them?

Btw:
As far as I understand, the wiki says that you need a 512-bit key for AES-XTS-256 and a 256-bit key for AES-XTS-128.
Can anyone confirm whether I need to supply a key of that length if I want to encrypt my disks with geli aes-xts-128?
 
Yeah, unless you have a really fast RAID array ;)
But I'm more than happy with 700mb/s, just wish I understood why it suddenly improved :D

I'd also be interested to know about the keylength.
Currently I'm generating a 2kb random key file and sending -l 256 as keylen for AES-XTS-256, should be safe, right?
 
Hmm.. it appears that AES-CBC performance is related to the sector size of the GELI device.
So is XTS, but not as much.

AES-CBC, 128 key length, 512b sector
# geli detach /dev/gzero.eli; geli onetime -l 128 -e aes-cbc -s 512 /dev/gzero
# dd if=/dev/gzero.eli of=/dev/null bs=1m count=4096
4294967296 bytes transferred in 18.744036 secs (229137807 bytes/sec)

AES-CBC, 128 key length, 2048b sector
# geli detach /dev/gzero.eli; geli onetime -l 128 -e aes-cbc -s 2048 /dev/gzero
# dd if=/dev/gzero.eli of=/dev/null bs=1m count=4096
4294967296 bytes transferred in 7.621870 secs (563505711 bytes/sec)

AES-CBC, 128 key length, 4096b sector
# geli detach /dev/gzero.eli; geli onetime -l 128 -e aes-cbc -s 2048 /dev/gzero
# dd if=/dev/gzero.eli of=/dev/null bs=1m count=4096
4294967296 bytes transferred in 5.792496 secs (741470917 bytes/sec)

AES-XTS, 128 key length, 512b sector
# geli detach /dev/gzero.eli ; geli onetime -l 128 -e aes-xts -s 512 /dev/gzero
# dd if=/dev/gzero.eli of=/dev/null bs=1m count=4096
4294967296 bytes transferred in 32.427407 secs (132448682 bytes/sec)

AES-XTS, 128 key length, 2048b sector
# geli detach /dev/gzero.eli ; geli onetime -l 128 -e aes-xts -s 2048 /dev/gzero
# dd if=/dev/gzero.eli of=/dev/null bs=1m count=4096
4294967296 bytes transferred in 25.264700 secs (169998746 bytes/sec)

AES-XTS, 128 key length, 4096b sector
# geli detach /dev/gzero.eli ; geli onetime -l 128 -e aes-xts -s 2048 /dev/gzero
# dd if=/dev/gzero.eli of=/dev/null bs=1m count=4096
4294967296 bytes transferred in 23.995813 secs (178988196 bytes/sec)
 
Sigh, copy paste errors.
The last geli onetime command for xts/cbc is sector size 4096 (not 2048 as it says), I just copied it wrong <.<
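
To put the runs above side by side, here is a small sketch that recomputes the rates from the dd timings and compares each sector size against the 512-byte run. The timings are copied from the posts, with the last run of each mode taken as 4096 bytes per the correction; the helper name is made up for illustration.

```python
# (mode, sector size in bytes, seconds) from the dd runs; each run moved 4 GiB.
TOTAL_BYTES = 4294967296
runs = [
    ("AES-CBC", 512, 18.744036),
    ("AES-CBC", 2048, 7.621870),
    ("AES-CBC", 4096, 5.792496),
    ("AES-XTS", 512, 32.427407),
    ("AES-XTS", 2048, 25.264700),
    ("AES-XTS", 4096, 23.995813),
]

def mb_per_sec(seconds):
    # Hypothetical helper: decimal megabytes per second for one 4 GiB run.
    return TOTAL_BYTES / seconds / 1e6

# Use each mode's 512-byte run as its baseline.
baseline = {mode: mb_per_sec(secs) for mode, size, secs in runs if size == 512}

for mode, size, secs in runs:
    rate = mb_per_sec(secs)
    print(f"{mode} {size:>4}: {rate:6.1f} MB/s ({rate / baseline[mode]:.2f}x vs 512)")
```

On these numbers CBC gains a little over 3x going from 512 to 4096 byte sectors, while XTS gains only about a third.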
 
4096 is the default for geli. Anything smaller shouldn't really be used.
Of course 4096b will be faster than smaller ones. So use 4096b.

Another interesting point is the drive itself. Is yours already a 4k (Advanced Format) disk?
 
Btw here are mine
Code:
# geli onetime -l 128 -e aes-cbc -s 4096 /dev/gzero
# dd if=/dev/gzero.eli of=/dev/null bs=1m count=4096
4294967296 bytes transferred in 6.837752 secs (628125644 bytes/sec)


# geli onetime -l 128 -e aes-xts -s 4096 /dev/gzero
# dd if=/dev/gzero.eli of=/dev/null bs=1m count=4096
4294967296 bytes transferred in 32.004749 secs (134197812 bytes/sec)

It really is much faster. I should have known that before encrypting 8 disks with XTS :-(
 
I've got a mix of 512 and 4k drives. Right now I haven't even tested with the drives (they're in use, going to nuke them later)... just staying with memory based testing for now.

I guess I'll go with AES-CBC 256 then.. unless someone else has a good reason for using XTS over CBC?

I don't really understand the difference.
 
An argument against CBC could be this:
Unlike XTS, CBC must read the previous cipher block to encrypt the next, and...
in CBC (with IVs), if you need to change some data in block 1, then you will
need to re-encrypt the subsequent blocks.

I don't know exactly how this works in a real scenario. But going by this statement,
to be fast you would need to restart the block chains with fresh IVs more often, which
would weaken the security of the IVs (depending on the IV size and entropy, obviously)...

I'm not comparing CBC with XTS here. XTS should be faster since you can do
parallel operations. XTS also has a design that is strong against some attacks...
http://seclists.org/basics/2009/May/253
 
Yeah.. not sure I understand its implications.

What is a cypher block in the context of GELI? I'm guessing that each HDD sector has one or more cypher blocks? If that is the case it shouldn't matter since the whole sector is rewritten anyway (when data is written to disk)?

Don't know enough of how it works internally in GELI :(
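
On what a cipher block means here: AES works on 16-byte blocks, so a 4096-byte sector holds 256 of them. As far as I know, GELI encrypts every sector on its own with an IV tied to the sector's position, so a CBC chain never runs past the end of a sector. Below is a toy sketch of that chaining. The keyed-hash "cipher" and the IV derivation are made up for illustration and are not what GELI actually does.

```python
import hashlib

BLOCK = 16  # AES block size in bytes

def toy_block_encrypt(key, block):
    # Stand-in for AES (assumption for illustration): a keyed hash cut to one
    # block. It cannot be decrypted; it only shows how the inputs chain.
    return hashlib.sha256(key + block).digest()[:BLOCK]

def cbc_encrypt_sector(key, sector_no, data):
    # Hypothetical per-sector IV derived from the sector number.
    prev = hashlib.sha256(b"iv" + key + sector_no.to_bytes(8, "little")).digest()[:BLOCK]
    out = []
    for i in range(0, len(data), BLOCK):
        # CBC: XOR the plaintext block with the previous ciphertext block (or IV).
        mixed = bytes(a ^ b for a, b in zip(data[i:i + BLOCK], prev))
        prev = toy_block_encrypt(key, mixed)
        out.append(prev)
    return out

def changed_blocks(old, new):
    # Indexes of ciphertext blocks that differ between two encryptions.
    return [i for i, (a, b) in enumerate(zip(old, new)) if a != b]

key = b"k" * 16
sector = bytes(64)                      # one tiny 64-byte "sector": four blocks of zeros
before = cbc_encrypt_sector(key, 0, sector)

first_edit = b"\x01" + sector[1:]       # change a byte in the first block
last_edit = sector[:-1] + b"\x01"       # change a byte in the last block

print(changed_blocks(before, cbc_encrypt_sector(key, 0, first_edit)))  # [0, 1, 2, 3]
print(changed_blocks(before, cbc_encrypt_sector(key, 0, last_edit)))   # [3]
```

Changing the first block ripples through the rest of that sector only. If the whole sector is rewritten on every write anyway, the re-encryption cost of CBC stays bounded by the sector size.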
 