Other GELI performance vs usable data

Well I don't know if this is the correct subforum for this topic.

Reading about FDE with data authentication in GELI, I read that some space is reserved to store the HMAC of the stored encrypted data.

The comments in /usr/src/sys/geom/eli/g_eli_integrity.c say that:

Code:
One of the most important assumption here is that authenticated data and its HMAC has to be stored in the same place (namely in the same sector) to make it work reliable.

With 4096 bytes sector we can use 89% of size of the original provider. I find it as an acceptable cost.

I ask:

Why does the authenticated data need to be stored in the same place as its HMAC?


With the current configuration, on an ideal 500 GiB disk we only have usable:

477218586624 bytes
466033776 KiB
455111 MiB
444.44 GiB Usable

We need 55 GiB of the original 500 GiB (11%) to authenticate data.
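The arithmetic above can be double-checked with a short sketch (assuming an ideal 500 GiB provider, 512-byte physical sectors, 32-byte HMACs, and geli's grouping of 9 physical sectors per 4096-byte logical sector, as described in g_eli_integrity.c):

```python
# Current GELI integrity scheme: each 512-byte physical sector stores a
# 32-byte HMAC plus 480 bytes of data; 9 such sectors back one 4096-byte
# logical sector (9 * 480 = 4320 >= 4096, the remainder is zero padding).
RAW = 500 * 1024**3          # ideal 500 GiB provider
PHYS = 512                   # physical sector size
GROUP = 9                    # physical sectors per 4096-byte logical sector
LOGICAL = 4096               # logical sector size exposed to the user

groups = RAW // (GROUP * PHYS)   # whole 9-sector groups that fit
usable = groups * LOGICAL        # user-visible capacity

print(usable)                # 477218586624 bytes
print(usable / 1024**3)      # ~444.44 GiB
print(usable / RAW)          # ~0.889 -> the "89%" from the source comment
```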

I propose:

Code:
da0:
     +---------+ +---------+ +---------+ +---------+ +---------+ +---------+ 
     |    0    | |    1    | |    2    | |    3    | |   ...   | |    16   |
     +----+----+ +----+----+ +----+----+ +----+----+ +----+----+ +----+----+
     |16 Times | | F U L L | | F U L L | | F U L L | | F U L L | | F U L L |
     |32b HMAC | | D A T A | | D A T A | | D A T A | | D A T A | | D A T A |
     +----+----+ +----+----+ +----+----+ +----+----+ +----+----+ +----+----+
     |512 bytes| |512 bytes| |512 bytes| |512 bytes| |512 bytes| |512 bytes|
     +---------+ +---------+ +---------+ +---------+ +---------+ +---------+

With this configuration we have usable:

505290269696 bytes
493447529 KiB
481882 MiB
470.58 GiB Usable

We only need 30 GiB of the original 500 GiB (6%) to authenticate data.
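The proposed layout's numbers can be checked the same way (a sketch assuming one 512-byte HMAC sector per 16 data sectors, so 16 out of every 17 raw sectors carry data):

```python
# Proposed scheme: groups of 17 physical sectors, where sector 0 of each
# group holds sixteen 32-byte HMACs (16 * 32 = 512 bytes, one full sector)
# and the other 16 sectors hold data.
RAW = 500 * 1024**3          # ideal 500 GiB provider
PHYS = 512                   # physical sector size

total_sectors = RAW // PHYS
data_sectors = total_sectors * 16 // 17   # 16 data sectors per group of 17
usable = data_sectors * PHYS

print(usable)                # 505290269696 bytes
print(usable / 1024**3)      # ~470.58 GiB
```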
 
With the scheme described, imagine I wanted to read sector 3 in your diagram. We need to read in a full 16 sectors, run the authentication over that dataset to ensure its integrity, copy out the 512 bytes we wanted, and either try to cache the remaining 15 sectors that nobody wanted or throw them away.

Performance is going to be terrible.

If sector 2 becomes corrupt, all you will know is that the entire set is damaged. Data loss becomes more damaging (if that's a thing).

Simplicity. The current scheme reads one block, processes it, passes it along or doesn't.
 
Well, your point is valid, but:

We need to read in a full 16 sectors

I never said that we need to read all 16 sectors.

If we know that the requested data is in sector 3, we only need to read sector 0 and sector 3: sector 0 for the HMAC (at offset 64 bytes within sector 0) and sector 3 for the data. Then run the authentication, decrypt, and return it to the user application...

The process for the current geli is pretty similar: if we want to read a single 512-byte chunk, we need to read 2 sectors, and we also need to run the authentication for 2 sectors.

In my proposal, sector 0 holds the HMACs for the next 16 sectors:

Code:
+---------------------------+
|         Sector 0          |
+---------------------------+
|32 Bytes HMAC for Sector  1|
|32 Bytes HMAC for Sector  2|
|32 Bytes HMAC for Sector  3|
|....                       |
|32 Bytes HMAC for Sector 16|
+---------------------------+
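The lookup this diagram implies could be sketched like this (a hypothetical helper, assuming 512-byte sectors, 32-byte HMACs, and groups of one HMAC sector followed by 16 data sectors; `hmac_location` is an illustration, not an existing geli function):

```python
HMAC_LEN = 32
GROUP = 17   # 1 HMAC sector + 16 data sectors

def hmac_location(data_sector):
    """Given the physical number of a data sector (not a multiple of 17,
    since those are the HMAC sectors), return the physical sector holding
    its HMAC and the byte offset of the HMAC within that sector."""
    group_start = (data_sector // GROUP) * GROUP   # the group's HMAC sector
    index = data_sector % GROUP                    # 1..16 within the group
    return group_start, (index - 1) * HMAC_LEN

# Reading data sector 3 needs only two reads: sector 0 (HMAC at offset 64)
# and sector 3 itself.
print(hmac_location(3))    # (0, 64)
```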
 
In my proposal, sector 0 holds the HMACs for the next 16 sectors
Sorry, missed that; for some reason I thought it was a single hash for the whole set of sectors.

Something doesn't add up here. If only 89% of a 4096-byte sector is available, they are using up about 450 bytes to store the hash. SHA256 only needs 32 bytes, as you specify; it doesn't matter how much data we're sucking up, the hash is always 32 bytes. Therefore a larger block size should result in less wastage, at the expense of performance.
 
Well, in the current scheme every 512-byte sector has 32 bytes for the hash and only 480 bytes of data; they put 9 sectors together to get a full 4096 bytes of usable data for the user.

So for every 9 sectors of 512 bytes, they don't use 224 bytes (9 × 480 − 4096); those bytes are always zero.

You can see it in the comments in /usr/src/sys/geom/eli/g_eli_integrity.c:

https://github.com/freebsd/freebsd/blob/master/sys/geom/eli/g_eli_integrity.c

That is why I say the scheme needs a small change to recover all the unused bytes.
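For reference, the per-group waste in the current scheme works out like this (a quick check of the numbers above):

```python
# Per group of 9 physical sectors in the current GELI integrity scheme:
PHYS, HMAC_LEN, GROUP, LOGICAL = 512, 32, 9, 4096

data_per_group = GROUP * (PHYS - HMAC_LEN)   # 9 * 480 = 4320 data bytes
padding = data_per_group - LOGICAL           # 224 always-zero bytes
hmac_bytes = GROUP * HMAC_LEN                # 288 bytes of HMACs

print(padding)      # 224
print(hmac_bytes)   # 288
```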
 
OK, I see how they are doing it now.

Yeah, the only downside with your scheme I see is the possibility of losing 16 sectors if the hash sector gets corrupted or becomes unreadable.

With modern disks doing NCQ I doubt your scheme would have much impact at the disk, and the coding is simpler than the existing mechanism, so it may even put less pressure on the CPU.

Caching the hash sectors would probably be mandatory though; imagine reading sectors 1 through 16 sequentially and having to re-read sector 0 every time. Still, I don't think that added complexity detracts.
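A minimal sketch of that caching idea (a hypothetical single-entry cache; nothing here is existing geli code, and a real implementation would live in the GEOM layer and probably cache more than one hash sector):

```python
class HashSectorCache:
    """Remember the last hash sector read, so a sequential pass over a
    group's 16 data sectors fetches its hash sector from disk only once."""
    def __init__(self, read_sector):
        self.read_sector = read_sector   # callable: sector number -> bytes
        self.cached_num = None
        self.cached_data = None

    def get(self, sector_num):
        if sector_num != self.cached_num:       # miss: hit the disk
            self.cached_data = self.read_sector(sector_num)
            self.cached_num = sector_num
        return self.cached_data                 # hit: no disk access

# Usage: a fake disk that records which sectors are actually read.
reads = []
def fake_read(n):
    reads.append(n)
    return b"\x00" * 512

cache = HashSectorCache(fake_read)
for _ in range(16):      # sequential read of one group's data sectors,
    cache.get(0)         # each needing the group's hash sector (sector 0)
print(len(reads))        # 1 -> only one physical read of the hash sector
```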
 
Thanks for reading!

Yeah, the only downside with your scheme I see is the possibility of losing 16 sectors if the hash sector gets corrupted or becomes unreadable.

Well, I saw the same possibility, but think again: if for some reason sectors get corrupted or become unreadable, what is the probability that only one sector gets damaged? In many cases several sectors are corrupted at once, so the current scheme and the proposed scheme would both fail.

Caching the hash sectors would probably be mandatory though.

Yes, I think the same!
 
One thing I dislike about geli is the fact that you need to wipe all the sectors before use, otherwise you get lots of seemingly corrupted sectors.

With your scheme, you could zero every 17th physical sector when you initialize the geli device for the first time. When the first attempt to access a logical sector in that group is made, examining the hash sector reveals it to be zeroed, so we can zero that block of sectors on the fly. I'm sure there's an obvious problem with this concept though...
 