ZFS: Why use GELI underneath ZFS these days?

It's full disk encryption. The 'downside' of ZFS's own encryption is that you can still see that a dataset exists when it's locked. With a GELI-encrypted partition, nobody can infer what might be on it. That may be a deciding factor for some people.
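
To make the difference concrete, here is a rough sketch of the two layouts as command sequences. The partition /dev/ada1p1, the pool name "backup" and the dataset "backup/private" are made-up names for illustration, and the two sequences are alternatives for the same partition, not steps to run together. The script only prints the commands; it does not run them.

```python
import shlex

# Hypothetical names, for illustration only.
PARTITION = "/dev/ada1p1"

# Layout 1: GELI underneath ZFS. The pool lives on the .eli provider,
# so without the key the partition is just opaque encrypted blocks.
geli_under_zfs = [
    ["geli", "init", "-s", "4096", PARTITION],   # write GELI metadata, set the passphrase
    ["geli", "attach", PARTITION],               # creates /dev/ada1p1.eli
    ["zpool", "create", "backup", PARTITION + ".eli"],
]

# Layout 2: ZFS native encryption. The pool itself is not encrypted,
# only the dataset is, so its name stays visible while it is locked.
zfs_native = [
    ["zpool", "create", "backup", PARTITION],
    ["zfs", "create", "-o", "encryption=on", "-o", "keyformat=passphrase",
     "backup/private"],
    ["zfs", "unmount", "backup/private"],
    ["zfs", "unload-key", "backup/private"],     # lock the dataset
    ["zfs", "list", "-o", "name,keystatus", "backup/private"],  # still listed
]

for title, commands in (("GELI + ZFS", geli_under_zfs),
                        ("ZFS native encryption", zfs_native)):
    print(f"# {title}")
    for command in commands:
        print(shlex.join(command))
```

In the second layout the last command still lists the dataset, which is the visibility point made above.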
 
Be aware of possible corruption bugs when using ZFS native encryption in combination with zfs-send(8)/zfs-recv(8), see [1]. The suggestion there is to consider ZFS native encryption experimental and not ready for production. Of course, many people around the world have been using ZFS native encryption in production for years without problems, so the whole issue seems unclear to me. I felt uneasy using ZFS native encryption once I knew about the issue, so I rebuilt my backup disks as GELI + ZFS.

It seems there's some progress on identifying and fixing the issue, at least partially, see recent comments in [2].

[1] https://github.com/openzfs/openzfs-docs/issues/494
[2] https://github.com/openzfs/zfs/issues/12014
 
Last I heard ZFS encryption isn't very highly developed wrt key handling. Also note this from the zfs-set-key(8) manpage:

" If the user's key is compromised, zfs change-key does not necessarily
protect existing or newly-written data from attack. Newly-written data
will continue to be encrypted with the same master key as the existing
data. The master key is compromised if an attacker obtains a user key
and the corresponding wrapped master key. Currently, zfs change-key
does not overwrite the previous wrapped master key on disk, so it is
accessible via forensic analysis for an indeterminate length of time.
"
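
A toy model of what the manpage excerpt describes may help. This is only a sketch: the XOR keystream "cipher" and the helper names are invented for illustration and are not what ZFS or GELI actually use. It only shows the structure of the problem: changing the user key rewraps the same master key, so an old user key plus an old wrapped master key still opens data written later.

```python
import hashlib
import os


def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` bytes from `key` (toy construction: SHA-256 in counter mode)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_crypt(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with the keystream. Applying it twice decrypts.
    Not secure (the keystream is reused per key); for illustration only."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def derive_user_key(passphrase: str, salt: bytes) -> bytes:
    """Turn a passphrase into a user key (iteration count chosen arbitrarily)."""
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt, 10_000)


# Dataset creation: a random master key, stored wrapped under the user key.
salt = os.urandom(16)
master_key = os.urandom(32)
old_wrapped = xor_crypt(derive_user_key("old passphrase", salt), master_key)

# Key change: the SAME master key is rewrapped under a new user key.
new_wrapped = xor_crypt(derive_user_key("new passphrase", salt), master_key)

# Data written after the key change is still encrypted with the unchanged master key.
new_block = xor_crypt(master_key, b"written after change-key")

# Attacker: knows the old passphrase and recovers the old wrapped key from disk.
recovered_master = xor_crypt(derive_user_key("old passphrase", salt), old_wrapped)
print(xor_crypt(recovered_master, new_block))  # prints b'written after change-key'
```

The only real defence in this model is to generate a new master key and re-encrypt the data, which a key change does not do.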
 
Assuming people are interested in encryption: Why would one use Geli with ZFS on top, given that ZFS encryption is available?
Last time I ran a benchmark, GELI was around 3x faster than ZFS encryption in heavy random read-write tests; however, I can't remember when that was. Furthermore, at some point I injected random bits into the disk, and with ZFS encryption the names of the files with errors were listed in the status output even though the keys were not loaded.
 
Just a note that the corruption bug affected systems with encrypted datasets after an unencrypted (that is, not raw) zfs send of one of those datasets.
The root cause has been recently established and the problem has been fixed.
 
Last I heard ZFS encryption isn't very highly developed wrt key handling. Also note this from the zfs-set-key(8) manpage:

" If the user's key is compromised, zfs change-key does not necessarily
protect existing or newly-written data from attack. Newly-written data
will continue to be encrypted with the same master key as the existing
data. The master key is compromised if an attacker obtains a user key
and the corresponding wrapped master key. Currently, zfs change-key
does not overwrite the previous wrapped master key on disk, so it is
accessible via forensic analysis for an indeterminate length of time.
"

That's the same as geli.
 
Just a note that the corruption bug affected systems with encrypted datasets after unencrypted (that is, not raw) zfs send of one of such datasets.
The root cause has been recently established and the problem has been fixed.
Do you have a link to a commit or an issue?
 
Assuming people are interested in encryption: Why would one use Geli with ZFS on top, given that ZFS encryption is available?
There are two key use cases that ZFS encryption enables. The first is sending an encrypted dataset for backup to an untrusted third party (the key never exists on the remote side), including the ability to send incremental updates later. The second is using the same encrypted dataset (not concurrently) on different OSes (Linux/FreeBSD; possibly Mac and Windows, but I haven't checked the status of those ZFS ports lately).
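
For the backup use case, a minimal sketch of how the raw send pipeline could be assembled. The helper name raw_send_pipeline, the dataset names and the remote host are all made up; the only real pieces are `zfs send -w` (raw, still-encrypted stream), `-i` for an incremental stream, and `zfs receive -u` (do not mount on the receiving side). The function only builds the command string, it does not run anything.

```python
import shlex
from typing import Optional


def raw_send_pipeline(dataset: str, snapshot: str, remote: str, target: str,
                      previous: Optional[str] = None) -> str:
    """Build a shell pipeline that sends a raw (still encrypted) stream to `remote`.

    With `previous` set, only the changes since that snapshot are sent.
    All arguments are hypothetical example values supplied by the caller.
    """
    send = ["zfs", "send", "-w"]                  # -w: raw send, blocks stay encrypted
    if previous is not None:
        send += ["-i", f"{dataset}@{previous}"]   # incremental since `previous`
    send.append(f"{dataset}@{snapshot}")
    # -u: do not mount on the receiving side, which never holds the key anyway
    receive = ["ssh", remote, "zfs", "receive", "-u", target]
    return shlex.join(send) + " | " + shlex.join(receive)


# Full send first, then an incremental update on top of it.
print(raw_send_pipeline("tank/home", "mon", "backup@example.org", "vault/home"))
print(raw_send_pipeline("tank/home", "tue", "backup@example.org", "vault/home",
                        previous="mon"))
```

Because the stream is raw, the receiving side stores ciphertext and never needs the key loaded.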

If you don’t have those use cases, I’d suggest GELI for overall easier management. It’s conceptually simpler (the whole block device or partition is encrypted).

ZFS encryption also allows different per-dataset encryption, which could allow things like per-user encryption of home/foo filesystems of devices at rest, for example, but last I checked the interaction with various logins (local graphical, SSH) was inelegant at best and broken at worst.
 
 
Well that discussion is a little unsettling. It looks like everyone agrees it's a good fix, but there's some uncertainty on how it addresses the problem.
That's not the impression I got.
The bug was pretty clear and obvious (once spotted).
 
That's not the impression I got.
The bug was pretty clear and obvious (once spotted).
Agreed that it's a clear and obvious fix, but there's some uncertainty about how the bug causes the observed symptoms:

Has anyone done a full analysis to understand how this bug manifested as the myriad encryption bugs, panics and other quirks we've seen over the years?

Don't get me wrong; this PR is obviously right for what it is, and I won't be surprised if it closes a locking gap somewhere that could be hit if you're unlucky, but I'd like those dots connected so I can feel comfortable saying yes, this definitely sorts it out. Especially since the forums are starting to pick up on it and the high-fives and cheers are making me a bit uncomfortable.

(and yes, I am a blast at parties 🎉 😬)

If not, I've got some spare time this afternoon; I'll try to do the analysis myself. (Ed: No analysis follows.)
@robn I don't think there is still a proper understanding of what's going on. At least I haven't seen and don't have one.

Those are the last two comments on the PR. There are more upthread, but I've been tedious enough already.
 
Jose, what's clear is that it's "garbage in, garbage out".
Some keys that should have been unloaded remained loaded and could get used in some operations.
Yeah, understanding all the details of garbage transformation may be quite hard, but usually it's not required.
 
I really like GELI, but I am looking for a way to automate entry of the passphrase component, so that servers can boot and serve resources without anyone having to enter the passphrase manually. Of course, the purpose of GELI is to protect data in cold storage, so the key would have to be kept in a secure location inside the network. If the disks are booted somewhere outside the network, they are not decrypted.

Anyone know some pointers to help me look into this?
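
One possible direction, as a rough sketch only. It assumes the provider was initialised with a keyfile and no passphrase (`geli init -P -K keyfile`), so that `geli attach -p -k -` can read the key from standard input. The key server URL, the provider name and the helper names are hypothetical, and how the key server authenticates the requesting host is left out entirely.

```python
import subprocess
import urllib.request

# Hypothetical values: an internal-only key server and the encrypted provider.
KEY_URL = "https://keys.internal.example/geli/backup0.key"
PROVIDER = "/dev/ada1p1"


def fetch_key(url: str = KEY_URL, timeout: float = 5.0) -> bytes:
    """Fetch the keyfile contents from a server reachable only inside the network."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def attach_command(provider: str) -> list:
    # -p: no passphrase component; -k -: read the keyfile from standard input
    return ["geli", "attach", "-p", "-k", "-", provider]


def attach(provider: str = PROVIDER, fetch=fetch_key, run=subprocess.run) -> bool:
    """Attach the provider if the key server is reachable; otherwise leave it locked."""
    try:
        key = fetch()
    except OSError:
        # Covers urllib's URLError: e.g. the disks were booted outside the network.
        return False
    run(attach_command(provider), input=key, check=True)
    return True
```

The key never touches the local disk in this sketch, so a machine booted outside the network simply fails to fetch it and the provider stays encrypted.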
 
There are two key use cases that ZFS encryption enables
This came up during a detailed discussion in ZFS Ask Me Anything - 12 Dec 2024 with Allan Jude, Mark Maybee and Jim Salter.
Their general advice: use ZFS encryption when you need finer-grained control; when you only need a broad encryption mechanism, use whole-disk encryption.

The first question that came up:
Do you recommend ZFS built-in encryption or running ZFS on top of another encryption mechanism?
I heard recently there's been some bugs around sending encrypted snapshots between ZFS pools.
A variation on the quoted use cases for ZFS encryption came up in the segment "Should you use ZFS Built-in Encryption" and the segment immediately following it, "Best practices for ZFS Encryption", from ca. 4:30 to 15:19 min. The use case mentioned there is the provable and reliable destruction of data, limited to an encrypted dataset. Apart from this law-firm example, ZFS encryption in a medical setting was also discussed, followed by a discussion of ZFS 'encryption bugs'; I think that discussion probably pre-dates the issue discussed in this thread.
 