ZFS geli encrypted nested zpool

dch

Developer
hey,

I set this up last week and it seems to work, but whether it's actually a good idea or not is another question... In particular I wonder whether I should have disabled the cache on the nested zpool rather than on the parent zvol, and whether I actually need to export the nested zpool to snapshot and sync it. It would be very nice if I didn't need to do that. What I'm really waiting for is native ZFS dataset encryption, and then this obscene hackery can die in the dumpster fire it deserves...

Performance is OK: on a box with a zpool of mirrored 3TB enterprise SATA disks, I get ~40MiB/s as a zfs send target, from a dataset on the same system. The same zfs send piped to /dev/null gives ~160MiB/s.
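
For reference, the /dev/null baseline is just the same send with the receive side dropped, something along these lines (using the same snapshot as in the send example further down):

Code:
# zfs send -Lev zroot/var/db/precious@20170220-1623 > /dev/null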

comments welcomed.

objective
  • create a zfs sendable encrypted dataset
  • recover the dataset from the remote end, decrypt it and remount it successfully
create an unencrypted zvol and fill it with random rubbish

Code:
# zfs create -o canmount=off zroot/vols
# zfs create -o volmode=geom \
    -o primarycache=none -o secondarycache=none \
    -o volblocksize=4k \
    -o compression=off \
    -V 500G zroot/vols/secure
# ls -AFGhl /dev/zvol/zroot/vols/
total 0
crw-r-----  1 root  operator   0xa1 Feb 18 21:00 secure
# dd if=/dev/random of=/dev/zvol/zroot/vols/secure bs=1m


partition and label the zvol

Code:
# gpart create -s gpt /dev/zvol/zroot/vols/secure
zvol/zroot/vols/secure created
# gpart add -t freebsd-zfs -l secure /dev/zvol/zroot/vols/secure
zvol/zroot/vols/securep1 added
# ls -AFGhl /dev/gpt/secure
crw-r-----  1 root  operator   0xa3 Feb 18 21:18 /dev/gpt/secure
# zfs snapshot zroot/vols/secure@blank:unencrypted

geli encrypt it

Here we write the geli_secret into Vault and then read it back out, all done as a normal user:

Code:
$ vault write -address=https://vault:8200 secret/backup geli_secret='wouldnt you like to know'
$ vault read -address=https://vault:8200 -field=geli_secret secret/backup \
   | sudo geli init -s 4096 -J - -l 256 /dev/gpt/secure

Metadata backup can be found in /var/backups/gpt_secure.eli and
can be restored with the following command:

    # geli restore /var/backups/gpt_secure.eli /dev/gpt/secure

$ vault read -address=https://vault:8200 -field=geli_secret \
    secret/backup | sudo geli attach -j - /dev/gpt/secure

create the new overlay zpool

Code:
# zpool create -o failmode=continue -O compression=lz4 -O mountpoint=none secure /dev/gpt/secure.eli
# zfs snapshot secure@empty
# zpool export secure
# zfs snapshot zroot/vols/secure@blank:encrypted

send all the things

Code:
# zpool import -N secure
# zfs send -Lev zroot/var/db/precious@20170220-1623 \
  | zfs recv -Fuv secure/precious
# zpool export secure

replicate the encrypted zpool

Code:
# zfs snapshot zroot/vols/secure@`date -u +%Y%m%d-%H%M`
# zfs send ....
# zpool import -N secure
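
For the elided send above, an incremental replication of the zvol might look something like this (the snapshot names and the backup-host target are made up for illustration):

Code:
# zfs send -v -i zroot/vols/secure@20170220-1600 \
    zroot/vols/secure@20170227-1600 \
  | ssh backup-host zfs recv -Fuv tank/secure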

destroy everything

Code:
# zpool export secure
# geli detach gpt/secure.eli
# zfs destroy -r zroot/vols/secure
 
An interesting solution, and ZFS volumes certainly offer a lot of flexibility. Like you, I would be concerned about the integrity of your nested pool secure if you did not export it before taking the snapshot, since the pool hosting the volume knows nothing about the state of the pool inside it. I would also be concerned about low-memory conditions where the nested pool might cause issues, though you could test for that.

A drawback I see is that the data on the volume (that is, the contents of the GELI container) will not be compressible, which will increase the amount of data you need to send remotely: a full ZFS send of the volume will always be the total size of the volume. Have you considered other options? For example:
  • Sending a ZFS dataset over ssh(1) so it is encrypted in transit, then receiving it into a ZFS pool that is itself stored in encrypted form (in a GELI container or something else)
  • Sending a ZFS dataset to a flat file, piping it through compression and encryption tools; incremental updates can subsequently be sent, compressed and encrypted the same way (see the sketch after this list). With this method, neither the transport nor the remote storage needs to be encrypted.
  • Using the same approach you suggest, but with a UFS file system on the ZFS volume instead of a nested pool. This still has the issue that the data on the volume will not be compressible.
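
A minimal sketch of the flat-file option, assuming openssl(1) as the encryption tool (the dataset, snapshot, key file and output names are illustrative):

Code:
# zfs send -Lev zroot/var/db/precious@20170220-1623 \
    | xz \
    | openssl enc -aes-256-cbc -salt -pass file:/root/backup.key \
    > /backup/precious-20170220.zfs.xz.enc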
 
Thanks Ross, good points.

The two key requirements were zero trust of the receiving end with respect to reading the encrypted data, and having zfs datasets/snapshots of the usable filesystem (i.e. the nested one). I'm still undecided what I will do long term. Dumping to flat files (`zfs send ... | xz | hpenc > /some/zfs/dataset`, then zfs sending that) is operationally simpler to understand and manage, but the big tradeoff is recovery time. The encrypted zvol approach has the big advantage that as soon as you've recovered the zvol from the backup environment to the local one, you are ready to zpool import & go, whereas with the other schemes you need to decrypt, decompress, and zfs recv. In the scheme of things, this should be a temporary situation until native encrypted datasets land in ZFS.
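
For comparison, recovery from the flat-file scheme reverses that pipeline, something like the following (file and dataset names are illustrative, and the hpenc flags are as I understand its usage):

Code:
# hpenc -d -k `cat /root/backup.psk` < /backup/precious-20170220.zfs.xz.enc \
    | xz -d \
    | zfs recv -Fuv zroot/restored/precious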
 
Digging a bit further I found the sysctl vfs.zfs.vol.recursive, and finally a reference to a commit. If I experience any deadlocks, I'll report back.
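
For anyone who wants to experiment regardless, flipping the knob is a one-liner (and can be persisted in /etc/sysctl.conf), with the caveat from the review below very much in mind:

Code:
# sysctl vfs.zfs.vol.recursive=1
vfs.zfs.vol.recursive: 0 -> 1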
 
https://reviews.freebsd.org/D4998 said:
Building ZFS pools on top of zvols is prohibited by default. That feature has never worked safely; it's always been prone to deadlocks.
Ouch. I think that would put me off trying to nest ZFS pools on a production system.
 
Well, it's been running perfectly for a month so far. The IO load is erratic, but I've had no issues doing db replication into the nested pool while running poudriere and a zfs send of the zvol offsite. Evidently YMMV!
 
So I ended up deleting this setup today, as it introduces rather more IO load than is desirable. This box has other things to do, and the clear note about this being an unsupported configuration in the commit above is a real turn-off.
 
Yeah, encryption by its very nature generates overhead, and I can easily see that happening with ZFS set up like this. I can't make any solid estimates, but I can't help wondering what would happen if you simply created a UFS (virtual) filesystem to house whatever it is you need and used that to send/restore across.

So basically a regular file which you set up with an encrypted filesystem (UFS + GELI) and then mount via a memory disk (md(4), FreeBSD's equivalent of a loop device). It would still generate some overhead, I'd wager, but most likely less than a nested-ZFS "hack" (or so I assume).
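
A rough sketch of that idea on FreeBSD, using mdconfig(8) to back the file (path and size are made up; geli init will prompt for a passphrase):

Code:
# truncate -s 100g /storage/secure.img
# mdconfig -a -t vnode -f /storage/secure.img
md0
# geli init -s 4096 /dev/md0
# geli attach /dev/md0
# newfs -U /dev/md0.eli
# mount /dev/md0.eli /mnt/secure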

Maybe food for thought?
 