ZFS Can the sanity of ZFS snapshots be vetted before they're permanently saved?

robroy · Mar 27, 2016

FreeBSD Friends,

In "ZFS in the trenches", Josh Paetzel is interviewed about his ZFS experiences. It's a fascinating interview.

He describes an event where ACLs stored by Samba triggered a ZFS bug, which caused a pool to become corrupt. What gave me pause was hearing that this corruption traveled over a replication stream to a remote pool and corrupted the remote pool as well.

I'm guessing that the bug(s) related to this event have long since been fixed.

I don't mean to promote FUD, yet hearing about this caused all kinds of warning lights to flash in my brain. I suppose this kind of problem isn't really ZFS-specific, though; any automatic backup system can automatically destroy its backups if the source it's reading from is pathological enough to overwhelm whatever safeties it has.

Yet the scenario Josh Paetzel describes is eerily similar to a multi-site system I'm configuring now. Snapshots from a remote system (which is written to by a Mac, through Samba) will be replicated to my local pool. And my local pool is being used for more than just that, so if it goes down, it'll cause more trouble than just a backup system failure.

It recently crossed my mind that it might be possible to include a vetting process on a separate local pool (a "quarantine" pool) before committing the snapshot to my main local pool. If this were possible, a pathological snapshot from the remote pool would corrupt only the "quarantine" pool (dedicated to this purpose), instead of my main local pool (which has other purposes too). Vetted snapshots could then be replicated locally from the "quarantine" pool to my main pool safely.

Yet I'm not sure whether a ZFS scrub would sufficiently vet a snapshot.
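
To make the idea concrete, here is a rough sketch of one replication cycle, written as a small Python helper that only builds the command lines rather than running them. The host, pool, and dataset names are made up for illustration, and using a scrub followed by `zpool status -x` as the vetting step is an assumption; whether that is a sufficient check is exactly the open question.

```python
import shlex


def quarantine_plan(remote_host, prev_snap, new_snap, quarantine_ds, main_ds):
    """Build the shell command lines for one quarantined replication cycle.

    All names passed in are hypothetical examples. Nothing is executed here;
    the caller decides how (and whether) to run each step.
    """
    q = shlex.quote
    prev_name = prev_snap.split("@", 1)[1]   # e.g. "monday"
    new_name = new_snap.split("@", 1)[1]     # e.g. "tuesday"
    q_pool = quarantine_ds.split("/", 1)[0]  # pool name is the first path component

    return [
        # 1. Pull the incremental stream over SSH into the quarantine pool only.
        f"ssh {q(remote_host)} zfs send -i {q(prev_snap)} {q(new_snap)}"
        f" | zfs receive {q(quarantine_ds)}",
        # 2. Start a scrub of the quarantine pool. The scrub runs in the
        #    background, so it has to finish before step 3 means anything.
        f"zpool scrub {q(q_pool)}",
        # 3. Ask for a health summary of the quarantine pool.
        f"zpool status -x {q(q_pool)}",
        # 4. Only if step 3 looks clean: replicate locally into the main pool.
        f"zfs send -i {q(quarantine_ds + '@' + prev_name)}"
        f" {q(quarantine_ds + '@' + new_name)} | zfs receive {q(main_ds)}",
    ]


if __name__ == "__main__":
    for line in quarantine_plan(
        "remotehost", "tank/share@monday", "tank/share@tuesday",
        "quarantine/share", "main/share",
    ):
        print(line)
```

The point of the layout is that step 4 is the only command that touches the main pool, and it never reads from the remote side directly.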

Maybe I'm just paranoid and am over-thinking this.

If anybody feels like commenting, I'd be curious to hear your thoughts. Thank you!

Crest · Mar 31, 2016

Yes, you can use a pool backed by ZFS volumes (or even just files) inside your normal backup pool to spool your replication streams, but it won't offer perfect protection from corrupted streams. ZFS receive already validates checksums, and if you pipe your replication streams through SSH, they are protected by its MAC in transit, so there is no reason to worry about bit flips and the like. What it won't protect you from is a malicious stream with valid ZFS checksums generated by the sender. ZFS doesn't protect itself from malicious streams, and not all forms of pool corruption are immediately obvious. In theory, you could spread subtle corruption through multiple hops.
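
As a rough illustration of the spool pool itself, the helper below builds the commands for creating one on top of either a plain file or a ZFS volume. The pool name, paths, and size are hypothetical, and the `/dev/zvol/...` device path assumes the FreeBSD layout; the commands are only constructed, not run.

```python
def spool_pool_commands(pool, backing, size, use_zvol=False):
    """Return argv lists that create a spool pool on a file or a ZFS volume.

    pool, backing and size are illustrative placeholders, e.g.
    ("spool", "/backup/spool.img", "10G") or ("spool", "backup/spoolvol", "10G").
    """
    if use_zvol:
        return [
            # Create a volume of the given size inside the existing backup pool.
            ["zfs", "create", "-V", size, backing],
            # Build the spool pool on the volume's device node (FreeBSD path).
            ["zpool", "create", pool, f"/dev/zvol/{backing}"],
        ]
    return [
        # Create a sparse file of the given size; the path must be absolute.
        ["truncate", "-s", size, backing],
        # Build the spool pool directly on the file.
        ["zpool", "create", pool, backing],
    ]


if __name__ == "__main__":
    for argv in spool_pool_commands("spool", "/backup/spool.img", "10G"):
        print(" ".join(argv))
```

Either way the spool pool can be destroyed and recreated without touching the backup pool's other datasets, which is the whole appeal; it does nothing about the valid-checksum case described above.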
