FreeBSD Friends,
In "ZFS in the trenches", Josh Paetzel is interviewed about his ZFS experiences. It's a fascinating interview.
He describes an event where ACLs stored by Samba triggered a ZFS bug, which caused a pool to become corrupt. What gave me pause was hearing that this corruption traveled over a replication stream to a remote pool and corrupted the remote pool as well.
I'm guessing that the bug(s) related to this event have long since been fixed.
I don't mean to promote FUD, yet hearing about this caused all kinds of warning lights to flash in my brain. Though I suppose this kind of problem isn't really ZFS-specific; I guess any automatic backup system can automatically destroy backups if the source it's reading from is pathological enough to overwhelm whatever safeties it has.
Yet the scenario Josh Paetzel describes is eerily similar to a multi-site system I'm configuring now. Snapshots from a remote system (which is written to by a Mac, through Samba) will be replicated to my local pool. And my local pool is being used for more than just that, so if it goes down, it'll cause more trouble than just a backup system failure.
It recently crossed my mind that it might be possible to include a vetting process, on a separate local pool--a "quarantine" pool--before committing the snapshot to my main local pool. If this were possible, a pathological snapshot arriving from the remote pool would corrupt the "quarantine" pool (dedicated to this purpose) instead of my main local pool (which has other purposes also). Vetted snapshots could then be replicated locally from this "quarantine" pool to my main pool safely.
Yet I'm not sure whether a ZFS scrub would sufficiently vet a snapshot.
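To make the idea concrete, here is a rough sketch of the flow I have in mind, written as a small Python helper that only builds the command lines. Everything named here is hypothetical: the host, the pool and dataset names, and the function names are made up for illustration. It also assumes that a scrub followed by "zpool status -x" is the vetting step (which is exactly the part I'm unsure about), and that "zpool wait" is available (OpenZFS 2.0 or later, as far as I know). It uses full sends for simplicity; a real setup would presumably use incremental streams.

```python
import shlex
import subprocess


def plan_quarantine_transfer(remote_host, remote_snap, quarantine_pool,
                             quarantine_ds, main_ds):
    """Build, in order, the shell commands for a receive-scrub-forward cycle.

    All arguments are illustrative placeholders, not names from a real system.
    """
    q = shlex.quote
    snap_name = remote_snap.split("@", 1)[1]
    return [
        # 1. Receive the remote snapshot into the quarantine pool only.
        f"ssh {q(remote_host)} zfs send {q(remote_snap)}"
        f" | zfs receive -u {q(quarantine_ds)}",
        # 2. Start a scrub of the quarantine pool (returns immediately).
        f"zpool scrub {q(quarantine_pool)}",
        # 3. Block until the scrub finishes (assumes OpenZFS 2.0+).
        f"zpool wait -t scrub {q(quarantine_pool)}",
        # 4. Ask for a health summary of the quarantine pool.
        f"zpool status -x {q(quarantine_pool)}",
        # 5. Only then replicate locally into the main pool.
        f"zfs send {q(quarantine_ds + '@' + snap_name)}"
        f" | zfs receive -u {q(main_ds)}",
    ]


def run_plan(steps):
    """Run each step; stop at the first failure or unhealthy status report."""
    for step in steps:
        result = subprocess.run(step, shell=True, capture_output=True, text=True)
        if result.returncode != 0:
            raise RuntimeError(f"step failed: {step}\n{result.stderr}")
        # Assumption: a healthy pool is reported with the words "is healthy".
        if step.startswith("zpool status -x") and "is healthy" not in result.stdout:
            raise RuntimeError(f"quarantine pool not healthy:\n{result.stdout}")


if __name__ == "__main__":
    # Print the plan rather than running it; the names below are made up.
    for cmd in plan_quarantine_transfer("backup.example.org",
                                        "tank/share@2024-01-01",
                                        "quarantine", "quarantine/share",
                                        "main/share"):
        print(cmd)
```

The point of keeping the plan separate from run_plan is that the last step is never reached if anything before it fails, so the main pool only ever receives a stream that was generated locally from the quarantine pool.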
Maybe I'm just paranoid and am over-thinking this.
If anybody feels like commenting, I'd be curious to hear your thoughts. Thank you!