ZFS Questions

Hey guys, thanks for reading this.

Windows (and DOS before) guy here; I know very little about FreeBSD and ZFS apart from what I’ve read, which is why I’m resorting to this forum. I want to set up a FreeBSD ZFS system (boot drive + another two drives of 2TB each) for backup purposes. I may also use the system lightly (web surfing or email). The data is going to be always online (powered 24/7), but only rarely accessed (either directly or over a local network) and also rarely updated. I’m thinking FreeBSD because it’s stable, secure and free (and I’m excited to play with a new OS) and ZFS because of its focus on data integrity.

My questions:
  • I’m not planning for any RAID-like setup. I’d rather manually copy the exact same data to the two drives (so they’re exact replicas of one another) and keep them independent instead of pooling them. This way, if one drive fails I’ll just replace it and copy the data from the remaining drive. Wouldn’t this eliminate any potential issues with the matrix/pool failing to be rebuilt and leading to data loss? Also if I accidentally delete something (or it gets corrupted) I could retrieve it from the other drive regardless of what happened afterwards. Am I missing something here? Isn’t keeping the drives independent (and manually mirrored) much safer? Why is everyone recommending RAID?
  • Assuming a power failure, or similar event, that would mess up the file structure (not the drive itself), are there any tools to attempt to recover the data from a ZFS drive? I’ve read some stories that have me worried (e.g. http://mbruning.blogspot.com/2009/12/zfs-data-recovery.html), including claims that ZFS is not that stable under FreeBSD.
  • Would encryption lower the odds of a successful drive recovery should something go wrong? I’d say so but I’m looking for confirmation.
  • Do you need ECC memory to fully take advantage of ZFS? I know you need plenty.
Thanks again. Cheers.
 
Speaker said:
I’m not planning for any RAID-like setup. I’d rather manually copy the exact same data to the two drives (so they’re exact replicas of one another) and keep them independent instead of pooling them. This way, if one drive fails I’ll just replace it and copy the data from the remaining drive. Wouldn’t this eliminate any potential issues with the matrix/pool failing to be rebuilt and leading to data loss?
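No, because you are assuming the other copy is correct. With two independent drives you have no way of telling which copy is the good one; a ZFS mirror checksums every block, so it can detect which side is damaged and repair it from the other. Why not mirror the data and let RAID figure it out? In case of a drive failure the data will still be accessible, and the drive can be replaced on a running system (if you have hot-swappable disks). The main point of RAID is that the data is continuously available.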
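To see why "assuming the other copy is correct" matters, here is a minimal Python sketch of verifying two manually copied drives by hashing every file. The function names and the mount points in the usage note are hypothetical. The check can tell you that the copies disagree, but not which one is right; a ZFS mirror avoids that problem because each block's checksum is stored separately from the block itself.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_replicas(a: Path, b: Path) -> list[str]:
    """List relative paths that are missing from one copy or differ between copies."""
    files_a = {p.relative_to(a) for p in a.rglob("*") if p.is_file()}
    files_b = {p.relative_to(b) for p in b.rglob("*") if p.is_file()}
    problems = []
    for rel in sorted(files_a | files_b):
        if rel not in files_a or rel not in files_b:
            problems.append(f"missing: {rel}")
        elif file_digest(a / rel) != file_digest(b / rel):
            # The copies disagree, but nothing here says WHICH one is correct.
            problems.append(f"differs: {rel}")
    return problems
```

Usage would be something like `compare_replicas(Path("/mnt/disk1/backup"), Path("/mnt/disk2/backup"))` (hypothetical paths). Any "differs" line leaves you guessing which drive to trust.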

Also if I accidentally delete something (or it gets corrupted) I could retrieve it from the other drive regardless of what happened afterwards. Am I missing something here
You're missing snapshots. Make regular snapshots and you can retrieve the deleted file from one of those snapshots.
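The snapshot idea can be pictured with a toy Python model: a snapshot is a frozen, read-only view of the dataset at one moment, so deleting a file from the live data does not touch it. This is only an illustration with hypothetical class and method names; real ZFS snapshots are copy-on-write and cost almost nothing when created, whereas this toy copies the whole dictionary. On an actual system, `zfs snapshot pool/dataset@name` creates a snapshot, and its files appear read-only under the dataset's hidden `.zfs/snapshot/` directory, so restoring a deleted file is an ordinary copy.

```python
from types import MappingProxyType


class ToyDataset:
    """Toy model of a dataset with read-only point-in-time snapshots (not real ZFS)."""

    def __init__(self) -> None:
        self.live: dict[str, bytes] = {}
        self.snapshots: dict[str, MappingProxyType] = {}

    def snapshot(self, name: str) -> None:
        # Freeze a copy of the current state; later changes to `live` do not affect it.
        self.snapshots[name] = MappingProxyType(dict(self.live))

    def restore_file(self, snap: str, path: str) -> None:
        # Copy a single file back from a snapshot into the live dataset.
        self.live[path] = self.snapshots[snap][path]


ds = ToyDataset()
ds.live["photos/cat.jpg"] = b"meow"
ds.snapshot("daily-1")
del ds.live["photos/cat.jpg"]            # accidental deletion
ds.restore_file("daily-1", "photos/cat.jpg")
```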

Assuming a power failure, or similar event, that would mess up the file structure (not the drive itself), are there any tools to attempt to recover the data from a ZFS drive?
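ZFS actually protects data against situations like this. It is copy-on-write and commits changes in transactions, so live data is never overwritten in place and the on-disk state stays consistent; after a power cut the pool comes back at the last committed transaction, typically losing only the last few seconds of writes. There's rarely a need for recovery, as it's normally done by ZFS itself.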
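Why copy-on-write survives a power cut can be shown with a toy Python model: new data always goes to fresh blocks, and the new state only becomes visible through one final switch of the "root". If power fails before that switch, the old, consistent state is still there. This is a deliberately simplified sketch with hypothetical names; the real ZFS mechanism (transaction groups, uberblocks) is far more involved.

```python
class ToyCowStore:
    """Toy copy-on-write store: blocks are never overwritten in place."""

    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}
        self.next_id = 0
        self.root: dict[str, int] = {}  # committed state: file name -> block id

    def _alloc(self, data: bytes) -> int:
        # Always write to a brand-new block instead of overwriting an old one.
        block_id = self.next_id
        self.next_id += 1
        self.blocks[block_id] = data
        return block_id

    def write(self, name: str, data: bytes, crash_before_commit: bool = False) -> None:
        new_root = dict(self.root)          # build the new state off to the side
        new_root[name] = self._alloc(data)  # new data goes to a fresh block
        if crash_before_commit:
            return                          # simulated power loss: old root untouched
        self.root = new_root                # single switch to the new state

    def read(self, name: str) -> bytes:
        return self.blocks[self.root[name]]


store = ToyCowStore()
store.write("report.txt", b"version 1")
store.write("report.txt", b"version 2", crash_before_commit=True)
```

After the simulated crash, `store.read("report.txt")` still returns `b"version 1"`: the interrupted write is lost, but nothing is half-written.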

Would encryption lower the odds of a successful drive recovery should something go wrong? I’d say so but I’m looking for confirmation.
Definitely.

Do you need ECC memory to fully take advantage of ZFS? I know you need plenty.
ECC memory isn't needed. Its sole purpose is to make the machine more resilient to hardware failures.
 
Speaker said:
Do you need ECC memory to fully take advantage of ZFS? I know you need plenty.

ECC memory is not a requirement but highly recommended if you're serious about data integrity. ZFS won't be able to detect data corruption that happens in memory.
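The point about memory can be illustrated in a few lines of Python: a filesystem checksums whatever bytes it is handed, so if a bit flips in non-ECC RAM before the checksum is computed, the corrupted data is stored with a perfectly valid checksum and later reads raise no alarm. SHA-256 stands in here for whichever checksum ZFS is actually configured to use, and the helper names are hypothetical.

```python
import hashlib


def checksum(data: bytes) -> str:
    # Stand-in for the filesystem's block checksum (SHA-256 here for illustration).
    return hashlib.sha256(data).hexdigest()


def flip_bit(data: bytes, bit: int) -> bytes:
    """Simulate a single-bit memory error at the given bit position."""
    buf = bytearray(data)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)


original = b"important document"
in_ram = flip_bit(original, 3)       # corrupted in RAM BEFORE the checksum is taken
stored_sum = checksum(in_ram)        # the filesystem checksums what it was handed

# On a later read, the stored data matches its checksum, so nothing looks wrong:
print(checksum(in_ram) == stored_sum)  # True
print(in_ram == original)              # False
```

Corruption that happens on the disk or cable after this point would be caught, because the data would no longer match `stored_sum`.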
 
The data is going to be always online (powered 24/7), but only rarely accessed
ECC is mostly needed for database applications, especially where financial transactions are concerned. By "data integrity", the OP must mean "I don't want to lose my files" (whatever they are) and not "prevent transaction losses on robust data flow". ECC is completely unnecessary if this machine is something like a file server.

Invest your money into other areas instead:
  1. A small, cheap UPS (uninterruptible power supply), which will protect against a) power spikes (the real killer) and b) power outages. You can create a shutdown script that, once the power goes out, waits 15-20 minutes and shuts the machine down if power has not returned.
  2. Make the server as quiet and as low-power as possible. Unfortunately, such systems will not work well with ZFS, but considering that there won't be much load on the machine, I don't understand why you need ZFS in the first place.
  3. Use an SSD instead of traditional spinning drives if you suspect power outages will be a constant issue.
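The shutdown rule from item 1 boils down to a simple time comparison. Below is a minimal Python sketch of just that decision, assuming something else reports when mains power was lost; the function name and the 15-minute default are illustrative choices, not an existing tool. In practice, UPS monitoring software such as apcupsd or Network UPS Tools (NUT), both available in FreeBSD ports, can be configured to handle this, and the actual power-off on FreeBSD would be `shutdown -p now`.

```python
from datetime import datetime, timedelta
from typing import Optional

# Grace period from the post above; 15 minutes is the conservative end of 15-20 min.
GRACE = timedelta(minutes=15)


def should_shut_down(power_lost_at: Optional[datetime], now: datetime,
                     grace: timedelta = GRACE) -> bool:
    """Return True once mains power has been out for at least `grace`.

    `power_lost_at` is None while mains power is present. How the script learns
    about the outage (UPS daemon, serial/USB signal) is deliberately left open.
    """
    if power_lost_at is None:
        return False
    return now - power_lost_at >= grace
```

A monitoring loop would call this every minute or so and trigger the shutdown the first time it returns `True`; if power comes back, the caller resets `power_lost_at` to `None`.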
 
SirDice said:
No, because you are assuming the other copy is correct. Why not mirror the data and let RAID figure it out? In case of a drive failure the data will still be accessible. Replacing the drive can be done on a running system (if you have hot-swappable disks). The main point of RAID is that the data is continuously available.

I'm not fully convinced yet. I still think a RAID arrangement is less secure than two independent drives, even though they are in the same case. In the former case, if one of the drives fails, there's an additional weak spot while the matrix is being rebuilt. Could you please explain why it is better to "let RAID figure it out"?


Beeblebrox said:
Make the server as quiet and as low-power as possible. Unfortunately, such systems will not work well with ZFS, but considering that there won't be much load on the machine, I don't understand why you need ZFS in the first place.

I chose ZFS because it seems better for archiving rarely accessed large files (e.g. protection against silent corruption). Would you recommend something else? The system will double as a mail/web-surfing machine for my wife and probably a FreeBSD learning/testing box for myself.

And a separate question, are ZFS recovery tools on par with UFS ones?

Thanks for answering, I did not expect feedback on the same day I posted. :)
 
Speaker said:
I'm not fully convinced yet. I still think a RAID arrangement is less secure than two independent drives, even though they are in the same case. In the former case, if one of the drives fails, there's an additional weak spot while the matrix is being rebuilt. Could you please explain why it is better to "let RAID figure it out"?
If you use two independent drives I'm going to assume you copy the data, by hand, every week or so. That means your data is a week old. Even if you do it daily, the data will still be old. RAID does it instantly, without user interaction, and transparently.
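The cost of copying by hand can be put in numbers with a short Python sketch: with a fixed copy schedule, a drive failure loses up to one full interval of changes, and half an interval on average if the failure is equally likely at any moment. The function name is hypothetical and the uniform-failure assumption is a simplification. A mirror shrinks this window to roughly zero, although it copies deletions just as quickly, which is why snapshots are still needed.

```python
from datetime import timedelta


def exposure_window(copy_interval: timedelta) -> tuple[timedelta, timedelta]:
    """Worst-case and expected span of changes lost if the primary drive fails.

    Assumes copies run on a fixed schedule and a failure is equally likely
    at any moment between two copies.
    """
    worst = copy_interval          # failure just before the next copy
    expected = copy_interval / 2   # average over a uniformly random failure time
    return worst, expected


weekly = exposure_window(timedelta(days=7))
daily = exposure_window(timedelta(days=1))
```

For weekly copies this gives up to 7 days of lost changes (3.5 days on average); even daily copies leave up to a full day exposed.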
 