Unsure about ZFS - help me drink the Kool-Aid

For years I've been hearing things like "You might not know it yet, but you really want ZFS". I finally set up a test system with it and I'm having trouble understanding what the hoopla is about. It's great that it can handle huge amounts of storage, but that's not a good enough reason to go with it as far as I'm concerned. The self-healing stuff is great too, but from a practical perspective I've never really had a corruption issue with UFS. I do like the snapshot functionality, though. I'm going to be deploying a new 9.1 production system with a real RAID controller soon and I'm trying to figure out if ZFS is the way to go. Please help me decide.

The system I'm talking about is going to be running DBMail/Postfix with a MySQL backend, plus Apache/PHP. It won't be getting a whole lot of traffic, but it needs to be bulletproof.

Thank you!
 
I still use UFS (and gmirror) for smaller systems but here are some of the reasons I'm a big ZFS fan:

1) For a start, now that I've been using ZFS for a while, I wouldn't want a file system of 2+ TB on anything else. I have a 4TB VM host running on EXT3 for historic reasons, and if we fsck the virtual machines, they usually find errors. I know this is because data has been corrupted at some point and nothing ever noticed. With ZFS, it will repair any silent errors it finds (or just fault the device). It's a nice feeling to know that if the pool is online, every single bit is correct. Corruption does happen on large file systems, and you generally won't know about it until it causes you real problems.

2) Simplicity. I never really got the hang of mbr/fdisk/bsdlabel/dump/restore. Even doing a dump and then an immediate test restore would give me warnings and leave me with a non-working system. GPT has made handling disks/partitions a lot easier, and with ZFS I can fully restore a backup with one simple zfs send/recv command.

3) Snapshots/Backup. Snapshots by themselves are nice but not life-changing; the ability to snapshot and send incremental streams of those to a backup server is. With simple commands I can back up, in minutes, a server that would take 8 hours with rsync (there's a rough sketch of this after the list). Not only that, I automatically get versioned backups on the backup server, so I can restore the latest copy of a file or one from two weeks ago. As it's a direct copy of the entire file system, it makes backing up a full machine easy.

4) Flexibility. Being able to create new file systems on the fly without having to allocate disk space up front. I can create a file system for each purpose as it's needed and then choose different backup/snapshot schedules, compression settings, quotas, etc.
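
As a rough illustration of points 3 and 4 (the pool name tank, the dataset names and the backup host are just placeholders):
Code:
# Per-purpose file systems with their own settings; no sizes to allocate
zfs create -o compression=on tank/www
zfs create -o quota=50G tank/mail

# First backup: snapshot everything and send the lot to the backup box
zfs snapshot -r tank@monday
zfs send -R tank@monday | ssh backuphost zfs recv -F backup/tank

# After that, only the changes since the previous snapshot get sent
zfs snapshot -r tank@tuesday
zfs send -R -i tank@monday tank@tuesday | ssh backuphost zfs recv -F backup/tank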

If you do go with ZFS, I would suggest following the 'ZFS madness' guide in the Howto section for setup instructions (if you're using root-on-ZFS). It's messy to have the root file system in the root of the pool, and the pool/ROOT/some_label layout ties in with the beadm command. Depending on the number of disks, you may prefer a separate mirror (ZFS or gmirror) for the root file system.
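
For what it's worth, the sort of layout that guide ends up with, and the way it ties into beadm, looks roughly like this (dataset and boot environment names are only examples):
Code:
# zroot              - the pool itself; nothing is mounted from its root
# zroot/ROOT/default - the active boot environment, mounted as /
#
# beadm can then clone and switch whole boot environments:
beadm create pre-upgrade     # keep a bootable copy of the current system
beadm list                   # show all boot environments
beadm activate pre-upgrade   # if an upgrade goes wrong, boot back into the copy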

Also, the LSI cards seem to be the best at the moment, as most have both an LSI-supplied and a built-in driver available (I'm not sure which is best atm) and can be flashed with IT firmware, which removes the RAID functions and makes them a simple HBA. You can also just use a mainboard that has enough on-board SATA ports (ideally with AHCI support, which allows hot-swap).
 
usdmatt said:
I still use UFS (and gmirror) for smaller systems but here are some of the reasons I'm a big ZFS fan: ...

Thank you! This is great info. The system I'm deploying will have 4x 300GB drives in a RAID 1+0 configuration (just under 600GB of usable disk space). It's an HP server with a SmartArray RAID controller. Does it make sense to use ZFS in this case?


Also, I have a "nuts & bolts" type question. I used this guide to set up a test VM with ZFS as the root file system:
http://www.aisecure.net/2011/11/28/root-zfs-freebsd9/
Currently I have 3 GPT disks mirroring each other:
Code:
  pool: zroot
 state: ONLINE
  scan: resilvered 6.01G in 0h17m with 0 errors on Wed Feb 27 16:37:01 2013
config:

        NAME           STATE     READ WRITE CKSUM
        zroot          ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            gpt/disk0  ONLINE       0     0     0
            gpt/disk2  ONLINE       0     0     0
            gpt/disk1  ONLINE       0     0     0

What happens if I detach one of the disks, let's say gpt/disk2? Does the data on it stay intact? Would I be able to still boot from that disk if the other two died?
 
ph0enix said:
What happens if I detach one of the disks, let's say gpt/disk2? Does the data on it stay intact? Would I be able to still boot from that disk if the other two died?

Short answer: Yes.
The detached disk would be an exact copy of the other two at that point in time. This is actually an acceptable backup strategy: attach the backup disk to make it sync up, then detach it again for safekeeping.
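
For reference, that cycle is just the following (gpt/disk3 standing in for however you label the backup disk):
Code:
zpool attach zroot gpt/disk0 gpt/disk3   # add the backup disk to the existing mirror
zpool status zroot                       # watch until the resilver has finished
zpool detach zroot gpt/disk3             # then pull it back out for safekeeping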
 
Personally, I would run ZFS on the HP server, if only for the ability to easily mount the pool by plugging the disks into a spare system if the HP failed. With that amount of raw space, 2 mirrors, raidz or raidz2 would all be fairly acceptable (with backups of any important data, of course). ZFS has to read the metadata for all the data during a rebuild (to determine which data actually resides on the replaced disk), so rebuild time is proportional to the amount of data on the pool. As with pretty much all RAID systems, around 2TB is considered the practical maximum for raidz (RAID5) before the risk of an additional disk failure during the rebuild gets too high.

Yes, pulling one of the disks out of that 3-way mirror will leave the data intact. The source system will just have a degraded pool with disk2 missing. If you boot another system off disk2, it will also show a degraded pool, with disk{0,1} missing. (I'm not 100% certain what state disk2 would be in if you actually 'detached' it from the pool.) Obviously it will only boot if you made sure to add a boot partition to all 3 disks and install the /boot/gptzfsboot bootcode.
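
If that isn't already in place, adding a boot partition and bootcode to a disk looks something like this (ada2 is just an example device name; adjust the partition index to suit your layout):
Code:
# create a small boot partition (if the disk doesn't already have one)
gpart add -t freebsd-boot -s 128k ada2
# write the protective MBR and the ZFS boot loader into partition 1
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada2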

There is also the zpool split command, which takes one disk out of each vdev in a raid1 or raid10 pool and splits them out into a new pool. The upside is that you get 2 completely separate, fully online pools with the same data; the downside is that the new pool won't boot without a bit of messing about, as the /boot/zpool.cache file on it will contain the info for the old pool. I believe recent HEAD versions of FreeBSD can boot without the cache file (and without the vfs.root.mountfrom setting).
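
The split itself is a one-liner, roughly (the new pool name is arbitrary):
Code:
# takes the last disk out of each mirror and turns those disks into a new pool
zpool split zroot zcopy
# the new pool is left exported; import it (on this or another machine) to use it
zpool import zcopy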

In a real setting, you may be better off with a 2-way mirror and a separate single-disk pool as a backup, rather than a 3-way mirror. Unless you have heavy writes, a daily or even hourly zfs send from the mirror to the single disk shouldn't take long, and the backup pool can be exported/imported without affecting the live pool. It also means a serious failure of the live pool would still leave the backup intact, instead of taking all 3 disks down.
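
A rough sketch of that arrangement (the pool, snapshot and device names are made up):
Code:
# one-off: a single-disk pool on the spare disk, seeded with a full copy
zpool create backup ada2
zfs snapshot -r zroot@base
zfs send -R zroot@base | zfs recv -F backup/zroot

# from then on, a daily or hourly incremental is usually quick
zfs snapshot -r zroot@today
zfs send -R -i zroot@base zroot@today | zfs recv -F backup/zroot

# the backup pool can be exported (and the disk pulled) without touching the live pool
zpool export backup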
 
Savagedlight said:
Short answer: Yes.
The detached disk would be an exact copy of the other two at that point in time. This is actually an acceptable backup strategy: attach the backup disk to make it sync up, then detach it again for safekeeping.

That's exactly what I was hoping to do - attach a disk, let it resilver, then detach so I have a backup.
 
Technically, it's a copy, not a backup; there's no history. Unless you have a lot of spare disks...
 
wblock@ said:
Technically, it's a copy, not a backup; there's no history. Unless you have a lot of spare disks...

It's still a backup, but it's not a good backup regime if you only have a backup from one point in time, or only one backup medium. Which medium you choose to back up to is irrelevant to this point. :) Besides, snapshots should help provide some history.

As for the kind of backup outlined above, I'd recommend cycling between a minimum of three HDDs, preferably keeping one off-site at any given time.
 
IMHO the really killer stuff about ZFS that isn't corruption-resistance related:

  • Snapshots. Check out beadm, which leverages ZFS for system-level rollback, even on physical hardware.
  • Reduced rebuild time (only the used portion of the drive(s) is rebuilt rather than every sector).
  • Compression
  • Ease of creating new file systems, optimized for different requirements.
  • Ability to easily replace disks and upgrade capacity without needing to back up and restore (see the sketch after this list).
  • Ability to split the machine into different file systems, without needing to define hard size limits - how often have you had a problem like "Oh, bugger, /var is too small!"?
  • RAID hardware independence. Until you've had a hardware RAID controller failure and had to scrounge a compatible spare controller from somewhere, you may not appreciate this - the data the controller puts on the disks may not work with another hardware RAID controller. :) I've been in this situation once before - the last production server we had of a particular model had a controller failure and I needed to scrounge through decommissioned hardware for a spare; they'd been EOL'd for some time and I wasn't likely to get one from anywhere, even if it hadn't been a weekend in a remote location (which it was). It was just lucky I had an old decommissioned machine with identical, functional hardware in it.
For the above reasons, I'd recommend AGAINST hardware RAID if you have the option of ZFS. If you have a controller, just put it in JBOD mode, or if it's a Dell (some of which won't do this), put each individual drive into its own RAID0 "set".
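
As an example of the disk replacement point above, growing a 2-disk mirror by swapping its disks one at a time might look something like this (pool and device names are invented):
Code:
zpool set autoexpand=on tank    # let the pool grow once every disk in the vdev is bigger
zpool replace tank ada1 ada3    # swap in a larger disk and wait for the resilver
zpool status tank               # check resilver progress
zpool replace tank ada2 ada4    # then do the other half of the mirror
# once both disks have been replaced, the extra capacity becomes available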
 