ZFS basic questions

Hi Everyone,

I've started to use ZFS instead of the venerable UFS and would like to clarify some points about vdevs and zpools. My understanding is that the primary purpose of vdevs is to allow the combination of several disks to be used as a single virtual disk, and the primary purpose of zpools is to allow a filesystem to be extended with new vdevs. A zpool is a collection of vdevs, and a vdev is a collection of disks.

Disks can be combined into a vdev according to the following vdev types:
  • striped: blocks of a file may be distributed across all disks with no redundancy
  • mirrored: each block of a file is present on each disk
  • raidz: like striped, but one disk is reserved for parity
  • raidz2: like striped, but two disks are reserved for parity
In contrast, the zpools are always striped. So blocks of a file may be distributed across all vdevs in the zpool with no redundancy.

I think, and I'm looking for confirmation here, that a vdev must consist of disks of identical size, and a zpool must consist of identical vdevs. Is that right?

For example, I may create a vdev out of four 1TB disks using raidz2, resulting in a total of 3TB storage. I then create a zpool on top of this vdev. If later I want to add new disks to this zpool, I can't extend the vdev, since those stay fixed after they are created. Instead, I have to create another vdev, but that also must hold four 1TB disks using raidz2. Once this new vdev is added to the zpool, the total storage space will be 6TB.

Is this correct? Or is there a way to add disks of different models/sizes to zpools or even vdevs?
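To make my scenario concrete, I imagine the commands would look roughly like this (device names are just placeholders, I haven't actually run this):
  # create a pool with a single raidz2 vdev of four disks
  zpool create tank raidz2 da0 da1 da2 da3
  # later, grow the pool by adding a second raidz2 vdev of four more disks
  zpool add tank raidz2 da4 da5 da6 da7
  zpool list tank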
 
Your 4x 1TB raidz2 will be 2TB, not 3.
I think you can add anything to the pool (single disks, mirrors, whatever) if you are happy with the resulting redundancy level.
Anyway, you can test it with files or mdconfig (create test devices with mdconfig and create pools out of them).
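For example (file names and sizes arbitrary):
  # create four 1 GB backing files and turn them into md devices
  truncate -s 1g /tmp/zt0 /tmp/zt1 /tmp/zt2 /tmp/zt3
  mdconfig -a -t vnode -f /tmp/zt0    # prints md0
  mdconfig -a -t vnode -f /tmp/zt1    # md1, and so on
  mdconfig -a -t vnode -f /tmp/zt2
  mdconfig -a -t vnode -f /tmp/zt3
  # build a throwaway raidz2 pool and play with it
  zpool create testpool raidz2 md0 md1 md2 md3
  zpool status testpool
  # clean up afterwards
  zpool destroy testpool
  mdconfig -d -u 0 && mdconfig -d -u 1 && mdconfig -d -u 2 && mdconfig -d -u 3
  rm /tmp/zt0 /tmp/zt1 /tmp/zt2 /tmp/zt3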
 
Welcome to the ZFS world!

Some of your questions:
"raidz2: like striped, but two disks are reserved for parity"
To be precise: it's the space of two disks that is used for parity; the parity information is distributed over all the disks in the vdev. RAIDZ1 is not the same as RAID-5, but you can say it is RAID-5-like.

"I may create a vdev out of four 1TB disks using raidz2, resulting in a total of 3TB storage."
Using raidz2 with four 1TB disks gets you a total of 4TB of raw disk space; 2TB is used for parity and 2TB can be used for your data.

You can use different sized disks in one vdev, but only the smallest disk's worth of space is used from each disk.
You can combine different sized vdevs in a single pool, but then you will have an imbalance in storage and in (evenly distributed) performance.

"So blocks of a file may be distributed across all vdevs in the zpool with no redundancy."
You get redundancy (with mirrors or RAIDZ) from the vdevs in the pool, not from the pool as such.

Some links I'd like to suggest, especially from a beginner's perspective:

Ad #2: The article as a whole has a certain angle. Some comments, especially on expansions: ZFS in the trenches | BSD Now 123 (as the writer mentions at Addressing some feedback).
 
My go-to answer is:
Pick up a copy of FreeBSD Mastery: ZFS and FreeBSD Mastery: Advanced ZFS by Michael W Lucas and Allan Jude. For me, they're the easiest-to-understand books about ZFS I've read.

Everything below is my understanding and may not be 100% correct.
Identical sizes: not really. Ideally you want them to be, but ZFS sizes a vdev to its smallest device. If you create a mirror with a 1TB and a 2TB disk, ZFS says it's "a mirror sized 1TB".
This is an old article, but still makes sense to me:

Adding to a zpool: I think you can always add new disks to a striped pool, but that only increases the size, adds no redundancy, and the pool breaks if a disk is removed or fails. This is a typical mistake when folks have a single disk that they are trying to mirror, so read up on the difference between add and attach.
I think the add/attach distinction also comes into play when adding disks/vdevs to other redundancy types.
I just looked it up in my references; they all say "You can't add to a RAID-Z VDEV". That means you can't add a disk to your existing RAIDZ2 vdev.
I think you could add a mirror vdev instead, but ideally you want enough disks to create a matching RAIDZ2 vdev and add that to the pool. That would give you a pool striped across two RAIDZ2 vdevs.
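As I understand it, the difference in commands is roughly this (pool and device names made up):
  # attach: add a device to an existing disk/mirror, gaining redundancy
  zpool attach tank da0 da1               # da0 becomes a mirror of da0+da1
  # add: append a new top-level vdev, striped with the existing ones
  zpool add tank da2                      # no redundancy gained; may need -f if it
                                          # mismatches the pool's existing redundancy
  zpool add tank raidz2 da3 da4 da5 da6   # add a whole new raidz2 vdev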

One thing about the sizes of the devices: ZFS will expand.
Say you have a mirror with two 500GB drives. You replace one drive with a 1TB drive and ZFS resilvers; then you replace the other 500GB drive with a 1TB drive and ZFS resilvers again. Now you have a mirror of 1TB drives.
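In commands, that would be something like the following (device names made up; I believe the extra space only shows up automatically with autoexpand on, otherwise zpool online -e after the fact):
  zpool set autoexpand=on tank
  zpool replace tank ada0 ada2    # swap the first 500GB drive for a 1TB one, wait for resilver
  zpool replace tank ada1 ada3    # then the second one, wait again
  zpool list tank                 # capacity should now reflect the 1TB drives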
 
All the "disks" in a VDEV have to be the same size. If they are not exactly the same size, you can force the use of the smallest common denominator (and waste the extra space).

You cannot add more "disks" to a VDEV once it is created (except you can convert a single "disk" to a mirror).

The only way to expand VDEV is to replace all the "disks" with larger ones (one at a time, because you have to resilver).

A zpool is composed of one or more VDEVs. I'm not certain here, but I don't believe that the VDEVs have to be the same size or construction (but they should be; otherwise silly things are possible). You can add more VDEVs to a zpool after it is created, but the resulting striping may be sub-optimal.

ZFS can use as storage media anything that is a FreeBSD GEOM storage provider, which is what I really meant by "disk" above. So you may compose your VDEVs from:
  • a raw disk;
  • a disk partition;
  • file-backed storage;
  • a GEOM provider, e.g. GELI, concat, mirror, multipath, memory disk, ...; see geom(8).
Not all GEOM classes may make sense in context of ZFS, but some are very handy, e.g.
  • GELI has traditionally been used for encryption (ZFS now also has native encryption);
  • as covacat mentioned, mdconfig(8) is really handy for quickly crafting small test VDEVs;
  • gconcat(8) can be used to aggregate small disks into a single, larger, "disk".
GEOM classes can be stacked on top of each other.
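As a small illustration of the stacking, a GELI-on-partition setup might look roughly like this (partition names hypothetical):
  # encrypt two partitions and use the resulting .eli providers as mirror members
  geli init -s 4096 /dev/ada0p3
  geli init -s 4096 /dev/ada1p3
  geli attach /dev/ada0p3             # creates /dev/ada0p3.eli
  geli attach /dev/ada1p3
  zpool create securetank mirror ada0p3.eli ada1p3.eli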
 
All the "disks" in a VDEV have to be the same size. If they are not exactly the same size, you can force the use of the smallest common denominator (and waste the extra space).

You cannot add more "disks" to a VDEV once it is created (except you can convert a single "disk" to a mirror).
I think the "use smallest" is the default behavior. At least from my experience.
Mirrors: one can add providers to a mirror. Take a mirror of two providers; you can attach a third provider. Why? Well, after resilvering is complete, you have the data stored across three physical devices. "So what?" Well, now detach one and you have an instant backup that you can move to a different machine.
Or, if the third device you attach is bigger: after it resilvers, detach one of the original two smaller ones and attach another new, bigger one. After resilvering you have a 3-way mirror with two big devices and one small. Now detach the small one and you are back to a mirror of two devices, but now they are bigger.
That's how you upgrade the devices to larger sizes without losing redundancy. I've done that exact thing a couple of times. I wish I had the right hardware for all the devices to be in hot-swappable trays; it would have been a snap.
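The command sequence is roughly (device names made up; wait for each resilver to finish before the next step):
  # start with a mirror of two small drives, ada0 + ada1
  zpool attach tank ada0 ada2    # attach the first big drive
  zpool detach tank ada0         # drop one small drive
  zpool attach tank ada1 ada3    # attach the second big drive
  zpool detach tank ada1         # drop the other; the mirror is now ada2+ada3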
 
I think the "use smallest" is the default behavior. At least from my experience.
As my 8-year-old 3 TB spindles failed, I have been replacing them with 4 TB spindles.

I have just remade the tank (5 spindles @ raidz1 to 7 spindles @ raidz2).

The new pool creation failed specifically because the disk sizes were mis-matched. I had to force it.
 
The new pool creation failed specifically because the disk sizes were mis-matched. I had to force it.
Interesting. Did you try going from the raidz1 to raidz2 as part of it? I was basically upgrading a mirror and leaving it a mirror so did not change the configuration of the zpool.
What version of OS did you do this on? Mine was done on pre-FreeBSD 12, so if you were going across OS versions maybe that had corner cases.
Anyway, "my personal experience" trumps all theory.
 
One thing about the sizes of the devices: ZFS will expand.
Say you have a mirror with two 500GB drives. You replace one drive with a 1TB drive and ZFS resilvers; then you replace the other 500GB drive with a 1TB drive and ZFS resilvers again. Now you have a mirror of 1TB drives.
Additionally, if you have mixed drives, say 2x1TB and 2x2TB, you can make a zpool of 4x1TB using all four drives (e.g. a 3TB raidz), and create a mirror of 2x1TB in the "wasted" space on the 2x2TB drives. That gives you redundancy across 4TB of usable space, rather than only the 3TB provided by 2x1TB mirrored plus 2x2TB mirrored. Some may see that as risky, others as being frugal and maximising the use of resources :)
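A rough sketch of that layout, assuming the 1TB drives are ada0/ada1 and the 2TB drives are ada2/ada3 (all names and sizes illustrative):
  # the 2TB drives get two ~1TB partitions each; the 1TB drives get one
  gpart create -s gpt ada2
  gpart add -t freebsd-zfs -s 931g ada2    # first ~1TB slice
  gpart add -t freebsd-zfs ada2            # the rest of the disk
  # (repeat for ada3; give ada0 and ada1 a single freebsd-zfs partition each)
  zpool create tank raidz1 ada0p1 ada1p1 ada2p1 ada3p1    # ~3TB usable
  zpool create scratch mirror ada2p2 ada3p2               # ~1TB usable in the leftover space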
 
Dave01's suggestion is excellent, and it is exactly what I do at home: I have a pair of drives, 3TB and 4TB, and I use that for one 3TB mirror pair, plus one 1TB separate pool that's non-redundant (for scratch storage of backups).

My only warning about such setups is that you need to be very careful. For example, in dave01's example, you have six 1TB partitions. When setting those up, you need to make 100% sure that you assign the partitions to pools correctly. Creating a mirror pair out of two partitions that are on the same disk will be really bad: not only will it be unsafe from a durability point of view, it will also have lousy performance. Once again, I would recommend using gpart to name the partitions with human-readable names, and then using those names.
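For example, with hypothetical label names:
  # label each partition after the disk/bay it lives on, then build pools from the labels
  gpart add -t freebsd-zfs -l big-bay0 ada2
  gpart add -t freebsd-zfs -l big-bay1 ada3
  # the labels show up under /dev/gpt/ and are unambiguous
  zpool create scratch mirror gpt/big-bay0 gpt/big-bay1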

Why do I mention that? My colleagues and I did exactly that once at a customer site. On a $100M computer, systematically, for thousands of disks. We got in BIG trouble.
 
Interesting. Did you try going from the raidz1 to raidz2 as part of it? I was basically upgrading a mirror and leaving it a mirror so did not change the configuration of the zpool.
I purchased two new 12 TB WD Gold SATA disks. They are on product run-out sale (WD are re-branding their high-end disks to HGST). I connected them externally as a mirrored pool on USB (3.1 Gen 2 StarTech) SATA adapters. That allowed me to send the original internal RAIDZ1 tank to the USB mirror. The system ran for a couple of days like that (with a desk fan pointed at the WD Golds). I then configured an internal RAIDZ2 tank with two additional spindles (7 x 3 TB), and sent the tank back to the new RAIDZ2 pool. The new 12 TB disks have just been redeployed for an off-site backup rotation for snapshots of the tank.

My original plan was to vacate the tank to a pool (single VDEV) consisting of a GEOM concat of three USB attached spindles. But I couldn't resist the common sense suggestion made by VladiBG of getting the 12 TB disks. It allowed the tank to be vacated safely (redundancy preserved at all times), and the ZFS server off-site backup capacity sorted for years to come. It cost some dollars, but well spent, I think. [The ZFS server has on-line backups of everything else, so its off-site backups are critical.]
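The evacuation itself boils down to snapshot-based replication; the shape of it (pool and snapshot names illustrative, not the exact commands I typed) was roughly:
  # replicate the old RAIDZ1 tank onto the USB mirror pool
  zfs snapshot -r tank@evacuate
  zfs send -R tank@evacuate | zfs receive -F usbpool/tank
  # destroy and re-create the internal pool as a 7-disk RAIDZ2, then send everything back
  zfs snapshot -r usbpool/tank@return
  zfs send -R usbpool/tank@return | zfs receive -F tank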
What version of OS did you do this on?
It's FreeBSD 13.0-RELEASE, on the (standard) delayed patch cycle.
 
Additionally, if you have mixed drives, say 2x1TB and 2x2TB, you can make a zpool of 4x1TB using all four drives (e.g. a 3TB raidz), and create a mirror of 2x1TB in the "wasted" space on the 2x2TB drives. That gives you redundancy across 4TB of usable space, rather than only the 3TB provided by 2x1TB mirrored plus 2x2TB mirrored. Some may see that as risky, others as being frugal and maximising the use of resources :)
Or make a GEOM gconcat(8) of the 2x1TB spindles (1+1), and then combine it with the 2x2TB spindles to provision a VDEV of 6TB total, (1+1)+2+2 = 2+2+2, as a 6TB stripe, a 4TB RAIDZ1, or a 2TB triple mirror.
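A sketch of that (device names made up; the concat provider appears under /dev/concat/):
  # may need: kldload geom_concat
  gconcat label small0 /dev/ada0 /dev/ada1         # two 1TB spindles become one ~2TB provider
  # now there are three ~2TB providers: concat/small0, ada2, ada3
  zpool create tank raidz1 concat/small0 ada2 ada3       # ~4TB RAIDZ1
  # or: zpool create tank concat/small0 ada2 ada3          (6TB stripe)
  # or: zpool create tank mirror concat/small0 ada2 ada3   (2TB triple mirror)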
 


Nice.

More discussion of the Ars Technica article: <https://old.reddit.com/r/homelab/comments/o11j53/-/>
 