ZFS 2x3TB as one raidz member

Hello)

I currently have a raidz1 of 4x3TB disks. I want to upgrade it to a raidz2 of 6x6TB disks. The 3TB disks are still good and functional, and I want to use them in the new array.
Is there a way to make 2x3TB disks act as one 6TB member of the raidz2?

The man page says:
Code:
Virtual devices cannot be nested, so a mirror or raidz virtual device can only contain files or disks. Mirrors of mirrors (or other combinations) are not allowed.

So maybe I can make a software RAID0 of two disks and use it as a raidz member?
 
What you are suggesting is not recommended in any way. ZFS is great in many respects, but vdev management isn't very flexible when it comes to the drives you build those vdevs from. Either you grow with same-size (or bigger) drives, or you build a new pool out of the drives you have left over after an upgrade.
Also, ZFS really wants direct control over the hardware (drives) in a pool in order to give you the best warnings about failing hardware. Putting another layer (hardware or software RAID) between the drives and ZFS is really just setting yourself up for pulling the rug out from under ZFS. Not a good idea.
 
ZFS really wants direct control over the hardware (drives) in a pool in order to give you the best warnings about failing hardware
Hmm, really? ZFS monitors SMART? Where can I read about that?
Right now I'm using ZFS on top of GPT partitions, because my old server could only boot from the first 2 TB of a disk. I've been using this setup for about 10 years and haven't noticed any problems.

UPD: I remembered another reason for using GPT: partition labels. The RAID controller in my Dell R720xd, flashed with firmware for direct disk access, does not guarantee the order of the disks. Today a disk is da0; after installing another disk it may become da1. GPT labels are great for dealing with this.
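For example, labelling the partitions and building the pool from the stable /dev/gpt names looks roughly like this (device names and label text are just placeholders):
Code:
gpart create -s gpt da0
gpart add -t freebsd-zfs -l pool-disk0 da0
zpool create tank raidz1 gpt/pool-disk0 gpt/pool-disk1 gpt/pool-disk2 gpt/pool-disk3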
 
Hmm, really? ZFS monitors SMART? Where can I read about that?
ZFS does not monitor SMART; it checksums every block it writes and verifies the checksum whenever it reads the block back.
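Roughly speaking (pool and device names are just examples), you can see those checksum counters with zpool status, and watch SMART separately with smartmontools:
Code:
zpool scrub tank          # re-read and verify every block in the pool
zpool status -v tank      # per-device READ/WRITE/CKSUM error counters
smartctl -a /dev/da0      # SMART data, via sysutils/smartmontools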
Your idea of recycling the 4x3TB disks seems interesting to me. The new raidz2 pool would use these four 6TB members:
2x3TB disks in RAID0
2x3TB disks in RAID0
1x6TB disk
1x6TB disk
with a usable space of 12TB.

This configuration is not recommended, but should work.
Regards
--
Maurizio
 
Is there a way to make 2x3TB disks act as one 6TB member of the raidz2?
Yes. And I see no problem with doing that.

Any geom(8) class may be presented to zpool-create(8) as a vdev.
Make your two 3 TB disks into a GEOM CONCAT class using gconcat(8).
This will create a special file in /dev/concat/<name>, which you can present to zpool-create(8) as a 6 TB vdev.
It would be wise to consider which controller ports are used, and take care not to mismatch disk speeds (too much), but you should do that anyway.
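A rough sketch, with placeholder names for the concats, the pool and the disks:
Code:
gconcat label big0 da0 da1      # metadata is written to the disks; the device appears as /dev/concat/big0
gconcat label big1 da2 da3
zpool create tank raidz2 concat/big0 concat/big1 da4 da5 da6 da7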

There is no way to migrate from where you are now to where you want to go without a full backup and restore.
But you will need to back up your raidz2 system anyway, so just put your backup scheme in place early...

I would also suggest you consider a stripe of three 6 TB mirrors, rather than 6 x 6 TB raidz2.
Mirrors can be expanded when the time comes for a capacity upgrade (exchange and re-silver spindles one at a time).
Though I believe expansion of raidz is coming soon.
Striped mirrors reduce your capacity (18 TB vs. 24 TB usable), and are slightly less resilient than raidz2, but are much faster, especially at writing.
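Something like this, with placeholder names, for the initial pool and a later capacity upgrade of one mirror:
Code:
zpool create tank mirror da0 da1 mirror da2 da3 mirror da4 da5
zpool set autoexpand=on tank
zpool replace tank da0 da6      # swap in a bigger disk, wait for the resilver, then replace da1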
 
Maybe you know: can I boot from a ZFS pool located on gconcat or gstripe vdevs? At boot, when the kernel is not loaded yet, how can it work with gstriped devices?
On my physical servers, I always use an SSD mirror for boot and root. So I have never tested that.
However, the fundamental principle is that any GEOM class may be used as a ZFS vdev (caveat emptor).
You may have to edit /boot/loader.conf to load the required kernel module at boot:
Code:
geom_concat_load="YES"   # for a concat
geom_stripe_load="YES"   # for RAID0
However, having GEOM RAID0 striping and ZFS striping on top of that is a bad idea!
You really want a concat to make a 6 TB "disk" from 2 x 3 TB spindles, and then let ZFS do any required striping across your pool.

It's worth reading the Handbook chapter on GEOM.
 
You may have to edit /boot/loader.conf to load the required kernel module at boot:
I mean, can gptzfsboot deal with concat/striped devices? Will it be able to see the ZFS pool? Or do I need to load the kernel first, load geom_concat/geom_stripe, and only then can ZFS import its pools?
 
I mean, can gptzfsboot deal with concat/striped devices? Will it be able to see the ZFS pool? Or do I need to load the kernel first, load geom_concat/geom_stripe, and only then can ZFS import its pools?
It's worth saying again that you should never use a GEOM stripe under ZFS. Two different, unrelated, striping algorithms will result in entirely undesirable disk head movements. The uppermost software layer, ZFS, needs to be the sole "stripe master", so the lower GEOM layer needs to be a simple concat.

I'm sorry, I don't know the answer to your question regarding gptzfsboot (I always use separate dedicated SSD media for boot and root).

If you have any sort of hypervisor, you can test the theory on a virtual machine, which is also a good way to plan your build.

Perhaps somebody more erudite will comment.
 
I mean, can gptzfsboot deal with concat/striped devices? Will it be able to see the ZFS pool?
Yes and yes. Quoting gptzfsboot(8)
gptzfsboot tries to find all ZFS pools that are composed of BIOS-visible hard disks or partitions on them...The first pool seen during probing is used a sa default boot pool.
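For reference, gptzfsboot is the boot code written into the freebsd-boot partition, e.g. (partition index and disk name here are just examples):
Code:
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da0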

Or do I need to load the kernel first, load geom_concat/geom_stripe, and only then can ZFS import its pools?
You would be wise to do as gpw928 says and not mix GEOM striping with ZFS.
 
It's worth saying again that you should never use a GEOM stripe under ZFS. Two different, unrelated, striping algorithms will result in entirely undesirable disk head movements. The uppermost software layer, ZFS, needs to be the sole "stripe master", so the lower GEOM layer needs to be a simple concat.
OK, stripe or concat, it doesn't matter to me.

Yes and yes. Quoting gptzfsboot(8):
Hm, I tried this in VMware with gconcat'ed disks: it cannot boot, not even a loader prompt. With a ZFS pool on two disks without GEOM it boots (I mean VMware + gptzfsboot basically works).
 
If I recall correctly, GEOM stuff happens after the loader and before single user. Hence you can boot from a gmirror but not a gconcat: the BIOS finds one of the mirror's members as a bootable device and starts booting, the loader loads the gmirror module, the gmirror module starts doing its thing to find the other members of the mirror, and the kernel boots and sees the mirror.

My opinions, feel free to agree/disagree:
What is the intended use of the pool? Just a big space for data that can get lost? Any requirements for fast reads/writes/space/data safety?
I would be inclined to leave the existing config alone with the 3TB disks, and figure out something else for the 2 6TB. Mirror if you need data safety and good read performance, stripe if you just need space for data that can get lost.

I understand you wanting to utilize the 3TB disks, but my gut says that trying to make them look like 6TB disks is going to hurt you in the long term. But it's your system, your choice.
In order to utilize them, I would simply add another, smaller device to boot from. That would be the cleanest way to get around the GEOM and ZFS stacking.


BIOS RAID stuff: I personally don't like it. I've seen it cause problems, and if something fails, recovery is difficult.
 
My opinions, feel free to agree/disagree:
What is the intended use of the pool? Just a big space for data that can get lost? Any requirements for fast reads/writes/space/data safety?
I would be inclined to leave the existing config alone with the 3TB disks, and figure out something else for the 2 6TB. Mirror if you need data safety and good read performance, stripe if you just need space for data that can get lost.
This is a home server for surveillance with 10+ cameras, video storage, Time Machine, documents, etc. My goal is to keep spare disks of only one size. If I have 2 pools, Nx6TB and Nx3TB, I will need to keep two types of disks in reserve.
 
So space and data safety then.
You could just keep 6TB disks for spares. They can be used to replace a 3TB drive; yes, you'll only use half of it, but once all the 3TB devices are replaced, hopefully raidz expansion will be available and then you can grow the pool.

ETA:
I took a look at the link provided earlier on expansion; that is different from what I'm thinking of: using 6TB drives to replace 3TB drives.
If you start with a mirror of 1TB drives giving you a 1TB mirror, then replace one with a 3TB drive, let it resilver, then replace the other with a 3TB drive, and let it resilver, you wind up with a 3TB mirror. This works; I've done it on one of my systems.
I think that if you gradually replace all the 3TB drives in the raidz1 with 6TB drives, you will wind up with a raidz1 of 6TB drives, giving you double the space.
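The replace cycle would look roughly like this (pool and device names are examples); the extra space only appears after the last member has been replaced:
Code:
zpool set autoexpand=on tank
zpool replace tank da0 da4      # wait for the resilver, then repeat for each remaining 3TB member
zpool list tank                 # capacity grows once the last 3TB drive is gone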
 
So space and data safety then.
You could just keep 6TB disks for spares. They can be used to replace a 3TB drive; yes, you'll only use half of it, but once all the 3TB devices are replaced, hopefully raidz expansion will be available and then you can grow the pool.
In the end I want to end up with an 8x6TB raidz2. Good reliability and enough space, and one pool is more convenient to administer than two. The 3TB disks are old enough that they will fail, and I will gradually replace them all with 6TB drives. That's the plan)

With your scheme I will never get rid of the raidz1 pool, which I don't really like, because practice has shown that all sorts of coincidences are possible)
 
In the end I want to end up with an 8x6TB raidz2.
A stated end goal, that's good. Given that, trying to "make a 6TB disk out of 2 3TB disks" is probably not the way you want to go.

I don't know what resources are available to you, but I would be inclined to try to get there in one step. Get the number of 6TB drives you need, set it up that way from the beginning, and migrate the existing data from the raidz1. I would also look at an external enclosure for the current 3TB drives and use it as extra storage for anything that can be lost.
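The migration itself can be a simple send/receive, something like this with example pool and snapshot names:
Code:
zfs snapshot -r oldpool@migrate
zfs send -R oldpool@migrate | zfs receive -F newpool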

But I recognize it's easy for me to "spend your money", so take it as just my opinion.
 
OK, stripe or concat, it doesn't matter to me.


Hm, I tried this in VMware with gconcat'ed disks: it cannot boot, not even a loader prompt. With a ZFS pool on two disks without GEOM it boots (I mean VMware + gptzfsboot basically works).
Do not mix GEOM concat or stripe and ZFS. The latter works on bare disks or partitions only. It will boot fine from a ZFS striped vdev, but will not work with GEOM stripe or concat as you have discovered.
 
The GEOM stack sits directly above the kernel device drivers for disks. ZFS is specifically designed to use GEOM classes for vdevs, and you can even stack the classes. There is no question that you can use a concat for a ZFS vdev. You absolutely can. I have tested it. It works. And the design is orthogonal, making it possible to do silly things (like using a GEOM stripe to construct a ZFS stripe).

I argue that a simple concat is an effective way of re-using odd sized disks in a ZFS pool, avoiding waste. This appears to be the exact goal of the OP.

The issue for the OP is whether you can boot from a ZFS pool constructed from a GEOM concat class. I don't know enough about the boot code to be sure of the answer, but I suspect not, because that would complicate the early boot code (which needs to read /boot in the root). However, I'm pretty sure that you can boot from a ZFS pool with underlying GEOM class ELI encryption, and that's what makes me so uncertain about what might be possible. I'm out of my depth there...

I have never needed to solve this dilemma, because, on physical servers, I always separate the O/S from the application data and boot from a pair of SSDs in a mirror configuration (a ZFS mirror, these days).

For the OP, maybe consider getting carriers for a pair of cheap SSDs, or investigate booting from the internal dual SD module (VMware can do this on an R720, not sure if FreeBSD can).
 
For the OP, maybe consider getting carriers for a pair of cheap SSDs, or investigate booting from the internal dual SD module (VMware can do this on an R720, not sure if FreeBSD can).
Thanks to you, I found out that I have internal SD card slots. I should read the manual more carefully :)
 