ZFS Migration from RAIDZ1 to RAIDZ2: is it possible?

I've tried some basic searches for this. I haven't found a concrete answer.

What I want to be able to do is a base install as RAIDZ1, then drop in another identical drive and convert or resilver the array to RAIDZ2.

Feasible or fool's journey?

If it's feasible, how do you do it? The ZFS documentation is a bit sparse from what I've found, so please point me in the right direction.
 
You should be able to create a sparse file with truncate(1) the same size as your drives, and include it as a device when you initially make a raidz2 pool. Offline the file-backed device immediately after creation, and then replace it with a physical drive when you are ready.

The utilization for small blocks may end up being a little more wasteful on the intentionally-hobbled raidz2 than on a native raidz1, as it will be following the raidz2 layout rules (allocations must be a multiple of 3 sectors), but by how much depends on how wide the pool is.
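For what it's worth, roughly what I have in mind (the pool name, size and device names here are only examples):

  # make a sparse placeholder the same size as the real drives (4 TB here)
  truncate -s 4T /root/placeholder.img

  # build the raidz2 pool from the three real drives plus the placeholder file
  zpool create tank raidz2 /dev/ada1 /dev/ada2 /dev/ada3 /root/placeholder.img

  # take the file-backed vdev offline right away so nothing relies on it
  zpool offline tank /root/placeholder.img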
 
Let me make sure I get this straight. I'm creating a RAIDZ2 with a phantom, file-backed device standing in for a drive I don't have, just for the purpose of creating a RAIDZ2. The phantom device is taken offline immediately, putting the RAIDZ2 into degraded mode, but the pool remains functional as a RAIDZ2.

If I'm following your logic, it makes sense to me. You have to fool ZFS into building a RAIDZ2 when you only have three drives initially, using a phantom fourth device to reach the minimum for RAIDZ2.

This is probably not best practice, but it's doable in a pinch. Once you have the real device installed, you resilver immediately, giving you an all-physical RAIDZ2.

This practice can probably be extended to RAIDZ3; you just have two phantom devices instead of one. Basically we're using ZFS functionality to expand fault tolerance instead of capacity, if I'm reading this correctly from someone who understands ZFS better than I do.

If I'm not reading this correctly, please correct me. I greatly enjoy these discussions on design and implementation, and on how to exploit an implementation for one's benefit.
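If I've got that right, I imagine the RAIDZ3 variant would look something like this, with two placeholder files instead of one (pool, size and device names made up on my part):

  # two sparse placeholders plus three real drives = five devices for raidz3
  truncate -s 4T /root/fake1.img /root/fake2.img
  zpool create tank raidz3 /dev/ada1 /dev/ada2 /dev/ada3 /root/fake1.img /root/fake2.img

  # raidz3 tolerates three failures, so both placeholders can sit offline
  zpool offline tank /root/fake1.img
  zpool offline tank /root/fake2.img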
 
That’s correct. The pool will happily run in its “degraded” state with the redundancy you would expect to remain after removing a drive.

I won’t say it’s a best practice, but it should work in a pinch.
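When the real drive shows up it's just a replace and a resilver; sticking with the example names from earlier:

  # swap the physical drive in for the offlined placeholder and let it resilver
  zpool replace tank /root/placeholder.img /dev/ada4

  # watch the resilver finish, then the sparse file can be deleted
  zpool status tank
  rm /root/placeholder.img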
 
It's probably the only way to go. As far as I know there's no way to convert an existing RAIDZ1 to Z2 'in place', so you would have to back up, destroy the pool, recreate the pool, and restore the backup.
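Roughly, assuming you have a scratch pool big enough to park the data on (all names here are just placeholders):

  # snapshot everything and copy it off to the scratch pool
  zfs snapshot -r tank@migrate
  zfs send -R tank@migrate | zfs receive -F scratch/tank-backup

  # destroy the raidz1 pool and recreate it as raidz2 with the extra drive
  zpool destroy tank
  zpool create tank raidz2 /dev/ada1 /dev/ada2 /dev/ada3 /dev/ada4

  # restore everything into the new pool
  zfs send -R scratch/tank-backup@migrate | zfs receive -F tank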
 
Thanks gentlemen.

TL;DR warning. However, I figure that commenters should have an idea of what I'm trying to build and the rationale behind it.

What I plan on building is the following:
  • FreeBSD servers with a quad-port M.2 card for a RAIDZ2 boot, root, /usr, /var, swap and ports pool; four 500 GB M.2 keys, which will give me 1 TB usable. That should be plenty of space for that purpose; based on my past experience with older FreeBSD versions before I went on hiatus, it was. If not, please chime in. I typically give a FreeBSD box double the swap space of the maximum physical memory the box is capable of. My Lenovo TS430, with a max of 32 GB of physical memory, gets 64 GB of dedicated swap. HP MS G7s will get 32 GB of swap as they can handle 16 GB of physical memory.
  • Legacy BIOS machines (HP MS G7s) will get SATA 3 M.2 B+M keys. I'm happy with the performance of this quad-port M.2 SATA SSD PCIe card; they work well with my HP N54Ls and Lenovo TS430.
  • NVMe eventually on my Lenovo TS430 for testing purposes. I'm on the fence about whether I'll get better performance from NVMe over SATA SSD, as TS430s are PCIe 2.0. The TS430 supports UEFI booting, which makes me want to test UEFI NVMe cards.
  • Data is going to be on 3.5" hard drives in RAIDZ2, for obvious reasons: higher capacity per drive with platters over chips.
Does the FreeBSD 11+ installer support ZFS installations with phantom devices to create a RAIDZ2 with just three drives, in the context of this discussion?
 
You may not want to mirror your swap if you're going to use RAID 1+0 on those 4 drives.
If you do RAID 1+0 and mirror the swap, you will eventually end up with one allotment per drive (probably a little overkill).

The general rule of 2x RAM is an old-skool thought. I'm personally old skool and always do it, but more so only on production machines where, if something core dumps, I need to make 100% sure I get the entire dump.

Also, you may want to exclude /var/tmp, /tmp and swap from snapshots.
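For example, if your snapshot tool honours the com.sun:auto-snapshot property (zfstools and zfs-auto-snapshot do; dataset names depend on your layout):

  # mark the throwaway datasets so the rotating-snapshot tool skips them
  zfs set com.sun:auto-snapshot=false zroot/tmp
  zfs set com.sun:auto-snapshot=false zroot/var/tmp
  zfs set com.sun:auto-snapshot=false zroot/swap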

Sounds like your server is 4x NVMe plus a pool of 3.5" storage.

I did a similar setup and opted to install directly to the root pool and built out a storage pool separately. This depends on you, I guess; in my case I'm very comfy with tools like rotating snapshots, beadm and building packages, so I have little fear of messing up my main pool, and installing to it has some advantages.

If you're not gonna Jedi master the server, I would install a basic OS to a normal SSD,
then build your pool of NVMe,
then build your pool of 3.5" drives.

In this case you would end up with a striped zroot for the OS plus your 2 other pools. In the event you destroy your OS, all you would need to do is wipe the single drive, reinstall, and then zpool import the 2 pools back, and you're up and running.
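After the reinstall it's just (with whatever you named the pools):

  # see which pools are sitting on the other drives, then pull them back in
  zpool import
  zpool import fast
  zpool import tank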

As for pool configuration, that's an important consideration.

Depending on your workload, RAIDZ2 or striped mirrors have pros/cons. If you want performance, go 1+0. If you need to continually add vdevs, RAID 1+0; the con is half the space.
If you have all of the drives now, RAIDZ2; if you need safety, RAIDZ3. Also, if your workload is a bunch of random reads/writes, i.e. VMs, use 1+0 over a RAIDZ.
So, without knowing the workload ...

It sounds like you want 1+0 with the NVMe and RAIDZ2/3 for the storage pool.
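For example (device names are placeholders; nvd* for NVMe and da* for the 3.5" drives on FreeBSD):

  # striped mirrors (1+0) across the four NVMe devices: fast, grows two drives at a time
  zpool create fast mirror nvd0 nvd1 mirror nvd2 nvd3

  # raidz2 across the spinning rust: more usable space, survives any two drive failures
  zpool create tank raidz2 da0 da1 da2 da3 da4 da5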
 
To be honest I don't see the benefit of using M.2/NVMe/SSD for the OS. Not on typical server hardware at least. Sure, the system boots faster, but most server hardware will spend a lot more time in POST (initializing option ROMs, network cards, SAS controllers, etc.). So in the end you're shaving a couple of seconds off something that takes a few minutes. And let's be honest, how often do you reboot a server? Once the system is booted, and the machine has plenty of RAM, most things you run will eventually come from the filesystem or process cache that's in RAM. Again, you're not really benefiting from the faster storage.

For data storage, in particular databases, M.2/NVMe/SSD will definitely improve performance. Significantly.
 
Yeah, exactly. In fact the only reason I recommend an SSD for the boot OS is that USB keys and SD cards tend to get trashed more often than just having a normal drive for the OS.
 
Hi zader,

this may be a little off-topic, but since SirDice sort of started it: I have been running a server like you suggest - separating the OS pool installation from the data pool installation - from a USB key for over 10 years. I understand that a sample of one has no meaning, but perhaps the success is due to the fact that the key is 16 GB and the whole OS takes only 3.5 GB?

However, what I wanted to ask, which is on topic: would you recommend mirroring the OS drive?

Kindest regards,

M
 
The general rule of 2x RAM is an old-skool thought. I'm personally old skool and always do it, but more so only on production machines where, if something core dumps, I need to make 100% sure I get the entire dump.
I'm very old skool. At the risk of carbon dating myself, I cut my teeth in the business with Suns when the SPARCstation 1 made its debut in 1989. I remember the 10BASE5, pizza box and lunch box case days quite well.
Sounds like your server is 4x NVMe plus a pool of 3.5" storage.
It's going to be a RAIDZ2 of M.2 SATA 3 SSDs with a pool of 3.5" drives in RAIDZ2. I can't feasibly do NVMe with legacy BIOS booting on my HP MS G7 N54Ls unless the NVMe card has a legacy option ROM, and I don't know if such a unicorn exists. I think everything PCIe 3.x and later supports UEFI booting.
If you're not gonna Jedi master the server, I would install a basic OS to a normal SSD ...
One of the goals of the project is to sharpen up my atrophied BSD skills, Jedi master the server garden, and learn ZFS. One of my HP MS G7 N54Ls is going to be for testing purposes and Jedi mastering; it will be nuked and reinstalled regularly.
Depending on your workload, RAIDZ2 or striped mirrors have pros/cons. If you want performance, go 1+0. If you need to continually add vdevs, RAID 1+0; the con is half the space.
If you have all of the drives now, RAIDZ2; if you need safety, RAIDZ3. Also, if your workload is a bunch of random reads/writes, i.e. VMs, use 1+0 over a RAIDZ.
So, without knowing the workload ...

It sounds like you want 1+0 with the NVMe and RAIDZ2/3 for the storage pool.
Either option would work. The goal is to have a fault-tolerant FreeBSD boot drive, which is why I want RAIDZ2 on a four-port SATA 3 M.2 B+M card. When you're dealing with precious real estate within a microserver form factor, M.2 becomes very attractive. If an M.2 fails, you have to down the server, remove the card and swap the failed M.2 key. If I'm running RAIDZ2, I don't have to worry about immediate hardware replacement; I can run the boot pool in degraded mode and just order a replacement from Amazon or eBay. Of course, I'd check the S.M.A.R.T. statistics to see if another M.2 key is pending failure; it might be prudent to order a pair.

All machines are going to be running jails and VMs. I design my machines to be able to run the dog washer and kitchen sink ports. I'd rather overkill and build it once than have to do it again.
To be honest I don't see the benefit of using M.2/NVMe/SSD for the OS. Not on typical server hardware at least. Sure, the system boots faster, but most server hardware will spend a lot more time in POST (initializing option ROMs, network cards, SAS controllers, etc.). So in the end you're shaving a couple of seconds off something that takes a few minutes. And let's be honest, how often do you reboot a server?
Ideally very rarely. However, one of my N54Ls is going to be for testing and whatnot, as well as a hardware backup for another one. Fast booting with SSDs is a nice feature to have in that case.
Once the system is booted, and the machine has plenty of RAM, most things you run will eventually come from the filesystem or process cache that's in RAM. Again, you're not really benefiting from the faster storage.
Real estate is the issue with the microserver form factor. If you need your boot drive(s) to fit on a PCIe card, then M.2 is the answer. If you're doing legacy BIOS booting, then you're further limited to SATA 3 M.2 SSDs unless you can find an NVMe card that supports non-bifurcating motherboards and has a legacy BIOS option ROM for booting; I doubt such a unicorn exists. NVMe doesn't start to shine until you go to PCIe 3.x, and all of my systems are PCIe 2.0.
For data storage, in particular databases, M.2/NVMe/SSD will definitely improve performance. Significantly.
Yeah, exactly. In fact the only reason I recommend an SSD for the boot OS is that USB keys and SD cards tend to get trashed more often than just having a normal drive for the OS.
Yes, plus you also have to deal with the clumsy factor, where you can accidentally break a USB key and the USB slot in the process; plus the HP N54L is only USB 2.0.
 
However, what I wanted to ask, which is on topic: would you recommend mirroring the OS drive?

No need to mirror the OS. What I meant was:

use 1 standard SSD for the OS/boot (any pool with a single vdev is considered a stripe),
then use your 4 NVMe drives as a second pool, 1+0 for performance,
and finally the last pool for your spinning rust.

This way it's basically a throwaway system; all your data is still safe on pools 2 and 3. If you ever need to wipe it, just reinstall on the single SSD and import the other two pools.

All machines are going to be running jails and VMs. I design my machines to be able to run the dog washer and kitchen sink ports. I'd rather overkill and build it once than have to do it again.

This 3-pool system would be perfect for that and keeps separation between your VM data and the host. Set up iocage or whatever you like, then nullfs mount whatever you like from the rust and you're good to go.
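Something like this, say, with made-up paths for a jail called web (normally you'd put the equivalent line in the jail's fstab so it persists):

  # share a dataset from the rust pool into the jail via nullfs
  mount_nullfs /tank/media /zroot/iocage/jails/web/root/media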
 
Hi zader,

what I have been doing is synchronizing the OS USB key with a spare USB key after each major upgrade/change, so if a catastrophic problem with the OS happens, I can just throw away the original USB key and replace it with the backup.
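The sync itself is nothing fancy; with the spare plugged in and nothing writing to the live key, roughly this (device names are only an example, so double-check them first):

  # clone the live key (da0 here) onto the spare (da1), ideally from a quiescent system
  dd if=/dev/da0 of=/dev/da1 bs=1m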

Kindest regards,
M
 