ZFS [HOWTO] Convert Single disk ZFS-On-Root to Mirror

Hi,

Here is a little tutorial on how to add a disk to your zroot ZFS-on-root pool so that it becomes a mirror instead of a single-disk (stripe) pool. I had some trouble with this myself, so I thought I'd share how to do it.

Once you have connected the new hard drive, we'll have to create a partition table on it that is exactly the same as the one on the main hard drive.

First, let's find the new hard drive using:

camcontrol devlist

It will most likely appear as ada1.

Let's check what the partition table looks like on the old drive:

gpart show ada0
Code:
=>       34  156301421  ada0  GPT  (75G)
         34       1024     1  freebsd-boot  (512K)
       1058    4194304     2  freebsd-swap  (2.0G)
    4195362  152106093     3  freebsd-zfs  (73G)

To wipe any existing partition table on the new drive, execute:

gpart destroy -F ada1

Create the partition table on the new drive:

gpart create -s GPT ada1
gpart add -t freebsd-boot -l boot2 -s 512K ada1
gpart add -t freebsd-swap -l swap2 -s 2G ada1
gpart add -t freebsd-zfs -l zfs2 ada1
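To double-check (optional), compare the new layout against the old one; if the two disks are not exactly the same size, the last partition may differ slightly:

gpart show -l ada0 ada1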

Run zdb and find the GUID of the existing disk in the zroot pool. Copy this GUID; we'll need it in the next step.
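If you are not sure which number is the right one: running zdb without arguments dumps the cached pool configuration, and the value you want is the guid line inside children[0] of the vdev_tree section, next to the path of your existing ZFS partition (not the pool's own GUID at the top). The output will look roughly like this (the GUID and path below are just placeholders):

Code:
# zdb
zroot:
    version: 5000
    name: 'zroot'
    ...
    vdev_tree:
        ...
        children[0]:
            type: 'disk'
            id: 0
            guid: 1234567890123456789
            path: '/dev/gpt/zfs0'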

Now attach the new drive to the zroot pool:

zpool attach zroot $guid /dev/gpt/zfs2

Install the bootloader onto the new disk so that if one disk crashes, you'll still be able to boot the system:

gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada1

Now you can see the pool resilvering:

zpool status zroot

OPTIONAL - MIRROR SWAP

Make sure you have enough memory to turn off the swap temporarily.
Find the name of the swap device:

swapinfo

Turn off the swap partition:

swapoff /dev/gpt/swap0

Load the geom_mirror kernel module in order to create RAID devices:

kldload geom_mirror
echo 'geom_mirror_load="YES"' >> /boot/loader.conf

Create the mirrored swap:
gmirror label -b prefer -F swap gpt/swap0 gpt/swap2

Modify /etc/fstab so that when the system boots, it uses the mirrored swap device:

From:
/dev/gpt/swap0 none swap sw 0 0

To:
/dev/mirror/swap none swap sw 0 0

Activate the swap partition:
service swap start
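To verify that the swap mirror came up and is in use (adjust the names if yours differ):

gmirror status swap
swapinfo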
 
Exactly what I was looking for.

It gets better every time I check it :p

I'd like to add a small note: in my case I was moving an existing system to a new disk. Before rebooting, I'd suggest disabling the swap entry in fstab and re-enabling it after booting with swapon <device>. The reason I mention this is that my original disk became ada1 after adding the new disk, and the swap entry ended up pointing at the root file system of the existing disk. Good thing it didn't need the swap at all, but it is a useful mental note for people who are moving an existing system to a new disk that this can happen.
 
Code:
# zpool attach zroot 14970465206728248144 /dev/gpt/zfs1
cannot attach /dev/gpt/zfs1 to 14970465206728248144: can only attach to mirrors and top-level disks

Any suggestions?

Code:
# ls /dev/gpt
gptboot0  gptboot1  swap1  zfs0  zfs1
 
Code:
zpool attach zroot /dev/gpt/zfs0 /dev/gpt/zfs1

gave me the following result:

Code:
# zpool status -v
  pool: zroot
state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
  continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Dec 29 14:24:55 2014
  21.4G scanned out of 651G at 78.1M/s, 2h17m to go
  21.4G resilvered, 3.29% done
config:

        NAME          STATE     READ WRITE CKSUM
        zroot         ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            gpt/zfs0  ONLINE       0     0     0
            gpt/zfs1  ONLINE       0     0     0  (resilvering)

errors: No known data errors
 
My current pool looks like this:

Code:
        NAME          STATE     READ WRITE CKSUM
        zroot         ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            gpt/zfs1  ONLINE       0     0     0
            gpt/zfs0  ONLINE       0     0     0
        logs
          mirror-1    ONLINE       0     0     0
            gpt/log0  ONLINE       0     0     0
            gpt/log1  ONLINE       0     0     0
        cache
          gpt/cache0  ONLINE       0     0     0
          gpt/cache1  ONLINE       0     0     0

And I was wondering whether I could, for example, use geom_mirror to mirror the cache devices and then add the resulting mirror to the pool, since for some reason the cache can't be mirrored in ZFS. If so, would it make the system faster or not?
 
gpart create -s GPT ada1
gpart add -t freebsd-boot -l boot2 -s 512K ada1
gpart add -t freebsd-swap -l swap2 -s 2G ada1
gpart add -t freebsd-zfs -l zfs2 ada1

I think this method is more elegant:

Code:
gpart backup ada0 | gpart restore -lF ada1
Where ada0 is the source (old) disk and ada1 is the new one.

But this way you have no ability to change the labels. A more proper way, if you want to change the labels before restoring, is:

1) # gpart backup ada0 > /tmp/ada0.bkp
2) # cp /tmp/ada0.bkp /tmp/ada1.restore
3) # edit /tmp/ada1.restore
4) # gpart restore -lF ada1 < /tmp/ada1.restore
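For anyone wondering what there is to edit: the backup file is plain text, roughly one line per partition with the label as the last field, so you only change the label column to whatever you want on the new disk. With the layout from the guide above it should look something like this (the labels here are assumed):

Code:
GPT 128
1   freebsd-boot        34       1024 boot0
2   freebsd-swap      1058    4194304 swap0
3    freebsd-zfs   4195362  152106093 zfs0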
 
Nice guide; the only thing I find odd is using zdb to get the GUID of the existing disk. The normal method is just to run zpool attach pool disk1 disk2, where disk1 is the device name that appears in the zpool status output.
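With the labels used in the guide above, that would be something like the following, where the first name is whatever zpool status shows for the existing disk (gpt/zfs0 here is only an assumption) and the second is the newly created partition:

Code:
zpool attach zroot gpt/zfs0 gpt/zfs2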
 
And I was wondering if for example I could use geom_mirror to mirror the cache then add it to the pool. Since for some reason cache can't be mirrored in zfs and if so would it make the system faster or not?

The cache/L2ARC is automatically striped across all cache providers. Striping is the fastest configuration, not mirroring.
Mirroring would only provide some fault tolerance for the cache while sacrificing a lot of performance. ZFS does not rely on the L2ARC and doesn't lose data if a cache provider fails - ZFS will just go to disk and retrieve the data from there.
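In other words, the usual approach is just what you already have: add the providers directly with zpool add <pool> cache <dev1> <dev2> and let ZFS stripe across them. If you are curious how each cache device is being used, something like this shows per-device activity:

Code:
zpool iostat -v zroot 5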
 
Code:
# zpool attach zroot 14970465206728248144 /dev/gpt/zfs1
cannot attach /dev/gpt/zfs1 to 14970465206728248144: can only attach to mirrors and top-level disks

Any suggestions ?

Code:
# ls /dev/gpt
gptboot0  gptboot1  swap1  zfs0  zfs1


Hello Ofloo,

You have probably used the pool GUID instead of the disk GUID.
 
I found this guide helpful so I wanted to add two notes...
First, I also had the "can only attach to mirrors and top-level disks" error that was mentioned above. I resolved this by using the command zpool attach pool disk1 disk2 suggested by usdmatt.

Second, I wanted to elaborate on bootcode. I believe the example given:

gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada1

is for the "legacy boot" (MBR???) type.

In my case I am using UEFI so the commands are slightly different. I looked at the FreeBSD wiki entry on Root on ZFS and found the commands I needed under section 1.C.v.a "Create the bootcode partition: for UEFI Boot".

In my case my partition map looked like this:

Code:
=>        40  3907029088  ada0  GPT  (1.8T)
          40      409600     1  efiboot0  (200M)
      409640        1024     2  gptboot0  (512K)
      410664         984        - free -  (492K)
      411648     8388608     3  swap0  (4.0G)
     8800256  3898228736     4  zfs0  (1.8T)
  3907028992         136        - free -  (68K)

=>        40  3907029088  ada1  GPT  (1.8T)
          40      409600     1  efiboot1  (200M)
      409640        1024     2  gptboot1  (512K)
      410664         984        - free -  (492K)
      411648     8388608     3  swap1  (4.0G)
     8800256  3898228736     4  zfs1  (1.8T)
  3907028992         136        - free -  (68K)

The structure on ada0 was created by the FreeBSD installer's guided root-on-ZFS option. After replicating this structure on ada1 (using gpart add with explicit -b and -s values), I needed to install the bootcode. What worked for me was first running:

gpart bootcode -p /boot/boot1.efifat -i 1 ada1

to install efi partcode into the efi partition (index 1) and then running:

gpart bootcode -p /boot/gptzfsboot -i 2 ada1

to install zfsboot partcode into the gptboot partition (index 2).

After the resilvering completed I tested by disconnecting my first hard drive and trying to boot. It seemed to work: the system came up and showed that ada0 was faulty. After reconnecting the drive, the system was able to boot again as well.

Hopefully this helps someone else. I thought this guide compiled the info I needed in a better way than the handbook did. The handbook, as well as the messages printed after attaching the 2nd disk, both only show the legacy (MBR) bootcode command.
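(One caveat for anyone reading this later: as far as I know, /boot/boot1.efifat has been removed in newer FreeBSD releases. On those, the rough equivalent, assuming the ESP is still partition index 1, is to format it as FAT and copy the loader onto it:)

Code:
newfs_msdos -F 32 -c 1 /dev/ada1p1
mount -t msdosfs /dev/ada1p1 /mnt
mkdir -p /mnt/EFI/BOOT
cp /boot/loader.efi /mnt/EFI/BOOT/BOOTX64.EFI
umount /mnt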
 
But if I have a stripe of 2 disks and want to mirror this stripe with a third, bigger one, what should I do?
I'm assuming you want to use the 3rd one as a mirror. I don't think that this is possible; however, you can work around it by creating a new pool on the big disk, then creating 2 zvols approximately the size of the 2 striped disks and attaching each vdev/zvol as a mirror to one of the disks. Theoretically this should work.
 
if you have a stripe of 2 disks, I believe you should theoretically be able to create 2 mirrors by partitioning a disk that is twice as big in half and attaching each half to one of the disks in the stripe. Not an unreasonable option if you have a spare disk >= twice the size of the two pool disks and want some redundancy.

Code:
root@core1:/storage/bhyve # mdconfig -a -t malloc -s 100m
md0
root@core1:/storage/bhyve # mdconfig -a -t malloc -s 100m
md1
root@core1:/storage/bhyve # mdconfig -a -t malloc -s 300m
md2
root@core1:/storage/bhyve # zpool create test md0 md1
root@core1:/storage/bhyve # zpool status test
  pool: test
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          md0       ONLINE       0     0     0
          md1       ONLINE       0     0     0

errors: No known data errors
root@core1:/storage/bhyve # gpart create -s gpt md2
md2 created
root@core1:/storage/bhyve # gpart add -s 100m -t freebsd-zfs md2
md2p1 added
root@core1:/storage/bhyve # gpart add -s 100m -t freebsd-zfs md2
md2p2 added
root@core1:/storage/bhyve # gpart show md2
=>    40  614320  md2  GPT  (300M)
      40  204800    1  freebsd-zfs  (100M)
  204840  204800    2  freebsd-zfs  (100M)
  409640  204720       - free -  (100M)

root@core1:/storage/bhyve # zpool status test
  pool: test
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          md0       ONLINE       0     0     0
          md1       ONLINE       0     0     0

errors: No known data errors
root@core1:/storage/bhyve # zpool attach test md0 md2p1
root@core1:/storage/bhyve # zpool status test
  pool: test
 state: ONLINE
  scan: resilvered 30.5K in 0 days 00:00:00 with 0 errors on Sat Jan 16 15:31:35 2021
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            md0     ONLINE       0     0     0
            md2p1   ONLINE       0     0     0
          md1       ONLINE       0     0     0

errors: No known data errors
root@core1:/storage/bhyve # zpool attach test md1 md2p2
root@core1:/storage/bhyve # zpool status test
  pool: test
 state: ONLINE
  scan: resilvered 51.5K in 0 days 00:00:00 with 0 errors on Sat Jan 16 15:31:47 2021
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            md0     ONLINE       0     0     0
            md2p1   ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            md1     ONLINE       0     0     0
            md2p2   ONLINE       0     0     0

errors: No known data errors
root@core1:/storage/bhyve # zpool destroy test
root@core1:/storage/bhyve # mdconfig -d -u 2
root@core1:/storage/bhyve # mdconfig -d -u 1
root@core1:/storage/bhyve # mdconfig -d -u 0

I tested with the third disk more than big enough here to get a partition table and 2 full 100M partitions. It's possible there could be a problem using a 2TB disk to make 2 x 1TB mirrors if the partition table means you don't actually have enough bytes left to make 2 partitions exactly the same size as the 1TB disks, although that's probably not a problem if your pool disks are also partitioned.
 
if you have a stripe of 2 disks, I believe you should theoretically be able be able to create 2 mirrors by partitioning a disk that is twice as big in half, and attaching each half to one of the disks in the stripe. Not an unreasonable option if you have a spare disk >=twice the size of the two pool disks and want some redundancy.
<skipped>
I tested with the third disk more than big enough here to get a partition table and 2 full 100M partitions. It's possible there could be a problem using a 2TB disk to make 2 x 1TB mirrors if the partition table means you don't actually have enough bytes left to make 2 partitions exactly the same size as the 1TB disks; Although probably not a problem if your pool disks are also partitioned.
Thank You for the detailed answer. It's a pity, but I need something different. These are GELI-encrypted root-on-ZFS disks with no partitions and an external boot drive. The strange thing here is that I already did this procedure, not with a striped set, but with a single disk. My laptop disk was out of space, so I set autoexpand=on and added a new disk to the mirror, then removed the old one. Now it's OK. I need the same thing, but with a striped set (not with a single disk), and you say that it's impossible. I can't understand - WHY?
 
Or what you could do is just create 2 partitions on the big disk so you have 2 partitions equal to your smaller disks. And mirror/attach those to the appropriate disks.
 
Or what you could do is just create 2 partitions on the big disk so you have 2 partitions equal to your smaller disks. And mirror/attach those to the appropriate disks.
In this case I will not get more space. Is it possible to remove a disk from a stripe? Maybe I could add the big one to the stripe and remove the small ones?
 
If your intent is to replace the 2-disk stripe with one big disk then you need to recreate the pool, at least in my opinion, unless you're able to mirror the 2 disks onto the one bigger disk. If you can do that, you can let it resilver and then remove/detach the smaller disks.
 
If your intent is to replace the 2 disk stripe with one big disk then you need to recreate the pool, at least in my opinion unless you're able to mirror the 2 disks to the one bigger disk. If you're able to do that then you can let it resilver then remove/detach the smaller disks.
I can't create a pool with the same name as the current one.
Is it possible to rename a pool after a send/receive operation?
I'd prefer not to change my vfs.root.mountfrom in loader.conf; it's write-protected at the hardware level.
 
It used to be possible; I'm not sure if you can still do it, but I assume you can. I would suggest you first create a test pool with zvols/vdevs and just try.
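A quick way to test the rename itself without touching real data is a throwaway memory disk (all names here are made up):

Code:
mdconfig -a -t malloc -s 100m        # creates md0
zpool create testpool md0
zpool export testpool
zpool import testpool renamedpool    # importing under a new name = rename
zpool status renamedpool
zpool destroy renamedpool
mdconfig -d -u 0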

Thanks for Your answer!
I have read this article already, but some critical questions remain:

1) If I run `zpool export mypool` on the live system, will it die?
2) Should I boot from a live CD and do it from an external OS?
3) How can I deal with /boot/zfs/zpool.cache?
 
1) If I'll make `zpool export mypool` on live system it will die?
2) Should I boot from live CD and do it from external OS?
3) How can I deal with /boot/zfs/zpool.cache?

1) No, you can't export a pool with actively used datasets. (given that 'mypool' is your current root pool)

2) If you want to put the whole pool on the new disk, just create the new pool on it, snapshot all datasets on the current pool (-r) and zfs send|recv them to the new pool (make sure to keep all properties! Ideally, send a replication stream with -R). Reboot and you're done. If you want to rename the pool again to match the old name, you have to reboot to an installer image first (a rough sketch follows below).
If you just want to increase space and stick with a striped pool, mirror the provider you want to replace onto the new disk and remove the old provider afterwards. If autoexpand is enabled, the pool size should now be increased; otherwise issue a `zpool online -e <pool> <newprovider>` to trigger the resize.

3) If you haven't fiddled with the default installation, /boot/ should reside on the zpool. OTOH, IIRC the zpool.cache file was retired a long time ago and is only kept for backwards compatibility.
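Regarding 2), a rough sketch of the whole-pool move (the new label gpt/zfsnew and the zroot/ROOT/default layout are assumptions, and you still need bootcode/EFI and swap on the new disk as described earlier in the thread):

Code:
# on the running system:
zfs snapshot -r zroot@migrate
zpool create newpool gpt/zfsnew
zfs send -R zroot@migrate | zfs recv -u -F newpool
zpool set bootfs=newpool/ROOT/default newpool

# then from an installer/live image, import the new pool under the old name:
zpool import -f newpool zroot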

edit:
I'd prefer not to change my vfs.root.mountfrom in loader.conf, it's write protected on hardware level.
Where did that entry come from? It is definitely not needed, so just remove it. As said: /boot/ usually resides on the ZFS pool, so I highly doubt there is any "hardware level write protection" present. If you have fiddled something together to put /boot/ on some write-protected memory you're on your own here... How are you handling kernel updates if /boot/ is completely locked??


The initial scenario you've planned (mirroring 2 vdevs onto a single vdev) DOES NOT work. A mirror is always a 1:1 copy of a single disk/provider, and the usable space of a vdev is always the size of its smallest provider. Holding multiple providers on a single disk is generally a very bad idea and should only be used for testing or for migrations where data redundancy is still sufficient even without that disk. It will also considerably decrease the performance of the pool and heavily increase the load on that disk, so prepare for a much earlier failure of that disk if you keep the pool running in that configuration.
As with many things on UNIX: there are tons of ways you can shoot yourself in the foot, but that doesn't mean you should do it.
 
CAUTION!!!
Never do what I did below... it leads to KERNEL PANIC and DATA LOSS!!!


Thank You. Problem solved easily.
1) No, you can't export a pool with actively used datasets. (given that 'mypool' is your current root pool)

I've thought the same myself.

2) If you want to put the whole pool on the new disk, just create the new pool on it, snapshot all datasets on the current pool (-r) and zfs send|recv them to the new pool (make sure to keep all properties! ideally, send a replication-stream with -R). Reboot and you're done. If you want to rename the pool again to match the old name you have to reboot to an installer image first.
If you just want to increase space and stick with a striped pool, mirror the provider you want to replace on the new disk and remove the old provider afterwards. If autoexpand is enabled, the pool size should now be increased, otherwise issue a `zpool online <newprovider>` to trigger the resize.

The solution was much simpler...

3) if you haven't fiddled with the default installation /boot/ should reside on the zpool. OTOH IIRC the zpool.cache file has been retired a long time ago and is only kept for backwards compatibility.

Thank you for the information on zpool.cache, I didn't know that.
I have an external boot disk (a microSD card in an SD card adapter, glued together with superglue, with the lock switch broken in the "write locked" position and also superglued).

edit:
where did that entry come from? it is definitely not needed, so just remove it. As said: /boot/ usually resides on the zfs-pool, so i highly doubt there is any "hardware level write protection" present. If you have fiddled something together to put /boot/ on sowe write-only memory you're on your own here... How are you handling kernel updates if /boot/ is completely locked??

To update, I break the SD-to-microSD adapter apart and clean the superglue off the microSD card using sandpaper... sort of a compromise between security and (re)usability.

The initial scenario you've planned (mirror 2 vdevs on a single vdev) DOES NOT work. A mirror is always 1:1 copy of a single disk/provider. The usable space of a vdev is always the size of the smallest provider. Holding multiple providers on a single disk is generally a very bad idea and should only be used for testing or migrations where data redundancy is still sufficient even without that disk. It will also considerably decrease the performance of a pool and heavily increase load on that disk, so prepare for a much earlier failure of that disk if you keep the pool running in that configuration.
As with many things on UNIX: there are tons of ways you can shoot yourself in the foot, but that doesn't mean you should do it.

I needed to increase the disk space of the pool.
The pool consisted of two striped 250 GB disks.
The grow was planned to be done with a 1 TiB disk.

I solved it in the spirit of the KISS principle, and it worked like a charm:

Code:
# zpool status
  pool: MYPOOL
state: ONLINE
  scan: scrub repaired 0 in 0 days 00:25:10 with 0 errors on Mon Jan 18 14:09:39 2021
config:

    NAME        STATE     READ WRITE CKSUM
    MYPOOL   ONLINE       0     0     0
      ada0.eli  ONLINE       0     0     0
      ada1.eli  ONLINE       0     0     0

errors: No known data errors

# zpool add MYPOOL /dev/label/mypooldsk1tib.eli

# zpool status
  pool: MYPOOL
state: ONLINE
  scan: scrub repaired 0 in 0 days 00:25:10 with 0 errors on Mon Jan 18 14:09:39 2021
config:

    NAME                      STATE     READ WRITE CKSUM
    MYPOOL                 ONLINE       0     0     0
      ada0.eli                ONLINE       0     0     0
      ada1.eli                ONLINE       0     0     0
      label/mypooldsk1tib.eli  ONLINE       0     0     0

errors: No known data errors

# zpool remove MYPOOL ada1.eli

# zpool status
  pool: MYPOOL
state: ONLINE
  scan: scrub repaired 0 in 0 days 00:25:10 with 0 errors on Mon Jan 18 14:09:39 2021
remove: Removal of vdev 1 copied 224G in 0h28m, completed on Mon Jan 18 14:41:20 2021
    7,04M memory used for removed device mappings
config:

    NAME                      STATE     READ WRITE CKSUM
    MYPOOL                     ONLINE       0     0     0
      ada0.eli                ONLINE       0     0     0
      label/mypooldsk1tib.eli  ONLINE       0     0     0

errors: No known data errors

# zpool remove MYPOOL ada0.eli

# zpool status
  pool: MYPOOL
state: ONLINE
  scan: scrub repaired 0 in 0 days 00:25:10 with 0 errors on Mon Jan 18 14:09:39 2021
remove: Removal of vdev 0 copied 225G in 0h33m, completed on Mon Jan 18 15:15:40 2021
    14,6M memory used for removed device mappings
config:

    NAME                      STATE     READ WRITE CKSUM
    MYPOOL                      ONLINE       0     0     0
      label/mypooldsk1tib.eli  ONLINE       0     0     0

errors: No known data errors

I killed my pool and lost my data.
This comes on boot:

Code:
panic: solaris assert: nvlist_lookup_uint64(configs, ZPOOL_CONFIG_POOL_TXG, &txg) == 0, file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c, line: 5222
cpuid = 11
time = 36
KDB: stack backtrace: ...
 
I'll return to this question after my business trip (~10 days).
If somebody has any idea how to recover, please share it.
 