Problem with RAIDZ pool

Colleagues, please tell me how to solve my problem.
I have a server with two ZFS pools on it: zroot, consisting of one disk, and raid0, consisting of three disks: gpt/rz0:0, gpt/rz0:1, and gpt/rz0:2.
The gpt/rz0:1 disk in the raid0 pool failed and I needed to replace it.
I was afraid of formatting the wrong disk by mistake, so I connected the new disk to a test machine and created a partition with the label rz0:1 on it. But I didn't realize that, at the same time, the entries "name: 'zroot', hostname: 'test', path: '/dev/gpt/zfs0'" remained visible in zdb.
After moving the disk to the server, the nightmare began: at the boot stage, the server started printing:
Code:
Mounting from zfs:zroot/ROOT/default failed with error 6; retrying for 8 more seconds
Mounting from zfs:zroot/ROOT/default failed with error 6.

I believe that in the zdb labels of volume rz0:1 I need to change the entries to "name: 'raid0', hostname: 'server.ogogon.org', path: '/dev/gpt/rz0:1'".
How to do it?

I would be grateful for your advice,
Ogogon.
 
The gpt/rz0:1 disk in the raid0 pool failed and I needed to replace it.
If one disk of a RAID0 failed, the entire pool is toast. There's nothing to replace; there is no redundancy. The whole pool would need to be recreated from scratch.
 
If one disk of a RAID0 failed, the entire pool is toast. There's nothing to replace; there is no redundancy. The whole pool would need to be recreated from scratch.
Perhaps I did not explain something in sufficient detail, or perhaps you did not quite understand me.
(I apologize, sometimes I can be a little vague - I'm not a native speaker.)
I do have redundancy in the raid0 array; it is what keeps the information in it safe.


Code:
root@server:/home/ogogon # zpool status
  pool: raid0
 state: ONLINE
status: One or more devices are configured to use a non-native block size.
	Expect reduced performance.
action: Replace affected devices with devices that support the
	configured block size, or migrate data to a properly configured
	pool.
  scan: resilvered 2.71M in 00:03:21 with 0 errors on Thu Feb 1 20:09:32 2024
config:

	NAME           STATE     READ WRITE CKSUM
	raid0          ONLINE       0     0     0
	  raidz1-0     ONLINE       0     0     0
	    gpt/rz0:0  ONLINE       0     0     0  block size: 512B configured, 4096B native
	    gpt/rz0:1  ONLINE       0     0     0  block size: 512B configured, 4096B native
	    gpt/rz0:2  ONLINE       0     0     0  block size: 512B configured, 4096B native

errors: No known data errors
 
The pool you called "raid0" is not actually RAID0 (the technical term for a non-redundant RAID layout, created by concatenating or striping multiple disks); it is RAID-Z1, which is similar to RAID5. Picking "raid0" as a name was a pretty confusing choice! So with three disks you have two disks' worth of capacity, plus one disk's worth used for redundancy (parity). By the way, "zpool status" output is much easier to read when its indentation is preserved. Here's an example:
Code:
	NAME               STATE     READ WRITE CKSUM
	home               ONLINE       0     0     0
	  mirror-0         ONLINE       0     0     0
	    gpt/hd14_home  ONLINE       0     0     0
	    gpt/hd16_home  ONLINE       0     0     0

Now to the real problem you're having. If I understand what you said correctly, after a disk failed, you added a new one. But adding the "new" disk created problems with your zroot pool (which is not the raid0 pool). This makes me think that the new disk wasn't new at all, but already had a zpool called "zroot" on it. How did you prepare the "new" disk?

May I suggest something: if the "new" disk hasn't been written to yet, do the following. Say the new disk is /dev/ada9. Start with "dd if=/dev/zero of=/dev/ada9 bs=1m count=1048" to overwrite the partition table, GPT header, ZFS metadata, and all that. Then partition the now-blank disk with gpart, creating just the partition you need to repair your raid0 pool. That should avoid the problems caused by an old zroot pool lingering on it.
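Since one wrong letter in a dd command can destroy a live pool, it may be worth rehearsing the wipe on a scratch file first. Here is a sketch of the idea; the file path, the /dev/ada9 device name, and the sizes are all hypothetical:

```shell
# Stand-in for the raw disk: a 64 MiB scratch file with some fake stale
# metadata at the start, where the old GPT and ZFS labels would sit.
disk=/tmp/fake-disk.img
truncate -s 64M "$disk"
printf "name: 'zroot'" | dd of="$disk" conv=notrunc status=none

# The wipe itself: zero the first chunk of the "disk". On the real machine
# this would be, e.g.: dd if=/dev/zero of=/dev/ada9 bs=1m count=1048
dd if=/dev/zero of="$disk" bs=1M count=16 conv=notrunc status=none

# Check that nothing non-zero survives at the start (should print 0).
head -c 1048576 "$disk" | tr -d '\0' | wc -c

# Afterwards, on the real disk, repartition and rejoin the pool, roughly:
#   gpart create -s gpt ada9
#   gpart add -t freebsd-zfs -l rz0:1 ada9
#   zpool replace raid0 gpt/rz0:1
```

Only after the rehearsal looks right would I point the same commands at the real device.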
 
Picking "raid0" as a name was a pretty confusing choice!
Strictly speaking, you are right - the name can give rise to ambiguity. But in this case the zero is not a RAID level; it is simply the ordinal number of the array in the server.

How did you prepare the "new" disk?
This is described in my first post on this topic. The disk had some kind of Microsoft partitions that I could not remove. I connected the drive to the test machine and installed FreeBSD from a DVD; the installer deleted those partitions. Then I booted again and, from the command line, created a partition of type "freebsd-zfs" with the label "rz0:1". Now I understand that a lot of different metadata remains on the disk, matching the metadata of the system volume.

As I understand it, each volume carries several copies of its metadata: label 0 contains the correct entries, while label 2 contains the ones that provoke the conflict.

The problematic volume is now visible as ada3.
...
=>        40  5860533088  ada3   GPT    (2.7T)
          40  5860533088     1   rz0:1  (2.7T)
...


Code:
root@server:/home/ogogon # zdb -l /dev/gpt/rz0:1
------------------------------------
LABEL 0
------------------------------------
    version: 5000
    name: 'raid0'
    state: 0
    txg: 37632319
    pool_guid: 5655944024702255832
    errata: 0
    hostid: 1933071147
    hostname: 'server.ogogon.org'
    top_guid: 10752194534671245380
    guid: 16681721180300969840
    vdev_children: 1
    vdev_tree:
        type: 'raidz'
        id: 0
        guid: 10752194534671245380
        nparity: 1
        metaslab_array: 33
        metaslab_shift: 36
        ashift: 9
        asize: 9001763340288
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 5876364306846346058
            path: '/dev/gpt/rz0:0'
            phys_path: 'id1,enc@n3061686369656d31/type@0/slot@3/elmdesc@Slot_02/p1'
            whole_disk: 1
            DTL: 260
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 16681721180300969840
            path: '/dev/gpt/rz0:1'
            whole_disk: 1
            DTL: 49347
            create_txg: 4
            expansion_time: 1706807119
        children[2]:
            type: 'disk'
            id: 2
            guid: 8525103415062619825
            path: '/dev/gpt/rz0:2'
            phys_path: 'id1,enc@n3061686369656d30/type@0/slot@1/elmdesc@Slot_00/p1'
            whole_disk: 1
            DTL: 1044
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
    labels = 0 1 2 3
root@server:/home/ogogon # zdb -l ada3
failed to unpack label 0
failed to unpack label 1
------------------------------------
LABEL 2
------------------------------------
    version: 5000
    name: 'zroot'
    state: 0
    txg: 173
    pool_guid: 8485018444523237979
    errata: 0
    hostname: 'test'
    top_guid: 6099183361447409116
    guid: 6099183361447409116
    vdev_children: 1
    vdev_tree:
        type: 'disk'
        id: 0
        guid: 6099183361447409116
        path: '/dev/gpt/zfs0'
        whole_disk: 1
        metaslab_array: 64
        metaslab_shift: 34
        ashift: 12
        asize: 2998166618112
        is_log: 0
        create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
    labels = 2
failed to unpack label 3
root@server:/home/ogogon #
 
The disk had some kind of Microsoft partitions that I could not remove.
Strange; the dd technique to overwrite it with zeroes should have worked. It can give trouble with the BIOS, if the BIOS sees the backup copy of the GPT (which is stored at the very end of the disk), and then helpfully offers to restore the GPT.

I connected the drive to the test machine and installed FreeBSD with DVD.
And this is why the new disk had a copy of the zroot pool. In Texas, they call this a "foot-shaped gun". What must have happened: when you redid the partitioning, it didn't overwrite the content of the partition, and ZFS (correctly but unhelpfully) found a zpool in there.

Now I understand that a lot of different metadata remains on the disk, matching the metadata of the system volume.
Exactly! Fortunately, you got it back. Clearing the disk first would have fixed that problem too.
 
In Texas, they call this a "foot-shaped gun".
Oh, that mysterious Texas soul!

Exactly! Fortunately, you got it back. Clearing the disk first would have fixed that problem too.
Do I understand correctly that if I manage to remove or change this LABEL 2, which is obviously not used, then all my problems will end? Alas, I don't know how to do this.
To be honest, I really don't want to remove this disk from the array, format it, and then resilver all over again.
 
I don't understand what you mean by "label 2".

A normal disk has a partition table. Today, that is nearly always in GPT format. It is at the very beginning of the disk, with a second backup copy at the end of the disk. It says: "This whole disk drive /dev/ada0 has partitions /dev/ada0p1, /dev/ada0p2 ...". The GPT may have text names for the partitions, in which case the partitions will also be visible as /dev/gpt/foo, /dev/gpt/bar and so on. The GPT is a few dozen KB in size. Disks often have a few small partitions at the beginning, for example for boot information. Those tend to be a few MB.

If you use a complex file system like ZFS, the partition will also have metadata inside it. It says something like "Hi, I am a vdev called Adam, I'm part of a zpool called Bob, and the other vdevs are called Charlie and David". ZFS stores this metadata in label copies right at the beginning of the vdev (or of the partition), and it also keeps backup label copies at the very end of the device.

To clear out (modify) the partition table, you use gpart. But gpart does not modify the content of each partition. And using it can be error-prone: people (like me!) regularly forget to delete old partitions, screw up the partition numbering, and so on. That's why I recommend always zeroing the beginning of the disk, to wipe away the old partition table completely and to remove any internal metadata stored at the beginning of the first big partition. To make pretty sure I also get rid of the small partitions, I like to overwrite the first gigabyte or so of the disk; that only takes a few seconds. If one wants to be extra thorough, one can overwrite each partition with a command like "dd if=/dev/zero of=/dev/adaXpY bs=1m count=1024" (for every partition Y of the chosen disk X), but that only makes a difference if there are multiple big partitions.
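The point that repartitioning leaves partition contents untouched can be demonstrated on a scratch file; everything here (paths, offsets, sizes) is made up for illustration:

```shell
# 8 MiB scratch "disk" with fake ZFS metadata planted 1 MiB in,
# i.e. inside what would be the first partition's contents.
img=/tmp/part-demo.img
truncate -s 8M "$img"
printf "name: 'zroot'" | dd of="$img" bs=1 seek=1048576 conv=notrunc status=none

# Simulate repartitioning: rewrite only the first sectors, where the
# partition table lives (a GPT typically occupies the first 34 sectors).
dd if=/dev/zero of="$img" bs=512 count=34 conv=notrunc status=none

# The stale metadata deeper in the "disk" is still there.
dd if="$img" bs=1 skip=1048576 count=13 status=none; echo
```

This is exactly why ZFS can still "find" an old pool after gpart has redone the table: the table changed, the bytes inside the partition did not.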

So, now what do you mean by "label 2"? The internal metadata of the ZFS vdev in the partition? If yes, the dd command I gave above with "of=/dev/adaX" would remove it. In your case, that's no longer necessary if the system is working now, since ZFS has already overwritten the old vdev with the new one.
 
I don't understand what you mean by "label 2".
...
So, now what do you mean by "label 2"? The internal metadata of the ZFS vdev in the partition? If yes, the dd command I gave above with "of=/dev/adaX" would remove it. In your case, that's no longer necessary if the system is working now, since ZFS has already overwritten the old vdev with the new one.
Please take a look at my post #5 in this thread. (It contains the commands output, they are on a gray background.)
I have established that my volume gpt/rz0:1 is also known as /dev/ada3.
If I examine the gpt/rz0:1 device with zdb, I get information under the heading "LABEL 0", and it is correct. If I examine ada3 instead, I get the heading "LABEL 2", and that information is what leads to the conflict. Yet this is the same physical device. Obviously there are several sets of metadata, all relating to one device. (I don't know exactly where they are, but I believe they are on the disk itself.)
If I have interpreted something incorrectly and drawn the wrong conclusions, please correct me.
 
Strange. I dumped some data from the beginning of the disk using the commands:
dd if=/dev/ada3 of=/tmp/ada3.dump bs=1m count=100
dd if=/dev/ada3p1 of=/tmp/ada3p1.dump bs=1m count=100

In both dumps I found information that correctly corresponds to the current situation: it refers to pool "raid0", not to pool "zroot".

Where does this villain ZFS store the old information that zdb reports as LABEL 2, and which prevents the server from booting without incident?
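For what it's worth: ZFS keeps four copies of the vdev label, with labels 0 and 1 at the start of the device and labels 2 and 3 near its very end, so a dump of only the first 100 MB will never show labels 2 and 3. The tail can be inspected the same way as the head; here is a scratch-file sketch (file names, sizes, and the real-disk command in the comment are all hypothetical):

```shell
# Scratch "disk" with stale data planted in its last bytes, roughly where
# ZFS labels 2 and 3 would live on a real vdev.
img=/tmp/tail-demo.img
truncate -s 4M "$img"
printf "name: 'zroot'" | dd of="$img" bs=1 seek=$((4*1024*1024 - 13)) conv=notrunc status=none

# Dump the tail of the "disk"; a head-only dump misses this region entirely.
size=$(wc -c < "$img")
dd if="$img" bs=1 skip=$((size - 13)) status=none; echo

# Real-disk equivalent (get the disk size in bytes from `diskinfo ada3`),
# e.g. to capture the last megabyte:
#   dd if=/dev/ada3 of=/tmp/ada3.tail bs=1m skip=$((size_in_mb - 1))
```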
 