Solved: Exactly how should a faulted disk be replaced?

Long story short, this is what I got from a v14.0 machine (maybe v14.1, I'm not sure):
Code:
ZFS Pool Status:

  pool: tank
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
    invalid.  Sufficient replicas exist for the pool to continue
    functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
config:

    NAME                      STATE     READ WRITE CKSUM
    tank                      DEGRADED     0     0     0
      raidz3-0                DEGRADED     0     0     0
        ada0                  ONLINE       0     0     0
        ada1                  ONLINE       0     0     0
        ada2                  ONLINE       0     0     0
        ada3                  ONLINE       0     0     0
        15046335961317935290  FAULTED      0     0     0  was /dev/ada4
        ada4                  ONLINE       0     0     0
        ada5                  ONLINE       0     0     0

errors: No known data errors

The disks are all 960 GB Kingston enterprise SATA SSDs and are supposed to be resilient to faults. But life is strange and things happen.

How should I go about replacing the disk? What is the safest procedure?
 
How should I go about replacing the disk? What is the safest procedure?
It is up to you of course, but my two cents in this case: do not use full disks in the RAIDZ configuration, use partitions on the disks instead. It has been discussed elsewhere. If the full disk is in use, there is no room left for the drive controller to do remapping. Better to get a slightly bigger disk, build the array out of partitions, and leave a little free space on each disk. Assuming you can find a replacement disk, create a partition on it equal to or bigger than the ones the other drives in this array use.
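
For what it's worth, a minimal sketch of that partition-based replacement on FreeBSD follows. The new device name (ada6), the GPT label (tank-slot4), and the partition size are assumptions, not things read off your system; check zpool status and gpart show yourself before running anything.
Code:
# Wipe any stale metadata and put a fresh GPT scheme on the new disk
# (ada6 is an assumed name; yours may differ once the faulted disk is pulled).
gpart destroy -F ada6        # harmless error if the disk is already blank
gpart create -s gpt ada6

# A 960 GB disk is roughly 894 GiB; 888g leaves a few GiB of slack.
# The GPT label keeps the vdev name stable across device renumbering.
gpart add -t freebsd-zfs -a 1m -s 888g -l tank-slot4 ada6

# Replace the faulted member by the GUID shown in 'zpool status',
# then watch the resilver progress.
zpool replace tank 15046335961317935290 /dev/gpt/tank-slot4
zpool status -v tank

The resilver rewrites the whole column, so it can take a while; the pool stays usable, just degraded, until it finishes.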
 
SSDs already have spare blocks, which are managed by the controller. Some disks are over-provisioned and others just reserve some blocks; that is why you see 480 GB disks instead of 512 GB, or 960 GB instead of 1 TB (1024 GB).

Some HBA and RAID controllers write their metadata to the first and last LBAs, which is another reason it is good to use partitions for ZFS. It also helps to make the partition sizes identical, as some disks report slightly different sizes.
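
If it helps, the sizes the drives actually report are easy to compare before picking a partition size; the device names below are placeholders:
Code:
# One line per disk: name, sector size, media size in bytes, size in sectors, ...
diskinfo ada0 ada1 ada2 ada3 ada4 ada5

# Confirm the partition layout and GPT labels on a partitioned disk.
gpart show -l ada6

Pick a partition size no bigger than the smallest byte count you see, and reuse that exact size on every member.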
 