ZFS resilvering keeps repeating, can't remove faulted drive or add hot spares

Hi all,

Can anyone advise how to solve this? Many drives still show as resilvering long after they were replaced.

I can't remove the "removed" or "faulted" drives after replacing them with new drives. The error given is: no valid replicas

The pool is still mounted and the file system is still OK, but I can't bring it back online, and it has been degraded for a few months. I'm worried because I don't know how much redundancy is left: I can't remove any of the faulted drives I have replaced ("no valid replicas"). Does that mean the resilvering process never completed?

Can anyone advise?

Thanks in advance,
cuteh

Code:
root@nas1:/volumes# zpool status -v
  pool: nas1
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Jul 30 13:06:13 2024
    462G scanned out of 252T at 93.0M/s, (scan is slow, no estimated time)
    94.3G resilvered, 0.18% done
config:

        NAME                           STATE     READ WRITE CKSUM
        nas1                        DEGRADED     0     0     0
          raidz2-0                     ONLINE       0     0     0
            c0t50000C0F012FBA84d0      ONLINE       0     0     0
            c0t50000C0F01E8A880d0      ONLINE       0     0     0
            c0t50000C0F01E5F674d0      ONLINE       0     0     0
            c0t50000C0F01E75798d0      ONLINE       0     0     0
            c0t50000C0F01E7F0D0d0      ONLINE       0     0     0
            c0t50000C0F01FE3990d0      ONLINE       0     0     0
            c0t5000C500C9BC2677d0      ONLINE       0     0     0
            c0t5000C500F085DA7Bd0      ONLINE       0     0     0  (resilvering)
            c0t5000C500F085DAAFd0      ONLINE       0     0     0  (resilvering)
          raidz2-1                     ONLINE       0     0     0
            c0t5000C500AED0DBA7d0      ONLINE       0     0     0
            c0t50000C0F012F4684d0      ONLINE       0     0     0
            c0t50000C0F012FBD04d0      ONLINE       0     0     0
            c0t50000C0F01301F6Cd0      ONLINE       0     0     0
            c0t5000C500F085F21Bd0      ONLINE       0     0     0  (resilvering)
            c0t50000C0F01E960C8d0      ONLINE       0     0     0
            c0t5000C500ADE572DBd0      ONLINE       0     0     0
            c0t50000C0F0240250Cd0      ONLINE       0     0     0
            c0t5000C500F085DC37d0      ONLINE       0     0     0  (resilvering)
          raidz2-2                     DEGRADED     0     0     0
            c0t5000C500EC66B26Fd0      ONLINE       0     0     0  (resilvering)
            replacing-1                DEGRADED     0     0    18
              c0t50000C0F012F3D04d0    DEGRADED     0     0     0  too many errors
              c0t5000C500EC66B287d0    ONLINE       0     0     0  (resilvering)
            c0t50000C0F012F7D50d0      ONLINE       0     0     0
            c0t5000C50085CCFC83d0      ONLINE       0     0     0
            replacing-4                DEGRADED     0     0     0
              c0t50000C0F01E7FF2Cd0    FAULTED      0     0     0  external device fault
              c0t5000C500EC1DBE8Fd0    ONLINE       0     0     0  (resilvering)
            c0t50000C0F01E844E0d0      ONLINE       0     0     0
            replacing-6                DEGRADED     0     0     0
              c0t50000C0F01E92AF8d0    DEGRADED     0     0     0  too many errors
              c0t5000C500EC1DBF47d0    ONLINE       0     0     0  (resilvering)
            c0t50000C0F01E9BB64d0      ONLINE       0     0     0
            c0t5000C500A6DDE0C3d0      ONLINE       0     0     0
          raidz2-3                     ONLINE       0     0     0
            c0t50000C0F012E7834d0      ONLINE       0     0     0
            c0t5000C500A6D40607d0      ONLINE       0     0     0
            c0t50000C0F012F38C0d0      ONLINE       0     0     0
            c0t5000C500A6DDD3B7d0      ONLINE       0     0     0
            c0t50000C0F012F5248d0      ONLINE       0     0     0
            c0t50000C0F012F7B9Cd0      ONLINE       0     0     0
            c0t50000C0F012FBC88d0      ONLINE       0     0     0
            c0t50000C0F01DD5EA4d0      ONLINE       0     0     0
            c0t5000C500A6DE6DCFd0      ONLINE       0     0     0
          raidz2-4                     DEGRADED     0     0     0
            replacing-0                DEGRADED     0     0     1
              c0t50000C0F012E9EC0d0    DEGRADED     0     0     0  too many errors
              c0t5000C500EC1DBE0Fd0    ONLINE       0     0     0  (resilvering)
            c0t5000C500A6DDF153d0      ONLINE       0     0     0
            c0t5000C500A6DE205Fd0      ONLINE       0     0     0
            c0t5000C500F085DA17d0      ONLINE       0     0     0  (resilvering)
            c0t5000C500AEBB6CD3d0      ONLINE       0     0     0
            c0t50000C0F01E7ED58d0      ONLINE       0     0     0
            c0t50000C0F01E7F064d0      ONLINE       0     0     0
            c0t5000C50094AC98CFd0      ONLINE       0     0     0
            c0t5000C500862B7103d0      ONLINE       0     0     0  (resilvering)
          raidz2-5                     DEGRADED     0     0     0
            c0t5000C500AEBB7743d0      ONLINE       0     0     0
            c0t50000C0F012FBD9Cd0      ONLINE       0     0     0
            c0t50000C0F01E15DD8d0      ONLINE       0     0     0
            replacing-3                DEGRADED     0     0     0
              c0t5000C50085E0260Bd0    DEGRADED     0     0     0  too many errors  (resilvering)
              c0t5000C500EC66B4E3d0    ONLINE       0     0     0  (resilvering)
            c0t5000C500A6D4026Fd0      ONLINE       0     0     0
            c0t5000C500862B81E7d0      ONLINE       0     0     0  (resilvering)
            spare-6                    DEGRADED     0     0 1.13K
              replacing-0              DEGRADED     0     0     0
                c0t50000C0F01E831D0d0  REMOVED      0     0     0
                c0t5000C500F085FC3Bd0  ONLINE       0     0     0  (resilvering)
              c0t5000C50085E02AA3d0    ONLINE       0     0     0  (resilvering)
            c0t50000C0F01E90DB8d0      ONLINE       0     0     0
            c0t50000C0F01E251ACd0      ONLINE       0     0     0
          raidz2-8                     ONLINE       0     0     0
            c0t50000C0F024030CCd0      ONLINE       0     0     0
            c0t50000C0F024032B0d0      ONLINE       0     0     0
            c0t50000C0F024053F4d0      ONLINE       0     0     0
            c0t50000C0F029CF524d0      ONLINE       0     0     0
            c0t50000C0F02F31B58d0      ONLINE       0     0     0
            c0t5000C500EC1DCF87d0      ONLINE       0     0     0  (resilvering)
            c0t50000C0F02F33FBCd0      ONLINE       0     0     0
            c0t5000C50093A1687Fd0      ONLINE       0     0     0
            c0t50000C0F02F35488d0      ONLINE       0     0     0
          raidz2-9                     ONLINE       0     0     0
            c0t5000C500A7A5D61Fd0      ONLINE       0     0     0
            c0t50000C0F02404538d0      ONLINE       0     0     0
            c0t5000C500946545BBd0      ONLINE       0     0     0
            c0t50000C0F02405740d0      ONLINE       0     0     0
            c0t50000C0F029CDA60d0      ONLINE       0     0     0
            c0t50000C0F029CF6E8d0      ONLINE       0     0     0
            c0t50000C0F029D1408d0      ONLINE       0     0     0
            c0t50000C0F012C49A4d0      ONLINE       0     0     0
            c0t5000C500A6DDCA4Bd0      ONLINE       0     0     0
        cache
          c0t5000C5003022C8CFd0        ONLINE       0     0     0
          c0t5000C500BB32A2E7d0        ONLINE       0     0     0
        spares
          c0t5000C50085E02AA3d0        INUSE     currently in use

errors: No known data errors


root@nas1:/# zpool add nas1 spare c0t5000C500EC66B4E3d0
Assertion failed: nvlist_lookup_string(cnv, "path", &path) == 0, file zpool_vdev.c, line 659
Abort (core dumped)
 
Code:
root@nas1:/volumes# zpool status -v
  pool: nas1
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Jul 30 13:06:13 2024
    462G scanned out of 252T at 93.0M/s, (scan is slow, no estimated time)
    94.3G resilvered, 0.18% done
This does indicate that the resilver still has a long way to go. Whether it is running excessively slowly or at a fairly normal rate (I lean towards the latter, considering the size of the pool and the number of faulty drives) is hard to judge without knowing more about the system (drive and bus technology, drive capacity, used disk space, I/O load besides the resilvering process, etc.). You should probably wait it out before actually removing any of the drives.
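
For a rough sense of scale, here is a back-of-the-envelope estimate from the figures in the status output. It is only a sketch: it assumes the units are binary (T = TiB, G = GiB, M = MiB), that the full 252T really has to be scanned, and that the 93.0M/s rate stays constant, none of which is guaranteed. The function names are made up for illustration.

```python
# Rough resilver ETA from the figures shown by `zpool status`.
# Assumptions (not confirmed in the thread): binary units, a constant
# scan rate, and that the whole reported size has to be scanned.

UNIT = {"M": 1024**2, "G": 1024**3, "T": 1024**4}


def to_bytes(value: float, unit: str) -> float:
    """Convert a zpool-style size such as (252, "T") to bytes."""
    return value * UNIT[unit]


def eta_days(total: float, scanned: float, rate_per_s: float) -> float:
    """Days left if the remaining bytes are scanned at a constant rate."""
    return (total - scanned) / rate_per_s / 86400


if __name__ == "__main__":
    days = eta_days(to_bytes(252, "T"), to_bytes(462, "G"), to_bytes(93.0, "M"))
    print(f"about {days:.0f} days remaining at the current rate")
```

With these numbers that works out to roughly a month, so "0.18% done" after a short while is not by itself a sign that something is stuck.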
 
Which disk is it exactly that keeps repeating resilvering?

# zpool add nas1 spare c0t5000C500EC66B4E3d0
That doesn't make sense.

c0t5000C500EC66B4E3d0 is part of raidz2-5, in the middle of a replace (replacing-3, resilvering), and now you want to add it as a spare.

Which version of FreeBSD are you running?
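
To answer the "which disk" question from the output itself rather than by eye, something like the following could list the disks flagged (resilvering) per top-level vdev. This is only a sketch: the function name and the regexes are my own, it assumes the cXtWWNdN device names seen in this thread, and it is fed a shortened copy of the posted status output rather than live `zpool status`.

```python
import re
from collections import defaultdict

# Shortened sample copied from the status output in the first post.
SAMPLE = """\
          raidz2-0                     ONLINE       0     0     0
            c0t5000C500C9BC2677d0      ONLINE       0     0     0
            c0t5000C500F085DA7Bd0      ONLINE       0     0     0  (resilvering)
          raidz2-2                     DEGRADED     0     0     0
            replacing-1                DEGRADED     0     0    18
              c0t50000C0F012F3D04d0    DEGRADED     0     0     0  too many errors
              c0t5000C500EC66B287d0    ONLINE       0     0     0  (resilvering)
"""

# Top-level vdev lines such as "raidz2-0" (mirror-N included as an assumption).
VDEV = re.compile(r"^\s*(raidz\d-\d+|mirror-\d+)\s")
# Disk lines that carry the "(resilvering)" marker.
DISK = re.compile(r"^\s*(c\d+t[0-9A-Fa-f]+d\d+)\s.*\(resilvering\)")


def resilvering_by_vdev(status_text):
    """Map each top-level vdev name to the disks marked (resilvering)."""
    result = defaultdict(list)
    current = None
    for line in status_text.splitlines():
        vdev = VDEV.match(line)
        if vdev:
            current = vdev.group(1)
            continue
        disk = DISK.match(line)
        if disk and current is not None:
            result[current].append(disk.group(1))
    return dict(result)


if __name__ == "__main__":
    for vdev, disks in resilvering_by_vdev(SAMPLE).items():
        print(vdev, disks)
```

Saving the list each time a resilver starts and comparing the runs would show which disks keep coming back.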
 
If those are spinning disks, then in a pool of that size, and given the 'wrong' number of disks per vdev (the usual rule of thumb is that a raidz vdev should have a power of 2 data disks plus the parity disks, otherwise you lose a lot of space to padding and performance also suffers), resilvering times can easily be multiple days or even weeks, especially if multiple disks have to be resilvered.
If all those disks are of the same age, you should prepare for even more disk failures during the resilvering process...
 
As I couldn't add the c0t5000C500EC66B4E3d0 drive as a hot spare, I just used the replace command to replace the degraded drive with it. What I was wondering is why, whenever I replace a drive, some online disks also show resilvering status (those drives were replaced some time ago).
 