Solved Cannot "zpool clear" degraded array with faulted (but still online) drive

FreeBSD 12.1-R

I have a drive that occasionally misbehaves and degrades the array, but "zpool clear" doesn't clear that degradation. Neither does offlining and onlining the drive.

The only way to get around this is to detach (remove) the drive from the array, then re-add it as a new array member.
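For reference, the workaround looks roughly like the sketch below. The pool and device names come from the status output further down. Attaching to ada0p3 is an assumption (any healthy mirror member would do), and the `RUN` switch and `readd_member` helper are hypothetical additions so the commands are only printed by default.

```shell
# Sketch: detach a faulted mirror member, then attach it again as a new member.
# POOL and BAD are taken from the status output in this post; GOOD is an
# assumed healthy member of the same mirror.
POOL=zroot
BAD=ada1p3
GOOD=ada0p3

# RUN is a hypothetical switch: when empty, the commands are only printed.
readd_member() {
    for cmd in "zpool detach $POOL $BAD" "zpool attach $POOL $GOOD $BAD"; do
        if [ -n "${RUN:-}" ]; then
            $cmd || return 1    # run for real, stop on the first failure
        else
            echo "$cmd"         # dry run: show what would be executed
        fi
    done
}

readd_member
```

Attaching the partition again starts a full resilver of that member, so this is heavier than a successful clear would be.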

Am I misunderstanding the purpose of clear, or is this a bug?

Here's the sequence of commands, none of which changes anything except the error count for the drive:

Code:
# zpool status zroot
  pool: zroot
state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
  scan: resilvered 516K in 0 days 00:00:01 with 0 errors on Sat Dec  5 16:22:49 2020
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       DEGRADED     0     0     0
          mirror-0  DEGRADED     0     0     0
            da0p3   ONLINE       0     0     0
            ada0p3  ONLINE       0     0     0
            ada1p3  FAULTED      0   467     0  too many errors

errors: No known data errors

# zpool clear zroot
# zpool status zroot
  pool: zroot
state: DEGRADED
  [...]
            ada1p3  FAULTED      0     0     0  too many errors
# zpool offline zroot ada1p3
# zpool online zroot ada1p3
zpool warning: device 'ada1p3' onlined, but remains in faulted state
use 'zpool clear' to restore a faulted device
# zpool clear zroot
# zpool status zroot
  pool: zroot
state: DEGRADED
  [...]
            ada1p3  FAULTED      0     0     0  too many errors
# zpool clear zroot ada1p3
# zpool status zroot
  pool: zroot
state: DEGRADED
  [...]
            ada1p3  FAULTED      0     0     0  too many errors
 
Following up: I think zpool clear appears to do nothing simply because writes to the drive are still failing, so it gets faulted again right away. Time to retire it.

Code:
Dec 12 11:53:30 xxx kernel: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 00 00 00 00 40 00 00 00 01 00 00
Dec 12 11:53:30 xxx kernel: (ada1:ahcich1:0:0:0): CAM status: Uncorrectable parity/CRC error
Dec 12 11:53:30 xxx kernel: (ada1:ahcich1:0:0:0): Retrying command, 3 more tries remain
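A quick way to check whether a device is still throwing errors like the ones above is to count the "CAM status:" lines for it in the kernel log. This is only a sketch: `cam_error_count` is a hypothetical helper name, and it assumes the log lines have the same shape as the messages in this post.

```shell
# Hypothetical helper: count "CAM status:" lines for one device.
# Reads log text on stdin; the device name (e.g. ada1) is the first argument.
cam_error_count() {
    dev="$1"
    # grep -c prints the match count; "|| true" keeps the exit status 0
    # when there are no matches.
    grep -c "(${dev}:.*CAM status: " || true
}

# Example with the three lines from this post; prints 1.
cam_error_count ada1 <<'EOF'
Dec 12 11:53:30 xxx kernel: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 00 00 00 00 40 00 00 00 01 00 00
Dec 12 11:53:30 xxx kernel: (ada1:ahcich1:0:0:0): CAM status: Uncorrectable parity/CRC error
Dec 12 11:53:30 xxx kernel: (ada1:ahcich1:0:0:0): Retrying command, 3 more tries remain
EOF
```

On a live system the same function could be fed from the log file, for example `cam_error_count ada1 < /var/log/messages`, assuming kernel messages are logged there.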