ZFS Started Resilver on raidz1 then system removed second disk - HALP!

I'm new to ZFS, it's nice, but I'm afraid I'm in too deep.
Our system started slowing down, and I found a couple of drives in the same raidz1 that looked scary.
I replaced the drive that S.M.A.R.T. marked as failure imminent (da59). Then da60 started to error.
Finally the system marked it "removed". Now I see the new da59 showing errors. (Turns out it was
a warranty "refurb" unit. ARRGH!!!)
I have fresh drives hopefully arriving today. Is there a way to start over? Stop or destroy the replacement
process, force the original da59 back online, then replace the OTHER drive (da60) that must have been
worse off than da59?

Code:
          raidz1-7                 DEGRADED 14.2K     0 12.2K
            da56                   ONLINE       0     0     0
            da57                   ONLINE       0     0     0
            da58                   ONLINE       0     0     0
            replacing-3            DEGRADED 14.2K     0     0
              1386012792062170018  OFFLINE      0     0     0  was /dev/da59/old
              da59                 ONLINE       0     0     0  (resilvering)
            13092123056171323448   REMOVED      0     0     0  was /dev/da60
            da61                   ONLINE       0     0     0
            da62                   ONLINE       0     0     0
            da63                   ONLINE       0     0     0

Thanks,
-Jon
 
I guess no one has seen this problem? Can I remove or detach the replacing-3 device so I can re-insert
the original da59 and online it?
 
The zpool's name is tank

Here's my NEW command plan, 15:05 CST

zpool online tank 13092123056171323448 da60

watch status to see da60 online

zpool offline tank replacing-3

once offline, pull "new" da59 and insert "old" da59
wait to see if the system recognizes "old" da59

then:

zpool online tank 1386012792062170018 da59

once da59 is online, if replacing-3 is still present
then:
zpool detach tank replacing-3

assuming the system is feeling better

then:
zpool offline tank da60

pull "old" da60 drive and insert "new" da60
wait for it to be recognized

then:
zpool replace tank 13092123056171323448 da60

Hopefully there is enough life in "old" da59 to allow for
new da60 to resilver....


Anyone know if this will work?
Suggestions?
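
The "watch status" and "wait for it to be recognized" steps in the plan above could be scripted rather than checked by eye. This is only a sketch: `device_state`, `wait_for_state` and the `STATUS_CMD` variable are hypothetical names made up for illustration, and it assumes the device shows up in `zpool status` under the exact name passed in (which may be a GUID rather than da60 until the disk is recognized).

```shell
#!/bin/sh
# Hypothetical helper, not part of ZFS: poll until a device reports a given state.
# STATUS_CMD is an assumption so the loop can be pointed at saved output for a dry run.
STATUS_CMD=${STATUS_CMD:-"zpool status tank"}

# device_state DEVICE: read status text on stdin, print the STATE column for DEVICE.
device_state() {
    awk -v dev="$1" '$1 == dev { print $2; exit }'
}

# wait_for_state DEVICE STATE [TRIES] [DELAY_SECONDS]
# Returns 0 once DEVICE shows STATE, 1 if it never does within TRIES polls.
wait_for_state() {
    dev=$1
    want=$2
    tries=${3:-60}
    delay=${4:-5}
    i=0
    while [ "$i" -lt "$tries" ]; do
        state=$($STATUS_CMD | device_state "$dev")
        if [ "$state" = "$want" ]; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

Something like `wait_for_state da60 ONLINE || echo "da60 never came online"` would then stand in for each manual check. It only reads status and does not change the pool.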
 
Post the full zpool status output; it will show the state of the pool and a status message. What does it say?
# zpool status -v tank
  pool: tank
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Jan 12 10:47:46 2024
        594G scanned out of 673T at 12.1M/s, (scan is slow, no estimated time)
        45.9M resilvered, 0.09% done
config:

        NAME                       STATE     READ WRITE CKSUM
        tank                       DEGRADED     0     0 6.12K
          raidz1-0                 ONLINE       0     0     0
            da0                    ONLINE       0     0     0
            da1                    ONLINE       0     0     0
            da2                    ONLINE       0     0     0
            da3                    ONLINE       0     0     0
            da4                    ONLINE       0     0     0
            da5                    ONLINE       0     0     0
            da6                    ONLINE       0     0     0
            da7                    ONLINE       0     0     0
          raidz1-1                 ONLINE       0     0     0
            da8                    ONLINE       0     0     0
            da9                    ONLINE       0     0     0
            da10                   ONLINE       0     0     0
            da11                   ONLINE       0     0     0
            da12                   ONLINE       0     0     0
            da13                   ONLINE       0     0     0
            da14                   ONLINE       0     0     0
            da15                   ONLINE       0     0     0
          raidz1-2                 ONLINE       0     0     0
            da16                   ONLINE       0     0     0
            da17                   ONLINE       0     0     0
            da18                   ONLINE       0     0     0
            da19                   ONLINE       0     0     0
            da20                   ONLINE       0     0     0
            da21                   ONLINE       0     0     0
            da22                   ONLINE       0     0     0
            da23                   ONLINE       0     0     0
          raidz1-3                 ONLINE       0     0     0
            da32                   ONLINE       0     0     0
            da33                   ONLINE       0     0     0
            da34                   ONLINE       0     0     0
            da35                   ONLINE       0     0     0
            da36                   ONLINE       0     0     0
            da37                   ONLINE       0     0     0
            da38                   ONLINE       0     0     0
            da39                   ONLINE       0     0     0
          raidz1-4                 ONLINE       0     0     0
            da24                   ONLINE       0     0     0
            da25                   ONLINE       0     0     0
            da26                   ONLINE       0     0     0
            da27                   ONLINE       0     0     0
            da28                   ONLINE       0     0     0
            da29                   ONLINE       0     0     0
            da30                   ONLINE       0     0     0
            da31                   ONLINE       0     0     0
          raidz1-5                 ONLINE       0     0     0
            da40                   ONLINE       0     0     0
            da41                   ONLINE       0     0     0
            da42                   ONLINE       0     0     0
            da43                   ONLINE       0     0     0
            da44                   ONLINE       0     0     0
            da45                   ONLINE       0     0     0
            da46                   ONLINE       0     0     0
            da47                   ONLINE       0     0     0
          raidz1-6                 ONLINE       0     0     0
            da48                   ONLINE       0     0     0
            da49                   ONLINE       0     0     0
            da50                   ONLINE       0     0     0
            da51                   ONLINE       0     0     0
            da52                   ONLINE       0     0     0
            da53                   ONLINE       0     0     0
            da54                   ONLINE       0     0     0
            da55                   ONLINE       0     0     0
          raidz1-7                 DEGRADED 14.4K     0 12.2K
            da56                   ONLINE       0     0     0
            da57                   ONLINE       0     0     0
            da58                   ONLINE       0     0     0
            replacing-3            DEGRADED 14.4K     0     0
              1386012792062170018  OFFLINE      0     0     0  was /dev/da59/old
              da59                 ONLINE       0     0     0  (resilvering)
            13092123056171323448   REMOVED      0     0     0  was /dev/da60
            da61                   ONLINE       0     0     0
            da62                   ONLINE       0     0     0
            da63                   ONLINE       0     0     0
          raidz1-8                 ONLINE       0     0     0
            da64                   ONLINE       0     0     0
            da65                   ONLINE       0     0     0
            da66                   ONLINE       0     0     0
            da67                   ONLINE       0     0     0
            da68                   ONLINE       0     0     0
            da69                   ONLINE       0     0     0
            da70                   ONLINE       0     0     0
            da71                   ONLINE       0     0     0
        cache
          ada1                     ONLINE       0     0     0

errors: No known data errors
#
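
With 72 disks in the listing, the few unhealthy entries are easy to miss. A small filter along these lines might help pick them out; `unhealthy` is a hypothetical name, and the list of states is an assumption based on the ones visible in this output plus FAULTED and UNAVAIL.

```shell
#!/bin/sh
# Hypothetical filter, not a ZFS command: read `zpool status` text on stdin and
# print the name and state of every config row that is not ONLINE.
# NF >= 5 skips the "state: DEGRADED" header line, which has only two fields.
unhealthy() {
    awk 'NF >= 5 && $2 ~ /^(DEGRADED|FAULTED|OFFLINE|REMOVED|UNAVAIL)$/ { print $1, $2 }'
}
```

It would be used as `zpool status tank | unhealthy`, which for the output above should list tank, raidz1-7, replacing-3 and the two GUID entries.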
The resilver seems to restart about once a day, but since da60 has failed and the system has "removed" it, it never gets anywhere...
I need to get the old da59 back online. It was only marked as needing to be replaced; I guess da60 was worse. If I can stop the resilver and get the old da59 back online, I can resilver da60, and THEN do da59.
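
For a rough sense of scale, the scan line in the status output (673T to scan at 12.1M/s) can be turned into a time estimate. This is back-of-the-envelope arithmetic only: it assumes the T and M suffixes are binary units and that the rate stays constant, which it almost certainly would not on failing disks. `eta_days` is a hypothetical helper name.

```shell
#!/bin/sh
# eta_days TOTAL_TIB RATE_IN_TENTHS_OF_MIB_PER_S
# Integer-only estimate of whole days needed to scan TOTAL_TIB at the given rate.
# The rate is passed in tenths (12.1 MiB/s -> 121) because sh has no floating point.
eta_days() {
    total_mib=$(( $1 * 1024 * 1024 ))
    seconds=$(( total_mib * 10 / $2 ))
    echo $(( seconds / 86400 ))
}

eta_days 673 121   # prints 675
```

At that rate a full pass would take on the order of two years, which fits the "scan is slow, no estimated time" message.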
Thanks for looking...
 