I have a 4-drive raidz1 setup and wanted to replace my 3TB drives with 6TB drives; let's call them 3{abcd} and 6{abcd}. I shut down the machine, physically removed the first drive, 3a, replaced it with 6a, booted the machine, and ran "zpool replace data-pool 3a 6a". Days later the resilver was done and all was good.

Problems started when I was replacing 3b with 6b. During that resilver I lost power, and when the machine came back up, 3d started throwing errors all over the place. I panicked a bit, read a bunch (maybe not the right things), and restarted the machine. After the restart the resilver did manage to finish; at one point zpool status showed both the replacement drive 6b and the failing drive 3d as resilvering. When it finished there were errors. A lot of errors. And the "replacing" status never cleared.
I had hoped I could cancel the replace, put 3b back in the raid, and then replace the failing 3d, but here I am stuck. I've read that you can cancel a replace by running a zpool detach command on the new drive, but I get:
Code:
# zpool detach data-pool ata-WDC_WD60EFRX-68MYMN1_WD-WXL1H643R6JD
cannot detach ata-WDC_WD60EFRX-68MYMN1_WD-WXL1H643R6JD: pool I/O is currently suspended
Currently my zpool status gives the following:
Code:
# zpool status
pool: data-pool
state: DEGRADED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
see: http://zfsonlinux.org/msg/ZFS-8000-HC
scan: resilvered 5.78G in 92h24m with 13919120 errors on Sat Jan 10 01:01:44 2015
config:
        NAME                                            STATE     READ WRITE CKSUM
        data-pool                                       DEGRADED     0     0     1
          raidz1-0                                      DEGRADED     0     0     6
            ata-WDC_WD60EFRX-68MYMN1_WD-WXL1H644C1W2    ONLINE       0     0     0
            replacing-1                                 DEGRADED     0     0     0
              ata-WDC_WD30EFRX-68EUZN0_WD-WMC4N0770602  ONLINE       0     0     0
              ata-WDC_WD60EFRX-68MYMN1_WD-WXL1H643R6JD  UNAVAIL      0     0     0
            ata-WDC_WD30EFRX-68EUZN0_WD-WMC4N0891608    ONLINE       0     0     0
            ata-WDC_WD30EFRX-68EUZN0_WD-WMC4N0891942    ONLINE       0     0     1

errors: 13919121 data errors, use '-v' for a list
zpool clear appears to hang: I left it running for over 24 hours and don't see any activity on my drive lights, which were going like crazy during the replace, and top didn't show any ZFS-related activity that I could tell. I'm sure there are better ways to check whether it's actually doing something, but I don't know how. smartctl shows all the drives as PASSED.

What to do now?
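For what it's worth, from what I've read, something like the following should show whether the pool is actually doing any I/O (I'm assuming these commands behave the same on zfs 0.6.3 as in the docs I found):

```shell
# Per-vdev read/write operations and bandwidth for the pool,
# refreshed every 5 seconds; all-zero columns would suggest nothing is happening
zpool iostat -v data-pool 5

# System-wide per-disk activity (from the sysstat package),
# refreshed every 5 seconds, as a cross-check on the raw devices
iostat -x 5
```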
Is it possible for me to cancel the replace and bring 3b back on line into the raid?
Should I be waiting longer for the zpool clear command to run? (resilvering 6a took around 8 days I think).
Should I offline or somehow remove 3d? I've tried a few reboots and sometimes having 3d connected hangs the boot. Other times not.
I'm running CentOS 6.6 (Final).
Code:
#uname -r
2.6.32-504.3.3.el6.x86_64
I'm running version 0.6.3-1.2.el6 of zfs, but I'm not sure what version my pool is actually running. Not sure how to check or if it matters.
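From what I've read, these commands should report the pool's on-disk format version (I'm assuming they work the same on this zfs release):

```shell
# On-disk format version of the pool; a "-" reportedly means
# the pool uses feature flags rather than a legacy version number
zpool get version data-pool

# With no arguments, lists pools whose on-disk format is older
# than what the installed zfs supports (does not modify anything)
zpool upgrade
```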
Thank you for taking the time to read all that. Any help would be appreciated. If this is not the right place for these kinds of questions, can you please point me to the right one?