zfs: what is the best way to replace a failed disk?

One of my fileservers has a five-disk hotplug enclosure, which contains the following zpool:
Code:
root@kg-f2# zpool status storage
  pool: storage
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
	attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
	using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub in progress for 125h43m, 50.93% done, 121h8m to go
config:

	NAME        STATE     READ WRITE CKSUM
	storage     ONLINE       0     0     0
	  raidz1    ONLINE       0     0     0
	    ad8     ONLINE       0     0     0
	    ad10    ONLINE       0     0     0
	    ad12    ONLINE       0     0    73  4.20G repaired
	    ad14    ONLINE       0     0     0
	    ada0    ONLINE       0     0     0

errors: No known data errors
The machine runs FreeBSD 8.1-STABLE:
Code:
root@kg-f2# uname -a
FreeBSD kg-f2.kg4.no 8.1-STABLE FreeBSD 8.1-STABLE #4: Fri Oct 29 12:11:48 CEST 2010
root@kg-f2.kg4.no:/usr/obj/usr/src/sys/GENERIC  amd64
Now I want to replace that failing disk (ad12).
Questions:
  1. Will I risk anything if I replace the disk when there is a scrub in progress?
  2. The handbook says that this is the way to replace a disk:
    • offline the disk from the pool
    • physically replace the disk
    • replace the disk in the pool
    Is this still the best way to do it?
  3. Since the disks are in a hotplug enclosure can I just replace it, or do I have to run some command (atacontrol?) before removing it?
 
Let the scrub complete. Then check the output of $ zpool status to see if any files are damaged. If no files are listed as damaged, clear the error counts with # zpool clear <poolname> and then run another scrub with # zpool scrub <poolname>

If the second scrub comes back with 0 errors, then everything is fine, and ZFS did its job, detecting drive errors and repairing files using the redundant data on the other drives. :) If you get more errors, then consider replacing the drive.
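If you want to script that check, here is a minimal sketch in sh. It assumes the old-style `scrub:` status line shown in this thread (newer ZFS versions print `scan:` instead; the pattern accepts both) and defaults to the pool name `storage`. The function names are made up for illustration; only `zpool status -v`, `zpool clear` and `zpool scrub` are real commands.

```shell
#!/bin/sh
# Hypothetical helpers, not part of ZFS. The first two read
# `zpool status -v` output on stdin.

# Succeeds while a scrub is still running. Older ZFS (as in this
# thread) prints "scrub: scrub in progress", newer ZFS "scan: ...".
scrub_running() {
    grep -Eq '(scrub|scan): scrub in progress'
}

# Succeeds only if zpool lists no damaged files.
no_data_errors() {
    grep -q 'errors: No known data errors'
}

# Once the first scrub has finished and no files are listed as
# damaged, reset the error counters and start a second scrub.
rescrub_if_clean() {   # usage: rescrub_if_clean [POOL]
    pool=${1:-storage}
    out=$(zpool status -v "$pool") || return 1
    if printf '%s\n' "$out" | scrub_running; then
        echo "$pool: scrub still running, try again later" >&2
        return 1
    fi
    if ! printf '%s\n' "$out" | no_data_errors; then
        echo "$pool: damaged files listed, not clearing" >&2
        return 1
    fi
    zpool clear "$pool" && zpool scrub "$pool"
}
```

Run it again after the second scrub finishes and compare the CKSUM column; a disk that keeps accumulating errors is the one to replace.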

The process I use for replacing drives is:
Code:
# zpool offline <poolname> <diskname>
<remove drive>
<insert new drive>
# glabel label <name> <disk>
# zpool replace <poolname> <olddisk> <newdisk>
Using offline first flushes any pending writes and allows you to reuse the disk if something goes wrong with the replace operation. (This is on SCSI/SATA-based hot-plug systems, so no camcontrol/atacontrol commands are required.)

Using glabel is optional, but I find it extremely useful in 24-drive systems. :)
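The same steps as a small sh sketch. The `DRY_RUN=1` switch is a convention of this sketch, not of ZFS: it prints the commands instead of running them, so you can review them before touching a live pool. The label name is whatever you choose; `disk3` below is made up.

```shell
#!/bin/sh
# Sketch of the offline / swap / glabel / replace steps above.
# DRY_RUN=1 prints the commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Before pulling the drive: take it out of service.
offline_disk() {   # usage: offline_disk POOL OLDDISK
    run zpool offline "$1" "$2"
}

# After inserting the new drive: label it, then resilver onto the label.
replace_with_label() {   # usage: replace_with_label POOL OLDDISK NEWDISK LABEL
    run glabel label "$4" "$3" &&
        run zpool replace "$1" "$2" "label/$4"
}
```

For the pool in the first post that would be `offline_disk storage ad12`, the physical swap, then `replace_with_label storage ad12 ad12 disk3`; the new disk then shows up in the pool as `label/disk3`.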
 
phoenix said:
Let the scrub complete. Then check the output of $ zpool status to see if any files are damaged. If no files are listed as damaged, clear the error counts with # zpool clear <poolname> and then run another scrub with # zpool scrub <poolname>

If the second scrub comes back with 0 errors, then everything is fine, and ZFS did its job, detecting drive errors and repairing files using the redundant data on the other drives. :) If you get more errors, then consider replacing the drive.
I've done that; I have run a scrub every week on this machine for almost two years now. That is how I know it is time to replace that drive. :-)

The problem with this last scrub is that it has been going on for quite some time now:
Code:
scrub: scrub in progress for 137h9m, 65.54% done, 72h6m to go
Normally, scrubs finish in 3-4 hours on this machine. I would rather not wait 72 (or more) hours before doing the replace.

So, does anything bad happen if I replace the drive while there is a scrub in progress?
 
Update: I just stopped the scrub with # zpool scrub -s storage, then replaced the ad12 disk. One strange thing: after I physically replaced the disk (which sits in a hotplug enclosure), it did not show up in # atacontrol list, and neither # atacontrol attach ata6 nor # atacontrol reinit ata6 helped. So I just rebooted the server, and ad12 showed up. Resilvering is in progress:
Code:
root@kg-f2# zpool status storage
  pool: storage
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h20m, 14.99% done, 1h57m to go
config:

	NAME            STATE     READ WRITE CKSUM
	storage         DEGRADED     0     0     0
	  raidz1        DEGRADED     0     0     0
	    ad8         ONLINE       0     0     0
	    ad10        ONLINE       0     0     0
	    replacing   DEGRADED     0     0     0
	      ad12/old  OFFLINE      0     0     0
	      ad12      ONLINE       0     0     0  117G resilvered
	    ad14        ONLINE       0     0     0
	    ada0        ONLINE       0     0     0

errors: No known data errors
That's all.
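Since a resilver like this one takes a couple of hours, a small polling loop can wait for it to finish before you do anything else with the pool. A minimal sketch, assuming the `resilver in progress` wording shown above (newer ZFS prints `scan:` instead of `scrub:`; both are matched). The function names and the 10-minute default interval are arbitrary choices.

```shell
#!/bin/sh
# Hypothetical helper: succeeds while `zpool status` text on stdin
# shows a resilver in progress.
resilver_running() {
    grep -Eq '(scrub|scan): resilver in progress'
}

# Poll until the resilver on POOL is done, then print the final status.
wait_for_resilver() {   # usage: wait_for_resilver POOL [SECONDS]
    pool=$1
    interval=${2:-600}
    while zpool status "$pool" | resilver_running; do
        sleep "$interval"
    done
    zpool status "$pool"
}
```

For example, `wait_for_resilver storage` and then check that the `replacing` vdev and `ad12/old` are gone from the output.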
 
Hehe, I actually have something useful to add to this. First time, as far as I know =)

I was in your situation before too, trying to figure out how to hot-swap SATA drives.

It turns out that, at least in my case, and I suspect in yours too, you have to do:
Code:
# zpool offline <pool> <dev>
# atacontrol detach atax
# atacontrol attach atax
Here x is the number of the channel in question. Then:
Code:
# zpool replace <pool> <dev>

Tip:
Code:
# atacontrol list
Shows what is connected to each channel.

Tada!

Mind you, the channels are created at boot time. Some motherboards, like mine, only create a channel when a drive is connected. So if you plug a drive into a port that had nothing connected at boot, there is no channel to detach or attach. With my PCI-connected SATA cards, however, all channels are created regardless.
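If you are unsure which channel a disk is on, `atacontrol list` is the authoritative answer. As a shortcut: with `options ATA_STATIC_ID` (which, as far as I know, is in GENERIC on 8.x) the legacy ata(4) driver numbers disks as adN with N = 2 * channel for the master and 2 * channel + 1 for the slave, which fits ad12 sitting on ata6 earlier in this thread. A small sketch built on that assumption; the function names are made up, and it only prints the sequence, it does not run it.

```shell
#!/bin/sh
# Assumes ATA_STATIC_ID numbering: adN lives on channel N/2
# (even N = master, odd N = slave). Verify with `atacontrol list`.
ata_channel_for() {   # usage: ata_channel_for adN
    n=${1#ad}
    case $n in
        ''|*[!0-9]*) echo "not a legacy adN device: $1" >&2; return 1 ;;
    esac
    echo "ata$((n / 2))"
}

# Print (not run) the hot-swap sequence from the post above.
hotswap_plan() {   # usage: hotswap_plan POOL adN
    ch=$(ata_channel_for "$2") || return 1
    echo "zpool offline $1 $2"
    echo "atacontrol detach $ch"
    echo "# swap the drive now"
    echo "atacontrol attach $ch"
    echo "zpool replace $1 $2"
}
```

`hotswap_plan storage ad12` would print the sequence for ata6. Names like ada0 are rejected, since those disks are attached through CAM and have no atacontrol channel.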

/Sebulon
 
Turns out, I wasn't done on this subject.

Last night, I was doing some performance testing, swapping drives around to see how it plays out, and I plugged two drives into the onboard SATA 300 ports. When the system came up, those two drives were named ada0 and ada1.

Something clicked in the back of my mind: ATA_CAM! I had previously tried to use ATA_CAM to attach ATA drives through the CAM (SCSI) layer for true hot swapping, but never really got it working. I pulled out one of the connected drives and, sure enough, the OS noticed and removed the device entry all by itself! When I pushed it back in, it reappeared!

As Steve Jobs would have said, "It's amazing, wonderful, magical!" =)

After that, it was just a matter of letting ZFS know what had happened, with either zpool online or zpool replace.
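Which of the two commands you need depends on what went back into the slot. A minimal sketch; the `reattach` helper and its `same|new` argument are made up for illustration. `zpool online` brings a reinserted disk back and resilvers only what was written while it was gone, while `zpool replace` with a single device name rebuilds onto a blank disk that appears under the same name.

```shell
#!/bin/sh
# Hypothetical helper for the ATA_CAM case, where the disk node
# (e.g. ada1) disappears when pulled and reappears when a disk is
# inserted.
reattach() {   # usage: reattach POOL DEV same|new
    case $3 in
        same) zpool online "$1" "$2" ;;   # same disk back in: catch up on changes
        new)  zpool replace "$1" "$2" ;;  # blank disk under the same name: full resilver
        *)    echo "usage: reattach POOL DEV same|new" >&2; return 2 ;;
    esac
}
```

For example, `reattach pool1 ada1 same` after pulling and reinserting the same drive.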

For anyone wanting the same capability, here's how it's done:
http://forums.freebsd.org/showthread.php?p=123801

/Sebulon
 
Tingo, just to clarify: the ahci driver already attaches (at least) SATA 300 devices through CAM automatically, but if you also want this for ATA 133 and SATA 150 drives, you have to recompile the kernel and tell it to use ATA_CAM for all your drives, with:
Code:
options    ATA_CAM
in the kernel config. At least, that's how it was for me.
The telltale sign is that your hard drives get renamed from adX to adaX, and you see lots of SCSI probe messages and such for the drives in dmesg.
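A minimal sketch of such a kernel config on 8.x amd64, assuming you start from the stock GENERIC; the name CAMKERNEL is made up. Depending on the release, ata(4) may also ask you to drop legacy devices such as atadisk when ATA_CAM is enabled, so check the man page for your version first.

```
# /usr/src/sys/amd64/conf/CAMKERNEL (hypothetical name)
include GENERIC
ident   CAMKERNEL
options ATA_CAM    # attach ATA/SATA disks through CAM (adaN instead of adN)
```

Build and install it from /usr/src with `make buildkernel KERNCONF=CAMKERNEL` and `make installkernel KERNCONF=CAMKERNEL`, then reboot. Because the disks come back as adaN, check /etc/fstab (and anything else that refers to adN names) before rebooting.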

/Sebulon
 
tingo,

I had
Code:
ahci_load="YES"
in loader.conf all along, but it would not use ATA_CAM on my ATA 133 and SATA 150 drives unless I recompiled with
Code:
options ATA_CAM
in the kernel config. That's what I was trying to say all along. Also, for further clarification: you don't need ahci_load in loader.conf after you've recompiled, either.

From your previous # zpool status, I can see that you're actually only using ATA_CAM on one of your drives:

Code:
root@kg-f2# zpool status storage
  pool: storage
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h20m, 14.99% done, 1h57m to go
config:

	NAME            STATE     READ WRITE CKSUM
	storage         DEGRADED     0     0     0
	  raidz1        DEGRADED     0     0     0
	    ad8         ONLINE       0     0     0
	    ad10        ONLINE       0     0     0
	    replacing   DEGRADED     0     0     0
	      ad12/old  OFFLINE      0     0     0
	      ad12      ONLINE       0     0     0  117G resilvered
	    ad14        ONLINE       0     0     0
	    ada0        ONLINE       0     0     0

errors: No known data errors

While mine looks more like this:
Code:
  pool: pool1
 state: ONLINE
 scrub: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	pool1       ONLINE       0     0     0
	  raidz1    ONLINE       0     0     0
	    ada0    ONLINE       0     0     0
	    ada1    ONLINE       0     0     0
	    ada2    ONLINE       0     0     0
	    ada3    ONLINE       0     0     0
and
# camcontrol devlist
gives:
Code:
<SAMSUNG HD103SJ 1AJ10001>         at scbus1 target 0 lun 0 (ada0,pass0)
<SAMSUNG HD103SJ 1AJ10001>         at scbus2 target 0 lun 0 (ada1,pass1)
<WDC WD10EARS-00Y5B1 80.00A80>     at scbus3 target 0 lun 0 (ada2,pass2)
<SAMSUNG HD103SJ 1AJ10001>         at scbus4 target 0 lun 0 (ada3,pass3)
<ST32000542AS CC34>                at scbus5 target 0 lun 0 (ada4,pass4)
<SAMSUNG HD103SJ 1AJ10001>         at scbus6 target 0 lun 0 (ada5,pass5)
<SAMSUNG HD103SJ 1AJ10001>         at scbus7 target 0 lun 0 (ada6,pass6)
<ELITE PRO CF CARD 8GB Ver2.22K>   at scbus12 target 1 lun 0 (ada7,pass7)
while
# atacontrol list
gives:
Code:
atacontrol: control device not found: No such file or directory
because there aren't any regular ata devices any more.
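The same comparison can be done mechanically: a rough sketch that reads `zpool status` output and reports, purely from the device names, which stack each whole-disk vdev is on (adN for legacy ata(4), adaN for CAM). It assumes whole-disk vdevs as in both pools above; partitions and glabel names are not recognized, and the function name is made up.

```shell
#!/bin/sh
# Hypothetical helper: read `zpool status` text on stdin and print
# "<disk> ata" for legacy ata(4) disks (adN) and "<disk> cam" for
# disks attached through CAM (adaN).
disk_stacks() {
    awk '$1 ~ /^ad[0-9]+$/  { print $1, "ata" }
         $1 ~ /^ada[0-9]+$/ { print $1, "cam" }'
}
```

Run as `zpool status storage | disk_stacks`; on the storage pool above it would report ad8, ad10, ad12 and ad14 as ata and only ada0 as cam.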

/Sebulon
 