ZFS Device Failure, Glabels

Hi there,

I am running FreeBSD 8.1, which means I have ZFS filesystem version 3 and pool version 14. I have a RAIDZ2 pool set up with 15 drives (2 TB Samsung HD204UI), one of which has failed. (I'm aware this doesn't match the ZFS Best Practices; I was more interested in static storage than IOPS.)

As I understand it from my reading here and in other documentation the correct procedure to replace a drive is:

1. Offline the Drive (using zpool offline poolname device)
2. Power off, replace the drive, power on
3. Replace the drive (using zpool replace poolname device) assuming they have the same name.
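
As a rough sketch of the steps above (not a tested procedure), the two ZFS commands could be wrapped like this. The `run` helper, the `DRY_RUN` variable and the function names are made up for illustration; only `zpool offline` and `zpool replace` come from the steps themselves. With `DRY_RUN` left at its default of 1, the commands are only printed.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1 (the default),
# execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Step 1: take the failing device offline.
# $1 = pool name, $2 = device name as shown by 'zpool status'
offline_drive() {
    run zpool offline "$1" "$2"
}

# Step 3: resilver onto the new disk. The single-device form assumes the
# replacement shows up under the same name as the old disk.
replace_drive() {
    run zpool replace "$1" "$2"
}
```

For example, `offline_drive poolname da0` prints `zpool offline poolname da0`; step 2 (power off, swap the disk, power on) still happens by hand in between.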

Initially I had some trouble offlining the disk: it gave me a "no valid replicas" message even though the pool was still available when I physically unplugged the disk. After I ran a scrub, it resilvered ~500 MB and I was able to offline it.

My issue is I have used GEOM labels for the disks so rather than my offlined disk appearing in the zpool as da0, it's appearing as label/disk1.

Do I need to label the replacement drive? As I understand it, the Glabel is written to the last 512 bytes of the disk so could this play havoc with the zpool if I don't label it?

Running the glabel list command still shows a glabel for disk da0 pointing to label/disk1 (which is my faulty disk) when the drive is unplugged. Also, with the drive unplugged, the results of camcontrol devlist show that there is a da0 plugged into the system. Does FreeBSD label its drives sequentially based on controller? If no other hardware in the system changes - I have pulled the disk previously located at /dev/da0 and replaced it with another disk - is it safe to assume that the replacement disk will also show up as /dev/da0?

If so - this will make labelling the disk fairly straightforward, so in order to resolve my problem I believe all I need to do will be:

Code:
#glabel label disk1 /dev/da0
#zpool replace poolname label/disk1

I am still reasonably green when it comes to FreeBSD so if there's something that you think I've missed or don't quite understand properly, please let me know. I'm anxious to get this resolved quickly so I can bring my fileserver back online.
 
@BAK

Spot on! Go for it. Happy resilvering:)

If you have hot-swap on the drives and the controller and driver support it, you don't even have to shut the system down to replace it.

If you're unsure which disk to label since it shuffles the device names around (as it's supposed to), you can run:
# glabel list da0
And if it comes up empty, that's the new drive, since it's "clean" from the factory.
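
If there are several candidate devices, the same check can be scripted. The sketch below is only an illustration: the function name `unlabeled_disks` is made up, and it assumes `glabel status` prints one header line followed by rows with the provider in the third (Components) column. It only helps when the new disk really has no label on it yet.

```shell
# Read 'glabel status' output on stdin and print every device named in
# the arguments that does not appear in the Components column.
unlabeled_disks() {
    # Skip the header line, keep the third column (the underlying device).
    labeled=$(awk 'NR > 1 { print $3 }')
    for dev in "$@"; do
        # -F: fixed string, -x: whole-line match, -q: quiet
        echo "$labeled" | grep -Fqx "$dev" || echo "$dev"
    done
}
```

Usage would be something like `glabel status | unlabeled_disks da0 da1 da2`, which prints the devices from that list that carry no label.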

/Sebulon
 
Rather than create a new thread, I'll post here: I'm encountering the same issue on another disk that died during the resilvering process. This time, however, I'm unable to offline the disk:

Code:
# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: none requested
config:

        NAME              STATE     READ WRITE CKSUM
        tank              DEGRADED     0     0     0
          raidz2          DEGRADED     0     0     0
            da4           ONLINE       0     0     0
            label/disk2   ONLINE       0     0     0
            label/disk3   ONLINE       0     0     0
            label/disk4   ONLINE       0     0     0
            label/disk5   UNAVAIL      0   308     0  cannot open
            label/disk6   ONLINE       0     0     0
            label/disk7   ONLINE       0     0     0
            label/disk8   ONLINE       0     0     0
            label/disk9   ONLINE       0     0     0
            label/disk10  ONLINE       0     0     0
            label/disk11  ONLINE       0     0     0
            label/disk12  ONLINE       0     0     0
            label/disk13  ONLINE       0     0     0
            label/disk14  ONLINE       0     0     0
            label/disk15  ONLINE       0     0     0

errors: No known data errors

# zpool offline tank label/disk5
cannot offline label/disk5: no valid replicas

When I encountered this issue before, I was able to offline the disk after running a scrub, which resilvered about 500 MB (on a 2 TB disk) and finished in under 2 minutes. This time, however, the full scrub completed without errors in 18 hours, and I'm still unable to offline the disk, as shown above.

Any thoughts on how I can replace this disk?
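
With 15 devices in the config it is easy to miss a line, so here is a small sketch for pulling the unhealthy ones out of the `zpool status` listing. The function name `failed_devices` is made up, and the list of states it matches is my own choice rather than anything exhaustive.

```shell
# Read 'zpool status' output on stdin and print "device state" for each
# config line whose STATE column is UNAVAIL, FAULTED, OFFLINE or REMOVED.
failed_devices() {
    awk '$2 ~ /^(UNAVAIL|FAULTED|OFFLINE|REMOVED)$/ { print $1, $2 }'
}
```

Run as `zpool status tank | failed_devices`; against the output above it would print `label/disk5 UNAVAIL`.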
 
@BAK

Just pull it out and replace it. It doesn't need to be "OFFLINE" to be replaced. When the new disk is inserted and labeled, you run:
# zpool replace tank label/disk5
Also, what's your output of:
# uname -a
# zpool upgrade | head -1

/Sebulon
 
# uname -a
Code:
FreeBSD neon 8.1-RELEASE FreeBSD 8.1-RELEASE #0: Mon Jul 19 02:36:49 UTC 2010     root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64
# zpool upgrade | head -1
Code:
This system is currently running ZFS pool version 14.

Both of these pieces of information are included in the first sentence of my original post, although not in that exact format.

Thanks - I was under the impression that I was required to offline the disk before commencing the replace. I've labelled it and run:
# zpool replace tank label/disk5
And it's resilvering now. Thanks again for your prompt assistance!
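
For anyone who wants to keep an eye on the resilver from a script, a minimal sketch follows. The function name `resilver_state` is made up, and it assumes the `scrub:` line of `zpool status` contains the phrase "resilver in progress" while running and "resilver completed" afterwards; I have not checked that wording against every ZFS version.

```shell
# Read 'zpool status' output on stdin and print one of:
# in-progress, completed, none
resilver_state() {
    status=$(cat)
    case "$status" in
        *"resilver in progress"*) echo "in-progress" ;;
        *"resilver completed"*)   echo "completed" ;;
        *)                        echo "none" ;;
    esac
}
```

Usage: `zpool status tank | resilver_state`.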
 