howto zpool replace - find physical device

I have to replace a device in my RAID system, but I am not sure how to do it the right way.

My server has occasionally been hanging, and I found failures on one device, ad10:
Code:
Aug  2 03:14:09 srvfbsd01 kernel: ad10: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=1212353536

As I understand it, I just have to replace the device physically and then run the replace command on the pool, like:

Code:
zpool replace tank c1t1d0

Is this the right way to do it?
How can I find out which physical device is ad10? I have an HP ProLiant ML150 server.
How can I see the size of the disk (capacity)?
My pool looks like this:
Code:
        NAME            STATE     READ WRITE CKSUM
        tank            DEGRADED     0     0     0
          raidz2        DEGRADED     0     0     0
            ad4         ONLINE       0     0     0
            ad6         ONLINE       0     0     0
            ad8         ONLINE       0     0     0
            ad10        ONLINE       0     0     0
            ad12        ONLINE       0     0     0
            replacing   DEGRADED     0     0     0
              ad14/old  UNAVAIL      0  574K     0  cannot open
              ad14      ONLINE       0     0     0

(ad14 is another problem)

I also tried to offline the disk in order to prevent further hangups, but it seems there is no way to do that?
Code:
zpool offline tank ad10
cannot offline ad10: no valid replicas

Thanks for any help!
Daniel
 
Hi,

sketchy stuff you have there. It seems ZFS has gotten the impression that you are short on parity, even though you should be good to go. A straight replace assumes that the new drive has the same name as the old one, e.g. if you replace ad10 with a disk that shows up as ad10 again. If the name has changed, you have to be more precise:
Code:
# zpool replace tank ad10 ad24
Or even force it:
Code:
# zpool replace -f tank ad10 ad24
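Once the replace has been accepted, the resilver kicks off by itself; you can keep an eye on its progress with:
Code:
# zpool status tank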

If you are unsure about what drives you have, and what they are called, run:
Code:
# atacontrol list
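That should list the ATA channels and which adX device hangs off each of them, so you can follow the cabling to the physical drive. The boot probe messages also show the size, model and channel for each disk; something like this (just a sketch of a grep over dmesg) should print the line for ad10:
Code:
# dmesg | grep '^ad10'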



http://opensolaris.org/jive/thread.jspa?messageID=482505
This link supports the fact that it is possible to replace more than one drive at a time, and even encourages it, but your case speaks to the contrary, and I have personally never had to try it.

Worst case, just let the current resilver finish before tackling the next one.

Hope you solve it!

/Sebulon
 
Do you know if the replace (resilver) of ad14 finished successfully?
I don't know about anyone else, but with older releases (ZFS in 8.1 or 8.0, I can't remember which), whenever I tested a drive replace, the 'old' device would never disappear, even after a successful resilver, and I'd be left in a state similar to yours. I always ended up doing a
Code:
zpool detach tank ad14/old
to get rid of the missing disk. *I take no responsibility for loss of data, but (theoretically) ZFS shouldn't let you do something that will fail the pool.* I actually have notes I wrote to remind myself what to do, as it seemed awkward, but it appears to have changed to work as I would expect in 8.2.

You should technically be able to offline a device in that state, as you have raidz2, but again, until recently ZFS was known to be overly cautious with offline/detach and wouldn't let you remove disks from a mirror or offline disks even if you had sufficient redundancy. If you can sort out ad14, you should be able to offline the disk.
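Once ad14 is sorted, a rough sketch of the sequence (assuming the replacement disk also comes up as ad10) would be to take the failing disk out of service first:
Code:
# zpool offline tank ad10
then power down, swap the drive, boot back up and start the resilver:
Code:
# zpool replace tank ad10
# zpool status tank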

Code:
atacontrol cap ad10
should give you the serial number/model to identify the drive
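For the capacity, diskinfo should do the trick; the mediasize lines give the size in bytes and sectors (a sketch, assuming the device node is /dev/ad10):
Code:
# diskinfo -v ad10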
 
Thanks so far, but I have really got into more trouble now.

I first tried to detach ad14/old
Code:
zpool detach tank ad14/old
which wasn't possible ("no valid replicas").

I then wanted to replace the disk ad10, but physically removed the wrong one, and although I put it back in place, I am not able to boot anymore, as ZFS cannot be mounted.

How can I repair this situation? I am really kind of desperate and very thankful for any help!
 
Finally I was able to replace the disk, but I ended up in the same situation as with ad14:

Code:
  pool: tank
 state: DEGRADED
 scrub: none requested
config:

        NAME            STATE     READ WRITE CKSUM
        tank            DEGRADED     0     0     0
          raidz2        DEGRADED     0     0     0
            ad4         ONLINE       0     0     0
            ad6         ONLINE       0     0     0
            ad8         ONLINE       0     0     0
            replacing   DEGRADED     0     0    62
              ad10/old  UNAVAIL      0 12.0K     0  cannot open
              ad10      ONLINE       0     0     0
            ad12        ONLINE       0     0     0
            replacing   DEGRADED     0     0     0
              ad14/old  UNAVAIL      0 11.6K     0  cannot open
              ad14      ONLINE       0     0     0

errors: No known data errors

The problem was discussed already in
http://forums.freebsd.org/showthread.php?t=18519

The new disk ad10 wasn't in use before, and I get errors for ad14 but not for ad10:
Code:
GEOM: ad14: the primary GPT table is corrupt or invalid.
GEOM: ad14: using the secondary instead -- recovery strongly advised.
ad10: 953869MB <Seagate ST31000524AS JC4B> at ata5-master SATA150

I am not able to detach the old device:
Code:
zpool detach tank ad10/old
cannot detach ad10/old: no valid replicas
 
I didn't try out your last suggestion. I am not quite sure what exporting the pool means, and I am afraid of harming my system even more.

I think I should do so, given one of the errors:
Code:
GEOM: ad14: the primary GPT table is corrupt or invalid.
GEOM: ad14: using the secondary instead -- recovery strongly advised.

But the other one doesn't seem to be related to the same issue (a device that wasn't zeroed?):
Code:
NAME        STATE     READ WRITE CKSUM
ad10/old    UNAVAIL      0 12.0K     0  cannot open
ad14/old    UNAVAIL      0 11.6K     0  cannot open
It seems there are some write operations pending, or what does it mean?
Isn't there some simple way to tell the pool to forget about these?

Thanks so far!
Daniel
 
I have just resolved an issue similar to yours, Daniel, using steps from this thread and the original thread you created in October.
Thank you Phoenix and usdmatt.

Steps I followed:
* zpool export <poolname>
* shutdown the ZFS box
* boot the ZFS box to single-user mode
* /etc/rc.d/hostid start
* zpool import <poolname>
* zpool scrub <poolname>
* zpool detach <poolname> <unavailable disk>

Before executing all these steps, I was stuck with a replaced disk that was listed as 'replacing' and I couldn't remove.
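For what it's worth, a concrete run of those steps on a pool named like Daniel's would look roughly like this; export first:
Code:
# zpool export tank
then shut down, boot to single-user mode, and from the single-user prompt:
Code:
# /etc/rc.d/hostid start
# zpool import tank
# zpool scrub tank
# zpool detach tank ad10/old
The ad10/old in the last command is only an example; use whichever ad*/old entry zpool status still shows as stuck.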
 