I just keep failing to replace a failed disk... :(

Hi

I'm trying to replace a failed disk, but regardless of what I do, and in whatever order, it always fails in the end with one of the following error messages.

Code:
/dev/ad12 is part of active pool 'ztuff'
cannot offline ad12: no valid replicas

The good part is that I haven't lost any data on my 4-disk raidz1 system. The pool is degraded and says it is replacing one disk, but nothing happens.

I tried to label the new disk with glabel, but zpool hijacked the disk and overwrote the label, so I couldn't replace the drive because it was already in use.

What is the proper way of doing this? As I said, I've read tons of pages and tried countless different ways...

I'm getting desperate... I have an extra unused disk that I could use temporarily if it would help somehow...

Thanks!
 
Posting the output of # zpool status would be a good start, along with identifying which device(s) you are trying to replace.
 
I know exactly which drive I want to replace, but I just keep failing! I've read a lot of threads and manuals, but for some reason they all fail at some point.

If I'm supposed to take the drive offline, it fails because there are no valid replicas...
If I'm supposed to replace it, it fails with "is part of active pool 'ztuff'"...

What I need is a proper how-to that WILL work! One that starts with "shut down the computer and replace the faulty disk"... and then goes step by step from there... :p

My system is a fresh FreeBSD 8.0 RC3.

All I want is a home server that actually works properly. No fancy software, just Samba and maybe a GUI. FreeBSD doesn't seem to be the answer. If I can't get this working, I'll just have to switch to a Windows server, because I've never ever had as much trouble with all my Windows machines combined over the last decade as I've had with this NAS in the last couple of months...
 
If you're not going to post what you've tried, the commands you've used, the order you ran them in, and the error messages you got, let alone the output of zpool status, then there's really not much we can do to help.
 
ok, here goes!

Code:
NAS# zpool status
  pool: ztuff
 state: DEGRADED
 scrub: none requested
config:

	NAME			STATE	READ WRITE CKSUM
	ztuff			DEGRADED   0     0     0
	  raidz1		DEGRADED   0     0     0
	    label/WDC1		ONLINE	   0     0     0
	    label/WDC2		ONLINE	   0     0     0
	    replacing		DEGRADED   0     0     0
	      label/WDC3	ONLINE	   0     0     0
	      6905999403711...	UNAVAIL    0 3.18K     0 was /dev/ad12
	    label/WDC4		ONLINE	   0     0     0

errors: No known data errors

Code:
NAS# zpool offline ztuff /dev/label/WDC4
cannot offline label/WDC4: no valid replicas

Code:
NAS# zpool replace ztuff /dev/ad12 /dev/label/WDC4
invalid vdev specification
/dev/label/WDC4 is part of active pool 'ztuff'

I tried to follow for instance

http://docs.sun.com/app/docs/doc/817-2271/gazgd?l=fi&a=view
 
/dev/label/WDC4 is part of the vdev already, so you can't remove it or use it. IOW, zpool is working correctly by not allowing you to break the pool. :)

You need to work on the 6905999403711... device, which is MISSING and is being replaced by /dev/label/WDC3.
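Since zpool replace needs the full numeric ID of the missing device (the thread only shows it truncated), it may be easier to pull it out of zpool status than to retype it. A minimal sketch, assuming the column layout shown above (name, state, three error counters, then an optional "was /dev/..." note); the function name missing_ids is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "zpool status" output on stdin and print the
# numeric ID of every UNAVAIL device, plus the path it used to have
# ("was /dev/...") or "-" if no such note is present.
missing_ids() {
    awk '$2 == "UNAVAIL" && $1 ~ /^[0-9]+$/ {
        was = "-"
        # Look for the "was" marker and take the word after it.
        for (i = 3; i < NF; i++)
            if ($i == "was")
                was = $(i + 1)
        print $1, was
    }'
}
```

Usage would be something like # zpool status ztuff | missing_ids, then passing the first column to zpool replace. Only all-digit names are matched, so a named device that is UNAVAIL (e.g. ad12 itself) is deliberately skipped.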

If you have a good set of backups and a spare disk, try the following:
  • # zpool export ztuff
  • shutdown the box
  • physically remove /dev/label/WDC3, set it aside for use later (mark it so you don't use it accidentally)
  • boot to single-user mode
  • # /etc/rc.d/hostid start
  • # zpool import ztuff
At this point, the pool should import in DEGRADED mode with a MISSING device. Hopefully, though, it will have replaced the numeric GUID with the device name. It's also possible the import will error out. Don't worry. :) Just carry on.

  • shutdown the box
  • physically attach a new drive to the box (NOT the one you removed previously, /dev/label/WDC3)
  • boot to single-user mode
  • label the disk with glabel (any name will do)
  • # /etc/rc.d/hostid start
  • # zpool import ztuff
  • # zpool replace ztuff <guid> label/whatever
At this point, it should start resilvering the pool using the new drive. If that succeeds, great. If not, just swap the old drive back in and you should be ok.
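The three rounds of commands in the steps above could be collected into one helper that you run one stage per boot. This is a hypothetical sketch, not the poster's exact procedure: the script name zreplace.sh, the stage names, the argument order and the DRY_RUN switch are all made up for illustration, and the shutdowns and physical disk swaps still happen by hand between stages.

```shell
#!/bin/sh
# Hypothetical helper (zreplace.sh) for the export/import/replace steps.
# Shutdowns and disk swaps between stages are done by hand.
# Set DRY_RUN=1 to print the commands instead of running them.

POOL="ztuff"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stage 1: before shutting down and pulling label/WDC3.
stage1() {
    run zpool export "$POOL"
}

# Stage 2: single-user mode, label/WDC3 removed.
stage2() {
    run /etc/rc.d/hostid start
    # The import may error out here; the post says to carry on anyway.
    run zpool import "$POOL"
    run zpool status "$POOL"
}

# Stage 3: single-user mode, new disk attached.
#   $1 = full numeric ID of the missing device, from zpool status
#   $2 = raw device name of the new disk (assumption: check dmesg)
#   $3 = label to give the new disk (any name)
stage3() {
    if [ "$#" -ne 3 ]; then
        echo "usage: stage3 <missing-id> <new-device> <label>" >&2
        return 1
    fi
    run glabel label "$3" "$2" &&
    run /etc/rc.d/hostid start &&
    run zpool import "$POOL" &&
    run zpool replace "$POOL" "$1" "label/$3"
}

# Only dispatch when arguments are given, e.g. "sh zreplace.sh stage1".
if [ "$#" -gt 0 ]; then
    stage="$1"
    shift
    case "$stage" in
        stage1|stage2|stage3) "$stage" "$@" ;;
        *) echo "unknown stage: $stage" >&2; exit 1 ;;
    esac
fi
```

For the last stage that would be, for example, # sh zreplace.sh stage3 <id> ad12 WDC5, where ad12 and WDC5 are placeholders for whatever the new disk shows up as and whatever label you pick. Running it once with DRY_RUN=1 in front shows the commands without touching the pool.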
 
You can also try just a plain export and then import to see if zpool figures out the drive assignments, before trying the stuff in the post above.
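A plain export/import check might look like the sketch below. Running zpool import with no pool name in between only lists the pools it can find and the devices it would use, without importing anything, so you can see which names it picks up before committing. The function name reimport and the DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# Hypothetical export/import cycle for the pool from this thread.
# Set DRY_RUN=1 to print the commands instead of running them.
POOL="ztuff"

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

reimport() {
    run zpool export "$POOL" || return 1
    # With no pool name, "zpool import" only lists importable pools and
    # the devices they would be assembled from.
    run zpool import
    run zpool import "$POOL" && run zpool status "$POOL"
}
```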
 
Thanks for the instructions! I was a bit upset when I wrote my first reply, sorry for that.

I followed the instructions and they didn't really work.

1. zpool hijacks the labeled hard drive directly, so label/WDC3 is nowhere to be found. Instead, it tries to replace the missing device with ad12. The label works perfectly until I import the pool, and again afterwards once I export it.

2. I can't replace the drive because the drive is already in use.

If I put the old drive back, I'll have an unreliable drive in my pool, right?

Any further ideas?
 