How to recover a degraded zpool

Greetings gang,

My HDD setup:
  • 1x SSD (FreeBSD drive)
  • 1x WD Raptor (Windows/Linux dual-boot)
  • 3x 1GB drives in a zpool configuration

I recently installed a Linux distro on the Raptor. After the install, much to my surprise, I noticed that Linux had written data to my first zpool drive during the install, destroying the data on that device.
The drive itself works fine, so I just want the other two drives to rebuild it.
I spent some time in Oracle's ZFS manuals, but none of them cover my exact problem.
I'm thinking 'zpool replace', but I'm not too sure whether or how to use it.
There is too much to lose and I'm scared of going solo on this one.
Years of collecting music and pron could go down the drain. ;)

Code:
port79@bsd# zpool list
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
rex   2.72T   907G  1.83T    32%  1.00x  DEGRADED  -
port79@bsd# zpool status -v rex
  pool: rex
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scan: none requested
config:

	NAME                     STATE     READ WRITE CKSUM
	rex                      DEGRADED     0     0     4
	  raidz1-0               DEGRADED     0     0    16
	    7023999755423468813  UNAVAIL      0     0     0  was /dev/ada0
	    ada1                 ONLINE       0     0     0
	    ada2                 ONLINE       0     0     0
 
Much to lose?
3 1GB drives in a zpool configuration
Lucky you posted a zpool list or I would have had a stroke :) There's a bit of a difference between GBs and TBs in this context.

I'm guessing the drive is back to being called ada0, but begin by listing the drives to be sure:
# camcontrol devlist
Then resilver the drive by issuing:
# zpool replace rex 7023999755423468813 ada0
After it completes you should also:
# zpool scrub rex
to be sure you're in the clear.
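If you'd rather not copy that long number by hand, here is a small sketch that pulls the names of UNAVAIL devices out of `zpool status` output. It assumes the NAME/STATE column layout shown in the status output in the first post (device name first, state second); the function name `unavail_vdevs` is made up for this example.

```sh
#!/bin/sh
# unavail_vdevs: read `zpool status` output on stdin and print the first
# column (device name or GUID) of every row whose STATE column is UNAVAIL.
# ASSUMPTION: the NAME/STATE/READ/WRITE/CKSUM layout shown in this thread.
unavail_vdevs() {
    awk '$2 == "UNAVAIL" { print $1 }'
}

# Example (for the pool in this thread, prints 7023999755423468813):
#   zpool status rex | unavail_vdevs
```

If it prints more than one line, stop and read the full status before running any `zpool replace`.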

The ZFS Administration Guide
Keep it under your pillow:)

Redundancy is NOT a backup. There's this old jungle saying, "The Phantom cannot die, but drives do", and if you started out with three brand-new drives, chances are they will also decide to die around the same time. Raidz won't save you then. For something as important as pr0n, you must have a proper backup. Consider buying a 2TB drive, creating a secondary pool, and doing regular zfs send/recv from the primary pool to the secondary.
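A minimal sketch of that send/recv routine, assuming a secondary pool named `backup` already exists (the pool name, the snapshot names and the helper functions are all hypothetical). It takes a recursive snapshot and sends it with `zfs send -R`; later runs send only the changes since the previous snapshot with `-i`. Setting `DRY_RUN=1` prints the commands instead of running them, which is a good way to check them first.

```sh
#!/bin/sh
# Sketch: replicate a pool into a secondary pool with zfs send/recv.
# ASSUMPTIONS (hypothetical): the target pool already exists and the caller
# picks snapshot names (e.g. dates). Set DRY_RUN=1 to print instead of run.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        sh -c "$*"    # via sh -c so the "|" pipeline in the string works
    fi
}

# replicate SRC DST NEW [PREV]
#   NEW:  name of the snapshot to create now
#   PREV: snapshot already present on both pools; if given, only the
#         changes since PREV are sent (incremental stream)
replicate() {
    src=$1 dst=$2 new=$3 prev=$4
    run "zfs snapshot -r ${src}@${new}"
    if [ -z "$prev" ]; then
        run "zfs send -R ${src}@${new} | zfs recv -F -d ${dst}"
    else
        run "zfs send -R -i ${src}@${prev} ${src}@${new} | zfs recv -F -d ${dst}"
    fi
}

# First run:   replicate rex backup 2011-11-20
# Later runs:  replicate rex backup 2011-11-27 2011-11-20
```

`recv -F` lets the receive roll back (and, with `-R` streams, prune) anything on the backup side that differs from the source, so don't keep other data in that pool.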

/Sebulon
 
You should be backing up to a separate pool if you don't want to lose your data... ideally on a separate server, in case of hardware failures such as the I/O controller melting and deciding to write random garbage to your disks.

Next, you should be using labels, not device names. Make sure you back up before doing this! Try this command:
# glabel

For example if you were creating a new pool or erasing your old one, you would do:
# glabel label rex1 ada0
# glabel label rex2 ada1
# glabel label rex3 ada2
# zpool create rex raidz1 label/rex1 label/rex2 label/rex3
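Before running the `zpool create` line, it can be worth confirming that all three labels actually showed up. A minimal sketch, assuming `glabel status` prints one header line and then the label name (e.g. `label/rex1`) in its first column; the helper name `has_labels` is hypothetical.

```sh
#!/bin/sh
# has_labels NAME...: read `glabel status` output on stdin and succeed only
# if every NAME given appears in the first column.
# ASSUMPTION: the first line is a "Name Status Components" header.
has_labels() {
    list=$(awk 'NR > 1 { print $1 }')
    for want in "$@"; do
        printf '%s\n' "$list" | grep -qxF -- "$want" || return 1
    done
    return 0
}

# Example:
#   glabel status | has_labels label/rex1 label/rex2 label/rex3 &&
#       zpool create rex raidz1 label/rex1 label/rex2 label/rex3
```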

I am not sure how it affects a running system, but in theory it is a bad idea, since glabel writes its metadata to the last sector of the disk, where ZFS might have stored some data. So if it were me, I would back it all up and do one disk at a time. Something along the lines of:

edit: the mystery about offlining/replacing disks was solved, and since this problem seems common, I created this HOWTO: http://forums.freebsd.org/showthread.php?p=157004, which you should use instead of referring to this post.

# zpool scrub rex
# zpool offline rex ada0
# glabel label rex1 ada0
# zpool replace rex ada0 label/rex1
Wait for the resilver to finish, then repeat for the remaining disks (skipping the scrub), and maybe scrub again at the end. And of course you should back it up first, and make sure not to offline anything until you have an optimal [non-degraded] pool. I also think the procedure above has errors... maybe you can't "replace" without removing the disk, rebooting, or somehow tricking ZFS into changing the OFFLINE disk to UNAVAIL. Practice it in a virtual machine first. [edit: the solution to the removal problem, BTW, is to replace with a different disk, then label and replace again, or to label a different permanent replacement disk and replace once]
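For the "wait for the resilver" step, a small polling loop saves you from re-running `zpool status` by hand. This is a sketch: it assumes `zpool status` contains the phrase `resilver in progress` while a resilver runs (the wording on the v28-era pools discussed here; check yours), and the function names are made up.

```sh
#!/bin/sh
# resilver_running: succeed if the `zpool status` text on stdin mentions an
# active resilver. ASSUMPTION: the "resilver in progress" wording.
resilver_running() {
    grep -q 'resilver in progress'
}

# wait_for_resilver POOL [SECONDS]: poll every SECONDS (default 60) until
# the resilver is no longer reported. If `zpool status` itself fails, grep
# finds nothing and the loop also stops, so re-check the status by hand.
wait_for_resilver() {
    pool=$1
    interval=${2:-60}
    while zpool status "$pool" | resilver_running; do
        sleep "$interval"
    done
}

# Example: wait_for_resilver rex && zpool status rex
```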


And with raidz1, you should scrub often. If you lose one disk and, while the replacement is resilvering, another disk turns up an error, you lose that data, because raidz1 has no redundancy left to repair it.
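One way to make "scrub often" automatic is a cron entry. This is a sketch assuming a weekly schedule (Sunday, 03:00) suits your drives; pick your own interval. Some FreeBSD releases also ship a daily periodic(8) script for ZFS scrubs, so look in /etc/periodic/daily/ before adding your own.

```
# /etc/crontab (sketch): scrub pool "rex" every Sunday at 03:00.
# minute hour mday month wday who  command
0        3    *    *     0    root /sbin/zpool scrub rex
```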

And I don't really think Linux would arbitrarily write to one of the disks; I think it would then say FAULTED instead of UNAVAIL. (However, from personal experience, the Windows XP Professional SP2 boot CD will, for no apparent reason, destroy the MBR on non-Windows disks if you tell it to repair your system, and if it hangs during the 'repair', it will also destroy the Windows disk's partition table.) The root cause of your problem was likely that you didn't use labels.

And don't attempt any of the above until you have tested it in a virtual machine, including backing up and restoring (using zfs send ... rex | zfs recv ... rex2) if you are not already familiar with that.
 
update

Hi Guys,

Sebulon: Thanks for the quick reply; the ZFS guide you pointed to is a must-have for all ZFS users. Good looking out with:
# camcontrol devlist
Cheers to the "TB"s & the "GB"s!
peetaur: I never knew about glabel. I will use it on my next zpool. It might have been Win7 that did me wrong, because I reinstalled Win7 and installed Ubuntu on the Raptor the same day.

Update: I forgot that I had unplugged ada0 when I found data had been written to it. My original plan was to wipe out the drive and plug it back in as a new drive. Last night, I plugged the drive in and tried to import the pool on FreeBSD 9.0 RC2, but 'zpool list' showed nothing. Then I ran 'zpool import rex', and that worked.
# zpool status -v rex
showed the drive as degraded and it was resilvering it automatically!
Then I:
# zpool scrub rex
as both of you suggested. It took about 2 hours and reported only one unrecoverable file (a meaningless pr0n vid, thank heavens).
I had never done a zpool scrub. Maybe that's why that file was lost. I always thought scrub was for fixing errors, not for maintenance.
My problem now is that when I reboot, I can't see any files in the zpool; 'ls' shows nothing.
I can only see the files after an export and re-import. I ran chmod 0755 on rex and still can't 'ls' anything after a reboot. What gives?
 
@fossman

Weird that zpool list didn't show anything. It's as if the pool wasn't imported into the system.
# zpool import
Shows any pools that are available for import.

A blank listing after reboot suggests to me that it's not mounted. You need to have the following:

/boot/loader.conf
Code:
zfs_load="YES"
/etc/rc.conf
Code:
zfs_enable="YES"
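If you edit these files from a script, or just want to avoid adding the same line twice, a helper like the following appends a line only when it is not already there. It's a sketch: `ensure_line` is a hypothetical name, and it matches the whole line exactly, so `zfs_enable=YES` without quotes would not count as present.

```sh
#!/bin/sh
# ensure_line FILE LINE: append LINE to FILE unless an identical line exists.
ensure_line() {
    file=$1
    line=$2
    # -q quiet, -x whole-line match, -F fixed string (no regex)
    if grep -qxF -- "$line" "$file" 2>/dev/null; then
        return 0
    fi
    # If the file does not end in a newline, add one so LINE starts fresh.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi
    printf '%s\n' "$line" >> "$file"
}

# Usage (as root):
#   ensure_line /boot/loader.conf 'zfs_load="YES"'
#   ensure_line /etc/rc.conf 'zfs_enable="YES"'
```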
With those in place, ZFS automatically mounts every filesystem that has a mountpoint set. To check the mountpoints:
# zfs get -r mountpoint rex
If the mountpoint is none or legacy, ZFS won't automatically mount that dataset (or any child that inherits the setting).
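To pick out just the datasets that will not be mounted automatically, you can ask for the same property in scripted form: `-H` drops the header and separates fields with tabs, and `-o name,value` keeps only the two columns needed. The helper name below is hypothetical.

```sh
#!/bin/sh
# not_automounted: read "name<TAB>value" lines, as printed by
#   zfs get -H -o name,value -r mountpoint rex
# and print each dataset whose mountpoint is "none" or "legacy".
not_automounted() {
    awk -F '\t' '$2 == "none" || $2 == "legacy" { print $1 }'
}

# Example: zfs get -H -o name,value -r mountpoint rex | not_automounted
```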

# df -h
shows what you have mounted, with sizes in human-readable (-h) units.

/Sebulon
 
I had never done a zpool scrub. Maybe that's why that file was lost. I always thought scrub was for fixing errors, not for maintenance.

Yep, scrub is supposed to fix errors, but once you have lost your redundancy, all it can do is detect corruption (via the checksums) and tell you what you lost. So you should scrub while you still have redundancy.
 
I was so amazed watching FreeBSD 9.0 RC configure my WiFi automatically that I forgot to add ZFS to my /boot/loader.conf and /etc/rc.conf.

BTW, the sound driver is also set up automatically in FreeBSD 9.0. Kudos to the dev team. Now if only nvidia and Flash were installed automatically, who would need PC-BSD?

Thanks a lot you guys. You'll be seeing more of me ;)
 