ZFS Resilver on different machine

I have a system with 3x 1.5 TB drives that has been working fine for two or three years. Lately it would intermittently power on but not display anything, beep, or show any other signs of life. I noticed this a couple of times in the past, but it would then clear up, so I assumed the motherboard was flaky. As it mostly stayed on, it wasn't much of an issue. It's a shame, as I found it difficult to find a motherboard and CPU combination that was ECC compatible and low power (AMD64).

Recently one of the Samsung 1.5 TB disks died (click of death), and now the computer refuses to boot at all (great, eh?).

To recover the data without stressing the other disks, I plan to image the other two disks in the RAID-Z on another machine using dd. Does anyone know if I can lower the transfer rate or choose a block size that reduces the likelihood of another failure while backing them up?
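For reference, something like the following is what I have in mind. It's only a sketch (the device name and output path are placeholders), demoed here on a scratch file so it's safe to run anywhere; conv=noerror,sync keeps dd going past bad sectors, zero-padding unreadable blocks so offsets in the image stay aligned:

```shell
# Demoed on a scratch file; substitute the real device (e.g. /dev/ad6)
# and a path on the backup disk. A modest block size keeps each read
# small, and conv=noerror,sync skips unreadable blocks, zero-padding
# them so the image stays the same size as the source.
dd if=/dev/urandom of=/tmp/src_disk.img bs=64k count=8 2>/dev/null
dd if=/tmp/src_disk.img of=/tmp/disk_backup.img bs=64k conv=noerror,sync 2>/dev/null
cmp -s /tmp/src_disk.img /tmp/disk_backup.img && echo "image verified"   # prints "image verified"
```

On FreeBSD, recoverdisk(1) in the base system may also be worth a look; it is built for exactly this job and retries bad regions with progressively smaller reads.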

Then once that's done, I was going to put them in another machine to resilver. Is there metadata on the dead machine, or can I just import the other two disks as a degraded three-disk RAID-Z and resilver onto a new disk?

The machine has an SSD as the main system disk, so I could move that into the new system for recovery, although I'll have to fettle with it as the kernel is trimmed down for the broken machine (I won't be doing that again).

The system is backed up, but if this fails I will still lose some data I would rather not.

Thanks for any help.
 
Assuming everything works as designed and the pool has no further errors, you should have no problem importing the two-disk degraded pool into another system. As the pool was never exported from the old system, you'll need to use the -f option: # zpool import -f poolname.

As an interesting side note, if you run into problems with the disks themselves, it is possible to import the pool from the backup image files directly (of course, you'd want a second read-only copy of the images somewhere if they were all you had left):

Code:
# mdconfig -a -t vnode -f /data/disk1_backup.img
# mdconfig -a -t vnode -f /data/disk2_backup.img
# zpool import -f pool
 
I created UFS filesystems on the spare disks and mounted them one at a time to make images of the two RAIDZ1 disks.
Code:
$ gpart create -s gpt ad8
$ gpart add -t freebsd-ufs ad8
$ newfs /dev/ad8p1
$ mount /dev/ad8p1 /mnt
$ dd if=/dev/ad6 of=/mnt/top1.img bs=8m
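It's also worth recording a digest of each image so it can be re-checked later or on another machine (a sketch, demoed on a scratch file; sha256sum is the Linux/GNU tool, FreeBSD base has sha256(1) instead):

```shell
# Make a small stand-in for an image file, then record and verify its
# SHA-256 digest. Substitute the real image path, e.g. /mnt/top1.img.
dd if=/dev/zero of=/tmp/top1_demo.img bs=64k count=4 2>/dev/null
sha256sum /tmp/top1_demo.img > /tmp/top1_demo.img.sha256
sha256sum -c /tmp/top1_demo.img.sha256   # prints "/tmp/top1_demo.img: OK"
```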

I then attached the 2 disks from the RAIDZ1 and a new spare disk.
Code:
$ zpool import -f data
$ zpool status
  raidz1-0                DEGRADED 0 0 0
    ad6                   ONLINE   0 0 0
    ad10                  ONLINE   0 0 0
    56612413128944087700  UNAVAIL  0 0 0  was /dev/ada3

The new device nodes were picked up automatically. I needed to provide the ID of the failed disk, as its device node had changed:
Code:
$ zpool replace data 56612413128944087700 ad8

Requesting the status:
Code:
$ zpool status -v
  raidz1-0                  DEGRADED 0 0 0
    ad6                     ONLINE   0 0 0
    ad10                    ONLINE   0 0 0
    replacing-2             UNAVAIL  0 0 0
      56612413128944087700  UNAVAIL  0 0 0  was /dev/ada3
      ad8                   ONLINE   0 0 0  (resilvering)

ZFS completed the resilver with the following message: "scan: resilvered 565G in 11h5m with 0 errors"

After the import, all directories within the pool were missing, so I exported and re-imported the pool:
Code:
$ zpool export data
$ zpool import data

All that was left was to format a spare backup drive and make a copy of the data.
Code:
$ dd if=/dev/zero of=/dev/da0 bs=8m count=1
$ gpart create -s gpt da0
$ gpart add -t freebsd-ufs da0
$ newfs /dev/da0p1
$ mkdir /backup
$ mount /dev/da0p1 /backup
$ df -g | grep /backup
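The copy itself can be done with a tar pipe, which preserves ownership, modes and symlinks. A sketch, demoed on scratch directories (substitute the pool's mountpoint and /backup):

```shell
# Demoed on scratch directories; substitute your pool's mountpoint
# (e.g. /data) for /tmp/pooldemo and /backup for /tmp/backupdemo.
mkdir -p /tmp/pooldemo/sub /tmp/backupdemo
echo "hello" > /tmp/pooldemo/sub/file.txt
# Pack the source tree to stdout and unpack it at the destination.
tar -C /tmp/pooldemo -cf - . | tar -C /tmp/backupdemo -xf -
cat /tmp/backupdemo/sub/file.txt   # prints "hello"
```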

:beer
 