ZFS pool failure on GPT-partitioned disks.

Hi,


Here is the setup I have:

FreeBSD 8.1 running as a VM on an ESXi server, accessing three raw-mapped devices, which are 500 GB SATA drives. The devices are GPT-partitioned (and labeled), and the partitions make up a RAID-Z (RAID5-style) zpool.

About a week ago I ran a scrub on the pool and got a lot of errors, which ZFS repaired. I started running a scrub from cron every night, and today, after a few more errors, I decided to reboot the VM. After the boot the zpool was in a failed state. I thought I could export and re-import it, but after the export it would not import any more. The drives are /dev/da0, da1 and da2 (da0p1, da1p1 and da2p1 used to be the partitions), and there are no I/O errors in dmesg or /var/log/messages.
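For reference, the nightly scrub was set up roughly like this (just a sketch; the pool name "data" comes from the zdb output further down, and the schedule is only an example):

Code:
# /etc/crontab entry: scrub the pool every night at 03:00 (example schedule)
0   3   *   *   *   root   /sbin/zpool scrub data

# check the result the next morning
zpool status -v data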

When I checked /dev/ for the device files I could only see da0p1 and none of the corresponding partition nodes for the other two devices, and running `gpart show` lists only that one drive as GPT-partitioned.
My understanding is that the partition scheme on those drives got corrupted for some reason, and they no longer show up as GPT-partitioned.
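This is roughly how I was poking at the disks (a sketch; `gpart recover` can rebuild a corrupt GPT from the backup copy at the end of the disk, but I'm not sure it exists on 8.1, and here gpart didn't even see a GPT on da1/da2):

Code:
# only da0 comes back with a partition table
gpart show da0
gpart show da1
gpart show da2
gpart list

# if gpart had at least reported the GPT as CORRUPT, this should rebuild it
# from the backup copy at the end of the disk (may not be available on 8.1)
gpart recover da1
gpart recover da2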

Can I somehow recreate the GPT on the drives without touching the actual partitions on which ZFS resides, so I can hopefully bring the pool back online and recover my data?

Any other ideas would be appreciated.
 
After a lot of googling, I installed gdisk and managed to recover the GPT information on the disks, and they now show up correctly in [cmd=]gpart show[/cmd] and [cmd=]gpart list[/cmd].
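Roughly what the gdisk recovery looked like (a sketch from memory; the recovery menu rebuilds the main GPT at the start of the disk from the backup copy at the end, assuming that copy survived):

Code:
gdisk /dev/da1
# r  -> recovery and transformation menu
# b  -> use the backup GPT header to rebuild the main header
# c  -> load the backup partition table to rebuild the main one
# p  -> print the table and check the freebsd-zfs partition is back
# w  -> write the rebuilt GPT to disk; then repeat for da2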

After a reboot I tried to import the pool, but nothing was imported and there were no errors or log messages, and there is no verbose option to see what's going on.
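These are roughly the kinds of import attempts I made, all without any output (a sketch; -d limits the device search to a directory and -f forces the import):

Code:
# list whatever pools are visible on the default device nodes
zpool import

# limit the search to the GPT labels, then force the import by name
zpool import -d /dev/gpt
zpool import -d /dev/gpt -f data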

EDIT: Here is what zdb shows:

Code:
[root@razor /]# zdb -l /dev/gpt/disk1 
--------------------------------------------
LABEL 0
--------------------------------------------
failed to unpack label 0
--------------------------------------------
LABEL 1
--------------------------------------------
failed to unpack label 1
--------------------------------------------
LABEL 2
--------------------------------------------
    version=14
    name='data'
    state=0
    txg=1495463
    pool_guid=11963922582894828022
    hostid=75354391
    hostname='razor'
    top_guid=2068181919889565984
    guid=13233110809129436461
    vdev_tree
        type='raidz'
        id=0
        guid=2068181919889565984
        nparity=1
        metaslab_array=23
        metaslab_shift=33
        ashift=9
        asize=1500305424384
        is_log=0
        children[0]
                type='disk'
                id=0
                guid=13233110809129436461
                path='/dev/gpt/disk1'
                whole_disk=0
                DTL=135
        children[1]
                type='disk'
                id=1
                guid=5294853771405793808
                path='/dev/gpt/disk2'
                whole_disk=0
                DTL=134
        children[2]
                type='disk'
                id=2
                guid=1544504323777352507
                path='/dev/gpt/disk3'
                whole_disk=0
                DTL=133
--------------------------------------------
LABEL 3
--------------------------------------------
    version=14
    name='data'
    state=0
    txg=1495463
    pool_guid=11963922582894828022
    hostid=75354391
    hostname='razor'
    top_guid=2068181919889565984
    guid=13233110809129436461
    vdev_tree
        type='raidz'
        id=0
        guid=2068181919889565984
        nparity=1
        metaslab_array=23
        metaslab_shift=33
        ashift=9
        asize=1500305424384
        is_log=0
        children[0]
                type='disk'
                id=0
                guid=13233110809129436461
                path='/dev/gpt/disk1'
                whole_disk=0
                DTL=135
        children[1]
                type='disk'
                id=1
                guid=5294853771405793808
                path='/dev/gpt/disk2'
                whole_disk=0
                DTL=134
        children[2]
                type='disk'
                id=2
                guid=1544504323777352507
                path='/dev/gpt/disk3'
                whole_disk=0
                DTL=133


Similar output for disk2 and disk3.
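I checked the labels on all three members the same way (ZFS keeps four copies of the label per vdev, two at the front and two at the end of the partition, which is why only labels 2 and 3 survived here):

Code:
zdb -l /dev/gpt/disk1
zdb -l /dev/gpt/disk2
zdb -l /dev/gpt/disk3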
 
Well, after a long fight with no success on FreeBSD, I was desperate to get my data back, so I decided to map the disks to a Debian VM I have running on the same ESXi host. I mapped the drives, booted, and the disks showed up with their GPT partitions. I installed zfs-fuse and imported the pool successfully; zpool complained that the pool had last been used by another system, so I forced the import. I'm currently rsyncing my data to a standalone drive.
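On the Debian side it boiled down to roughly this (a sketch; the mount point and backup path are placeholders):

Code:
apt-get install zfs-fuse

# the pool was last used on the FreeBSD host, so the import has to be forced
zpool import -f data

# copy everything off to the standalone backup drive (example paths)
rsync -aHv /data/ /mnt/backup/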

I don't know what to say; I was really surprised (but still happy :)) when Debian managed to import the pool while FreeBSD refused without giving any reason. The raw device mappings are exactly the same for both VMs.


EDIT:

After the rsync finished I remapped the raw devices back to the FreeBSD VM and tried to import the pool again. Funnily enough, the pool imported with no errors and is currently online. Issuing `zdb -l` now gives the following result:

Code:
[root@razor ~]# zdb -l /dev/da0p1
--------------------------------------------
LABEL 0
--------------------------------------------
    version=14
    name='data'
    state=0
    txg=1497952
    pool_guid=11963922582894828022
    hostid=75354391
    hostname='razor'
    top_guid=2068181919889565984
    guid=5294853771405793808
    vdev_tree
        type='raidz'
        id=0
        guid=2068181919889565984
        nparity=1
        metaslab_array=23
        metaslab_shift=33
        ashift=9
        asize=1500305424384
        is_log=0
        children[0]
                type='disk'
                id=0
                guid=13233110809129436461
                path='/dev/gpt/disk1'
                whole_disk=0
                DTL=135
        children[1]
                type='disk'
                id=1
                guid=5294853771405793808
                path='/dev/da0p1'
                whole_disk=0
                DTL=134
        children[2]
                type='disk'
                id=2
                guid=1544504323777352507
                path='/dev/da2p1'
                whole_disk=0
                DTL=133

The other three labels and the other two drives show the same information, which leads me to think that the Debian ZFS implementation rewrote the two missing labels from the two surviving ones, and that FreeBSD doesn't do that. Maybe it needs 50% + 1 of the labels before it will repair a missing one, I don't know. Or this might just be a bug.
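This is how I checked that the previously missing labels really were rewritten on all three members (the scrub afterwards is just to be safe):

Code:
# all four labels now unpack cleanly on every member
zdb -l /dev/da0p1
zdb -l /dev/da1p1
zdb -l /dev/da2p1

# one more scrub to make sure the data is consistent
zpool scrub data
zpool status -v data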
Anyway, hopefully this adventure of mine is useful to someone. I will recreate the pool from scratch, since this pool was migrated across previous ZFS versions and from physical hardware to a VM (though I doubt the physical -> VM migration with raw device mapping is the reason for the mess).
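The plan for the rebuild is roughly this (a sketch; the GPT labels match the old layout, the old tables have to be destroyed first, and the restore path is a placeholder):

Code:
# fresh GPT and one labeled freebsd-zfs partition per drive
# (after wiping the old partition tables)
gpart create -s gpt da0
gpart add -t freebsd-zfs -l disk1 da0
gpart create -s gpt da1
gpart add -t freebsd-zfs -l disk2 da1
gpart create -s gpt da2
gpart add -t freebsd-zfs -l disk3 da2

# new single-parity raidz pool on the labels, then restore from the backup
zpool create data raidz gpt/disk1 gpt/disk2 gpt/disk3
rsync -aHv /mnt/backup/ /data/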
 