ZFS Hard Drive Died, ZFS partitions need to be restored

Hello and thanks in advance!

I have a local home FreeNAS server with ZFS pools and a couple of 4 TB hard drives connected. I didn't use mirroring, shame on me.
One of the drives partially died. I mean, gpart shows me partitions, but I get an I/O error when I try to mount it.
So now I have an image of the drive, and this is the partition table of the old drive:
[attached screenshot: gpart partition table of the original drive]


I attached the image with mdconfig and it is about 3 GB smaller than the original drive, but that should be OK. However, the image itself has no partition table and testdisk can't see anything in it.
So I tried to recreate the same partitioning as on the original drive with gpart create/add. The pool still doesn't want to import.
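Roughly, the sequence was something like this (the image path, md unit number, and partition types/sizes here are placeholders, not my exact values; they would have to match the gpart output of the original drive):
Code:
[root@Cloud] /var/tmp# mdconfig -a -t vnode -f /var/tmp/drive.img -u 0
[root@Cloud] /var/tmp# gpart create -s gpt md0
[root@Cloud] /var/tmp# gpart add -t freebsd-swap -s 2g md0
[root@Cloud] /var/tmp# gpart add -t freebsd-zfs md0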

I ran a recovery program on the image and it found a ZFS partition at the same initial offset:
[attached screenshot: recovery program output showing a ZFS partition at the same starting offset]


but with that program I can only restore the files themselves, without the folder tree (and there are tons of folders).

When I try to import the pool, it shows me this:

[attached screenshot: zpool import error output]


And the pool state is UNAVAIL.

What can I do to try to restore the folder tree of that disk?
Partitioning of md0:
[attached screenshot: gpart partition table of md0]



Thanks in advance again, and I swear to use mirroring from now on!
 
I have a local home FreeNAS server
FreeNAS is not supported here.

PC-BSD, FreeNAS, XigmaNAS, and all other FreeBSD Derivatives

I didn't use mirroring, shame on me.
One of the drives partially died. I mean, gpart shows me partitions, but I get an I/O error when I try to mount it.
So, this was a striped set? In that case your whole pool is dead. Or was it a single-drive pool? Then it's also dead. This is why you have backups. You do have backups, don't you?
 
Thanks for your answer! I believe this is more of a ZFS-related issue than a FreeNAS-specific one.
Yes, it was a single-drive pool. I have no backup, so I'm trying my best to restore the data from it. Some updates:

As I mentioned before, I have two copies of that drive: the first is the damaged drive itself and the second is an image. The pool name was "Outside_4Tb", with a child dataset "Outside_4Tb/4Tb" and some system datasets.

When I run zpool import Outside_4Tb, it tells me:
Code:
[root@Cloud] /var/tmp# zpool import Outside_4Tb
cannot import 'Outside_4Tb': no such pool or dataset
        Destroy and re-create the pool from
        a backup source.

I've tried zpool import -d /dev
Code:
[root@Cloud] /var/tmp# zpool import -d /dev
   pool: Outside_4Tb
     id: 6035276188144559901
  state: UNAVAIL
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
        devices and try again.
   see: http://illumos.org/msg/ZFS-8000-3C
config:

        Outside_4Tb             UNAVAIL  insufficient replicas
          11444248833043760621  UNAVAIL  cannot open

I also have an identical disk running as another single-drive pool. When I check its status, it says this:
Code:
[root@Cloud] /var/tmp# zpool status Outside_4Tb_2
  pool: Outside_4Tb_2
state: ONLINE
  scan: scrub repaired 0 in 12h13m with 0 errors on Sun Nov 22 12:13:58 2020
config:

        NAME                                          STATE     READ WRITE CKSUM
        Outside_4Tb_2                                 ONLINE       0     0     0
          gptid/e7a99a7a-fe42-11e8-ae38-50af73257810  ONLINE       0     0     0

errors: No known data errors

OK, I assume that ZFS sees the metadata of the pool but for some reason can't bind it to the drive. Metadata checkup: use zdb to see whether it can tell me something about the pool.

Code:
[root@Cloud] /var/tmp# zdb -e -d Outside_4Tb
Dataset mos [META], ID 0, cr_txg 4, 20.8M, 227 objects
Dataset Outside_4Tb/4Tb [ZPL], ID 42, cr_txg 17, 3.19T, 256919 objects
Dataset Outside_4Tb/.system/configs-241bf4dd02d1413596a7e0d91081dbbf [ZPL], ID 213, cr_txg 20408930, 24.7M, 281 objects
Dataset Outside_4Tb/.system/syslog-241bf4dd02d1413596a7e0d91081dbbf [ZPL], ID 201, cr_txg 20408926, 5.59M, 72 objects
Dataset Outside_4Tb/.system/samba4 [ZPL], ID 195, cr_txg 20408924, 448K, 97 objects
Dataset Outside_4Tb/.system/cores [ZPL], ID 189, cr_txg 20408922, 1.36M, 18 objects
Dataset Outside_4Tb/.system/rrd-241bf4dd02d1413596a7e0d91081dbbf [ZPL], ID 207, cr_txg 20408928, 88.0K, 7 objects
Dataset Outside_4Tb/.system [ZPL], ID 183, cr_txg 20408920, 248K, 34 objects
Dataset Outside_4Tb/jails [ZPL], ID 219, cr_txg 20515640, 17.0M, 12 objects
Dataset Outside_4Tb/jails_2 [ZPL], ID 226, cr_txg 24408489, 88.0K, 7 objects
Dataset Outside_4Tb [ZPL], ID 21, cr_txg 1, 96.0K, 10 objects
Verified large_blocks feature refcount of 0 is correct
space map refcount mismatch: expected 134 != actual 117

Code:
[root@Cloud] /var/tmp# zdb -e -C -u Outside_4Tb

MOS Configuration:
        version: 5000
        name: 'Outside_4Tb'
        state: 0
        txg: 25098800
        pool_guid: 6035276188144559901
        hostid: 1484983193
        hostname: ''
        vdev_children: 1
        vdev_tree:
            type: 'root'
            id: 0
            guid: 6035276188144559901
            children[0]:
                type: 'disk'
                id: 0
                guid: 11444248833043760621
                path: '/dev/gptid/f8e3409a-0616-11e6-a9fc-50af73257810'
                whole_disk: 1
                metaslab_array: 35
                metaslab_shift: 35
                ashift: 12
                asize: 3998634737664
                is_log: 0
                DTL: 177
                create_txg: 4
        features_for_read:
            com.delphix:hole_birth
            com.delphix:embedded_data

Uberblock:
        magic = 0000000000bab10c
        version = 5000
        txg = 25098823
        guid_sum = 17479525021188320522
        timestamp = 1599239915 UTC = Fri Sep  4 20:18:35 2020

space map refcount mismatch: expected 134 != actual 117


OK, all the metadata, from the datasets to the uberblock, is alive! Nice to see it! What about the ZFS labels? My ZFS partition is on /dev/da0p2.

Code:
[root@Cloud] /var/tmp# zdb -l /dev/da0p2
--------------------------------------------
LABEL 0
--------------------------------------------
failed to read label 0
--------------------------------------------
LABEL 1
--------------------------------------------
failed to read label 1
--------------------------------------------
LABEL 2
--------------------------------------------
    version: 5000
    name: 'Outside_4Tb'
    state: 0
    txg: 25098800
    pool_guid: 6035276188144559901
    hostid: 1484983193
    hostname: ''
    top_guid: 11444248833043760621
    guid: 11444248833043760621
    vdev_children: 1
    vdev_tree:
        type: 'disk'
        id: 0
        guid: 11444248833043760621
        path: '/dev/gptid/f8e3409a-0616-11e6-a9fc-50af73257810'
        whole_disk: 1
        metaslab_array: 35
        metaslab_shift: 35
        ashift: 12
        asize: 3998634737664
        is_log: 0
        DTL: 177
        create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
--------------------------------------------
LABEL 3
--------------------------------------------
    version: 5000
    name: 'Outside_4Tb'
    state: 0
    txg: 25098800
    pool_guid: 6035276188144559901
    hostid: 1484983193
    hostname: ''
    top_guid: 11444248833043760621
    guid: 11444248833043760621
    vdev_children: 1
    vdev_tree:
        type: 'disk'
        id: 0
        guid: 11444248833043760621
        path: '/dev/gptid/f8e3409a-0616-11e6-a9fc-50af73257810'
        whole_disk: 1
        metaslab_array: 35
        metaslab_shift: 35
        ashift: 12
        asize: 3998634737664
        is_log: 0
        DTL: 177
        create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data


So, the labels at the start of the partition are damaged, but the ones at the end are alive. Further digging through the SMART information tells me this: the ZFS partition starts at LBA 4194432, but LBA 4194464 is damaged and for some reason can't be read or remapped. When I try sg_reassign or writing zeros via dd, something goes wrong:

Code:
[root@Cloud] /var/tmp# sg_reassign --address=4194464 /dev/da0 -v
    reassign blocks cdb: 07 00 00 00 00 00
reassign blocks:
Fixed format, current; Sense key: Illegal Request
Additional sense: Invalid command operation code
REASSIGN BLOCKS: Illegal request, invalid opcode sense key

Code:
[root@Cloud] /var/tmp# dd if=/dev/null of=/dev/da0 seek=4194464 bs=512 count=1 conv=noerror,sync
0+0 records in
0+0 records out
0 bytes transferred in 4.984661 secs (0 bytes/sec)

I can't even read the grown defect list via --grown.
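(As an aside on the dd attempt above: /dev/null supplies no data at all, which is why dd reports 0+0 records and writes nothing. A zero-fill of that single sector would normally use /dev/zero as the input, roughly like this; I haven't re-run it yet, so treat it as a sketch.)
Code:
# untested sketch: zero the single 512-byte sector at LBA 4194464
[root@Cloud] /var/tmp# dd if=/dev/zero of=/dev/da0 seek=4194464 bs=512 count=1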

All these actions were done on the physical drive. So, there are three guesses I have made:
1. The pool can't be imported because of the hardware read error. If I remap those blocks or fill them with zeros, it may become possible to import the pool (the GUID of the drive and the GUID from the zdb metadata are the same).
2. If I copy the end-label blocks, or find them on the image and build a sufficiently accurate ZFS partition on the image (with the drive GUID adjusted), it may become possible to import the pool from the image (a rough sketch of reading the end labels is below).
3. Only the first two labels on the original drive were damaged, so I should be able to restore the whole folder tree fairly accurately.
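For guess 2, this is roughly how I imagine reading the surviving end labels. The offsets are my own assumptions based on the standard ZFS on-disk layout (four 256 KiB labels: 0 and 1 at the start of the vdev, 2 and 3 in the last 512 KiB of the asize reported by zdb), and the output file names are placeholders, so everything would need to be double-checked against the zdb -l output above before writing anything back:
Code:
# asize from zdb = 3998634737664 bytes = 15253581 blocks of 256 KiB,
# so labels 2 and 3 should be the last two 256 KiB blocks of the vdev
[root@Cloud] /var/tmp# dd if=/dev/da0p2 of=/var/tmp/label2.bin bs=256k skip=15253579 count=1
[root@Cloud] /var/tmp# dd if=/dev/da0p2 of=/var/tmp/label3.bin bs=256k skip=15253580 count=1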

Any advice on these thoughts? I'm not really familiar with FreeBSD; it was only a home server project, so some of my reasoning could be mistaken. Thanks in advance for any help!
 
Thanks. Pictures make the thread very difficult to read for some people, and it's impossible to quote anything from them.
 