ZFS Zpool I/O Error and no drive attached to the pool

Gpop

New Member

Reaction score: 1
Messages: 15

Hi Everyone,

I had my PSU die on me, and now I'm facing an issue with the zpool that I can't figure out.

Code:
[root@freenas ~]# zpool import
   pool: gDisk
     id: 4321208912538017444
  state: FAULTED
status: The pool metadata is corrupted.
action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
   see: http://illumos.org/msg/ZFS-8000-72
config:

        gDisk                                           FAULTED  corrupted data
          raidz1-0                                      ONLINE
            gptid/db835a9b-665a-11e2-b37c-a0b3cce25a83  ONLINE
            gptid/dc882ef3-665a-11e2-b37c-a0b3cce25a83  ONLINE
            gptid/3a5b5d0b-943e-11e4-8c0e-a01d48c76648  ONLINE
            gptid/bd2204b8-3024-11e4-beb2-6805ca1cb42a  ONLINE
Doing a zpool import -F gDisk gives:
Code:
cannot import 'gDisk': I/O error
        Destroy and re-create the pool from
        a backup source.
However, when I do this I get the following messages in the console, with the same four vdev GUIDs repeating over and over.

Code:
Nov 17 09:43:59 freenas ZFS: vdev state changed, pool_guid=4321208912538017444 vdev_guid=13109049489029127203
Nov 17 09:43:59 freenas ZFS: vdev state changed, pool_guid=4321208912538017444 vdev_guid=4774203770015519164
Nov 17 09:43:59 freenas ZFS: vdev state changed, pool_guid=4321208912538017444 vdev_guid=9019238602065831635
Nov 17 09:43:59 freenas ZFS: vdev state changed, pool_guid=4321208912538017444 vdev_guid=11673891713223961018
Nov 17 09:43:59 freenas ZFS: vdev state changed, pool_guid=4321208912538017444 vdev_guid=13109049489029127203
Nov 17 09:43:59 freenas ZFS: vdev state changed, pool_guid=4321208912538017444 vdev_guid=4774203770015519164
Nov 17 09:43:59 freenas ZFS: vdev state changed, pool_guid=4321208912538017444 vdev_guid=9019238602065831635
The output of zdb -l /dev/ada0p2 gives me the following (the same for all drives):

Code:
------------------------------------
LABEL 0
------------------------------------
    version: 5000
    name: 'gDisk'
    state: 0
    txg: 34596237
    pool_guid: 4321208912538017444
    hostid: 2970101908
    hostname: 'freenas.local'
    top_guid: 16522616241267246162
    guid: 11673891713223961018
    vdev_children: 1
    vdev_tree:
        type: 'raidz'
        id: 0
        guid: 16522616241267246162
        nparity: 1
        metaslab_array: 31
        metaslab_shift: 36
        ashift: 12
        asize: 11993762234368
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 11673891713223961018
            path: '/dev/gptid/db835a9b-665a-11e2-b37c-a0b3cce25a83'
            phys_path: '/dev/gptid/db835a9b-665a-11e2-b37c-a0b3cce25a83'
            whole_disk: 1
            DTL: 487
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 13109049489029127203
            path: '/dev/gptid/dc882ef3-665a-11e2-b37c-a0b3cce25a83'
            phys_path: '/dev/gptid/dc882ef3-665a-11e2-b37c-a0b3cce25a83'
            whole_disk: 1
            DTL: 486
            create_txg: 4
        children[2]:
            type: 'disk'
            id: 2
            guid: 4774203770015519164
            path: '/dev/gptid/3a5b5d0b-943e-11e4-8c0e-a01d48c76648'
            whole_disk: 1
            DTL: 485
            create_txg: 4
        children[3]:
            type: 'disk'
            id: 3
            guid: 9019238602065831635
            path: '/dev/gptid/bd2204b8-3024-11e4-beb2-6805ca1cb42a'
            whole_disk: 1
            DTL: 484
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
------------------------------------
LABEL 1 through LABEL 3
------------------------------------
    (identical to LABEL 0; omitted)
smartctl also doesn't report any issues with any of the disks.
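For reference, the kind of SMART check meant here could look roughly like this (hypothetical sketch; the device names ada0..ada3 are assumed from the zdb output above):

```shell
# Hypothetical sketch of the SMART check referred to above;
# -H prints overall health, -A the raw attribute table.
for d in ada0 ada1 ada2 ada3; do
    echo "=== /dev/$d ==="
    smartctl -H -A /dev/$d | \
        egrep -i 'overall-health|Reallocated|Pending|Uncorrect'
done
```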

I tried all the obvious import commands (-f, -F, -X, readonly, ...) and they all lead to the same result. The GUI says that the drives are unused, so I'm not sure what is going on here. Is there a way to relink the drives to the pool and access the data?

Any help would be more than welcome and I'm willing to try anything at this stage.
 
Last edited by a moderator:

ralphbsz

Daemon

Reaction score: 866
Messages: 1,395

What messages about the disks are there in dmesg or /var/log/messages? Can you search /var/log/messages back to when the PSU actually failed, if there is nothing obvious recently? There must be a cause for the drives to have I/O errors or go offline, and the error messages will probably tell us.
 

ShelLuser

Son of Beastie

Reaction score: 1,669
Messages: 3,512

First of all: FreeNAS isn't FreeBSD, you're better off asking about this on the FreeNAS forums. Even so, this is the major risk (and downside) of ZFS: when the pool gets corrupted you'll lose all your filesystems vs. only one when using UFS.

However, one thing you can do is try to access the data read-only. In my experience most problems surface whenever the system tries to write to the pool, which it does pretty much constantly. The bad news is that read-only access also prevents filesystem maintenance (unlike with UFS), but on the upside you may be able to access (and secure) all your data.

# zpool import -o readonly=on -fnR /mnt gDisk. Follow that with zfs list to check whether it actually found any filesystems, which you're going to have to mount manually.

Next time I wouldn't immediately try -F; instead use -Fn, so that you don't risk corrupting the pool even further (due to that constant writing).
 
OP
OP
G

Gpop

New Member

Reaction score: 1
Messages: 15

ShelLuser said:
First of all: FreeNAS isn't FreeBSD, you're better off asking about this on the FreeNAS forums. Even so, this is the major risk (and downside) of ZFS: when the pool gets corrupted you'll lose all your filesystems vs. only one when using UFS.

However, one thing you can do is try to access the data read-only. In my experience most problems surface whenever the system tries to write to the pool, which it does pretty much constantly. The bad news is that read-only access also prevents filesystem maintenance (unlike with UFS), but on the upside you may be able to access (and secure) all your data.

# zpool import -o readonly=on -fnR /mnt gDisk. Follow that with zfs list to check whether it actually found any filesystems, which you're going to have to mount manually.

Next time I wouldn't immediately try -F; instead use -Fn, so that you don't risk corrupting the pool even further (due to that constant writing).
I tried the FreeNAS forum, but no one replied there.

Using your command, I get the message “-n or -X only meaningful with -F”
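That error appears because -n is a dry-run modifier that is only valid together with -F (recovery mode), per the zpool(8) flag descriptions. A possible corrected invocation, as an untested sketch (pool name and alternate root taken from earlier in the thread):

```shell
# Dry run: ask what a -F recovery import would do, without doing it.
# -n is only meaningful with -F; readonly=on keeps the pool unwritten.
zpool import -o readonly=on -fFn -R /mnt gDisk

# If the dry run looks sane, drop -n to actually import read-only:
zpool import -o readonly=on -fF -R /mnt gDisk
zfs list    # check which filesystems came back
```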
 
OP
OP
G

Gpop

New Member

Reaction score: 1
Messages: 15

ralphbsz said:
What messages about the disks are there in dmesg or /var/log/messages? Can you search /var/log/messages back to when the PSU actually failed, if there is nothing obvious recently? There must be a cause for the drives to have I/O errors or go offline, and the error messages will probably tell us.
Unfortunately the log doesn't go back that far anymore. Is there any way to get it back?

The recent messages don't mention anything wrong on the hardware side besides the messages I've quoted in my initial post.
 

ralphbsz

Daemon

Reaction score: 866
Messages: 1,395

Gpop said:
Unfortunately the log doesn't go back that far anymore. Is there any way to get it back?
Really? Usually, /var/log/messages and its compressed versions messages.[0-9].bz2 go back quite a while; depending on how many messages you are getting, typically many days.

Gpop said:
The recent messages don't mention anything wrong on the hardware side besides the messages I've quoted in my initial post.
That only has a message from ZFS, saying that something is wrong underneath. No details, which might help with diagnosing this.
 
OP
OP
G

Gpop

New Member

Reaction score: 1
Messages: 15

ralphbsz said:
Really? Usually, /var/log/messages and its compressed versions messages.[0-9].bz2 go back quite a while; depending on how many messages you are getting, typically many days.


ralphbsz said:
That only has a message from ZFS, saying that something is wrong underneath. No details, which might help with diagnosing this.
The log goes back to December 6th, but the crash happened in mid-November, and there are no compressed versions in /var/log/.
 

ShelLuser

Son of Beastie

Reaction score: 1,669
Messages: 3,512

Re-reading the first post I noticed raidz1-0, which seems rather ironic considering that, according to the manual page, it stripes both data and parity across all devices. Anyway, at this point I'm afraid your only option is restoring from a backup.

However...

Warning: the following is highly destructive advice; you will probably ensure the loss of all your data. So don't try this without checking up on other possible solutions first.

ZFS allows you to import destroyed pools. Also interesting is that when you destroy a pool, its metadata remains as-is; this is also why commands such as labelclear exist. So when all else fails (!!) you could try to destroy the pool and then import it again using -D, though this will most likely fail.

So, the next step is to try and recreate the same pool with the same settings on the same devices and then see if any of your data is somehow still accessible. You'd need to force creation, and I'd also use -d to prevent any features from getting created (thus possibly overwriting important parts of your pool).

I can't stress this enough: this will most likely fail. The only reason I'm suggesting it is because, when experimenting with ZFS pools on a live server, I once did the same thing and could still access my data afterwards. Of course the major difference is that my pool wasn't damaged: I destroyed it, re-created it, and afterwards could still access the data.
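Spelled out, the last-resort sequence described above would look roughly like the following. This is an untested, highly destructive sketch: the gptid device names and ashift=12 are taken from the zdb label output earlier in the thread, and everything should be verified before running any of it.

```shell
# LAST RESORT ONLY. Marks the pool destroyed; on-disk labels remain.
zpool destroy -f gDisk

# List destroyed-but-importable pools, then try to import ours:
zpool import -D
zpool import -D -f gDisk

# If even that fails: recreate with identical geometry and settings
# (same member order, same ashift) and see whether any data survives.
zpool create -f -o ashift=12 gDisk raidz1 \
    gptid/db835a9b-665a-11e2-b37c-a0b3cce25a83 \
    gptid/dc882ef3-665a-11e2-b37c-a0b3cce25a83 \
    gptid/3a5b5d0b-943e-11e4-8c0e-a01d48c76648 \
    gptid/bd2204b8-3024-11e4-beb2-6805ca1cb42a
```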
 
OP
OP
G

Gpop

New Member

Reaction score: 1
Messages: 15

ShelLuser said:
Re-reading the first post I noticed raidz1-0, which seems rather ironic considering that, according to the manual page, it stripes both data and parity across all devices. Anyway, at this point I'm afraid your only option is restoring from a backup.

However...

Warning: the following is highly destructive advice; you will probably ensure the loss of all your data. So don't try this without checking up on other possible solutions first.
I think I'm at that point now. Do you have the exact commands I need to execute?
 

VladiBG

Aspiring Daemon

Reaction score: 209
Messages: 509

Then you can actually try to import the pool without the -n option and see if it's imported.
zpool import -fFN gDisk
zpool status

Code:
-F      Initiates recovery mode for an unopenable pool. Attempts to
        discard the last few transactions in the pool to return it to
        an openable state. Not all damaged pools can be recovered by
        using this option. If successful, the data from the discarded
        transactions is irretrievably lost.
-N      Import the pool without mounting any file systems.
-f      Forces import, even if the pool appears to be potentially
        active.
 
OP
OP
G

Gpop

New Member

Reaction score: 1
Messages: 15

VladiBG said:
Then you can actually try to import the pool without the -n option and see if it's imported.
zpool import -fFN gDisk
zpool status

Code:
-F      Initiates recovery mode for an unopenable pool. Attempts to
        discard the last few transactions in the pool to return it to
        an openable state. Not all damaged pools can be recovered by
        using this option. If successful, the data from the discarded
        transactions is irretrievably lost.
-N      Import the pool without mounting any file systems.
-f      Forces import, even if the pool appears to be potentially
        active.
Same thing. I’ve already tried all of those commands.

(attached screenshot: 714B009A-B8AB-43CD-AE88-E487E4A4F2A9.jpeg)
 

ShelLuser

Son of Beastie

Reaction score: 1,669
Messages: 3,512

Gpop said:
I think I'm at that point now. Do you have the exact commands I need to execute?
I might have something a bit more reliable, though still potentially destructive.

Warning: don't try these commands "just like that" because they can (and probably will) cause damage!
(leaving the disclaimer in to prevent random readers from running into accidents)

Boot with that rescue system, check that the pool is still detected: # zpool import.

Then: # zdb -F gDisk, if possible please share the output.

Depending on the outcome... try to mount the pool again (readonly) with the command I shared earlier. Of course don't bother if you get massive error messages which tell you that the whole thing failed.

If you run into errors you could try this: # zdb -X gDisk. This more or less does the same thing, yet tries to roll the changes back even further.

I'd start here.
 
OP
OP
G

Gpop

New Member

Reaction score: 1
Messages: 15

ShelLuser said:
I might have something a bit more reliable, though still potentially destructive.

Warning: don't try these commands "just like that" because they can (and probably will) cause damage!
(leaving the disclaimer in to prevent random readers from running into accidents)

Boot with that rescue system, check that the pool is still detected: # zpool import.

Then: # zdb -F gDisk, if possible please share the output.

Depending on the outcome... try to mount the pool again (readonly) with the command I shared earlier. Of course don't bother if you get massive error messages which tell you that the whole thing failed.

If you run into errors you could try this: # zdb -X gDisk. This more or less does the same thing, yet tries to roll the changes back even further.

I'd start here.
What rescue system are you referring to?

If I run the command in the shell, I get :
Code:
zdb: can't open 'gDisk': No such file or directory

ZFS_DBGMSG(zdb):
 
OP
OP
G

Gpop

New Member

Reaction score: 1
Messages: 15

Hi all,

First of all, happy new year. I've been away for the holidays, but now I'm back, so I'm bumping this thread.

Thank you in advance for your help
 

nihr43

Member

Reaction score: 19
Messages: 45

Things I would do:
- dd or ddrescue the disks and experiment with the image files instead of the disks.
- Definitely try importing on a fresh installation of FreeBSD.
- Try importing on Linux and illumos too, just because; you might get lucky with slightly different versions of ZFS.
- Ask a friend if you can plug your drives into their desktop and boot from USB.
Rule out literally everything but the pool.
 
OP
OP
G

Gpop

New Member

Reaction score: 1
Messages: 15

nihr43 said:
Things I would do:
- dd or ddrescue the disks and experiment with the image files instead of the disks.
- Definitely try importing on a fresh installation of FreeBSD.
- Try importing on Linux and illumos too, just because; you might get lucky with slightly different versions of ZFS.
- Ask a friend if you can plug your drives into their desktop and boot from USB.
Rule out literally everything but the pool.
I've looked a bit at ddrescue, and that could solve my issue of recovering the files, since apparently the disks are healthy.

If I understand correctly, the best way is to copy one of the disks to a new one, then plug it into a computer that has booted into ddrescue, and from there launch the tool to create an image on a new disk. Is that correct, or am I missing something?

The output should be an image containing the files on the drive. Is that it?
 

nihr43

Member

Reaction score: 19
Messages: 45

Oh, I'm just talking about ddrescue the command-line tool, as in `pkg install ddrescue`.
The intent is that in case you decide to do something potentially damaging (import -X, for example), you can always revert to the original situation and try something else, rather than making things worse. This would require a lot more storage space though.

An image of a single ZFS disk is not going to "contain" contiguous files the way an image of a FAT32 disk would. With a simpler filesystem, yes, you can run things like photorec (packaged with testdisk) to literally scan an entire disk for files. With ZFS, all the data has been transformed (split up into records, compressed, arranged into transaction groups, scattered across disks), so that kind of thing doesn't work.
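Concretely, that imaging approach could be sketched like this (untested; the device names ada0..ada3 are assumed from the zdb output earlier in the thread, and /backup stands in for any filesystem with enough free space):

```shell
# Image every pool member first, so risky import attempts can be
# retried against copies. The .map file lets ddrescue resume a copy.
pkg install ddrescue
for d in ada0 ada1 ada2 ada3; do
    ddrescue /dev/$d /backup/$d.img /backup/$d.map
done

# Later, attach the images as memory disks and try importing the copies:
for d in ada0 ada1 ada2 ada3; do
    mdconfig -a -t vnode -f /backup/$d.img
done
zpool import -d /dev gDisk   # scan /dev, including the new md devices
```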
 