Cannot import RAIDZ after power failure -- metadata corrupted

Hello and Happy New Year,

Following a power failure I have been unable to mount my ZFS raidz pool. I was able to export the pool, but subsequent "import -f" attempts have failed. I am looking for advice on how to correct the corruption and get the pool back.

CONFIGURATION: I am running FreeBSD 8.2-RELEASE-p3 as an ESXi virtual machine. I am ready and able to switch to a different FreeBSD or OpenSolaris version for the purpose of this troubleshooting. The raidz pool consists of 8 x 2TB drives. Each disk has an empty 2GB partition (p1), followed by a 1.8TB partition (p2) for use by ZFS.

PROBLEM: Following a brief power-off of the hard disks' enclosure, the pool cannot be mounted.

Code:
#zpool import
  pool: ZFSstore
    id: 12422265870499905405
 state: FAULTED
status: The pool metadata is corrupted.
action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-72
config:

        ZFSstore    FAULTED  corrupted data
          raidz1    ONLINE
            da1p2   ONLINE
            da2p2   ONLINE
            da3p2   ONLINE
            da4p2   ONLINE
            da5p2   ONLINE
            da6p2   ONLINE
            da7p2   ONLINE
            da8p2   ONLINE

Code:
#zpool import -f ZFSstore
internal error: Illegal byte sequence
Abort

Code:
#zdb
ZFSstore
    version=15
    name='ZFSstore'
    state=0
    txg=1219414
    pool_guid=12422265870499905405
    hostid=2624045851
    hostname=''
    vdev_tree
        type='root'
        id=0
        guid=12422265870499905405
        children[0]
                type='raidz'
                id=0
                guid=9661925766897747075
                nparity=1
                metaslab_array=23
                metaslab_shift=37
                ashift=9
                asize=15985973133312
                is_log=0
                children[0]
                        type='disk'
                        id=0
                        guid=1621383351031684258
                        path='/dev/da1p2'
                        whole_disk=0
                        DTL=102
                children[1]
                        type='disk'
                        id=1
                        guid=4244177242659624564
                        path='/dev/da2p2'
                        whole_disk=0
                        DTL=101
                children[2]
                        type='disk'
                        id=2
                        guid=14313008757061126085
                        path='/dev/da3p2'
                        whole_disk=0
                        DTL=100
                children[3]
                        type='disk'
                        id=3
                        guid=8151646539787060344
                        path='/dev/da4p2'
                        whole_disk=0
                        DTL=99
                children[4]
                        type='disk'
                        id=4
                        guid=12023942581224222406
                        path='/dev/da5p2'
                        whole_disk=0
                        DTL=98
                children[5]
                        type='disk'
                        id=5
                        guid=4292570899629833601
                        path='/dev/da6p2'
                        whole_disk=0
                        DTL=97
                children[6]
                        type='disk'
                        id=6
                        guid=14307962413307638883
                        path='/dev/da7p2'
                        whole_disk=0
                        DTL=96
                children[7]
                        type='disk'
                        id=7
                        guid=5832102164715480864
                        path='/dev/da8p2'
                        whole_disk=0
                        DTL=95

The output of "#gpart list" is too long for this post. I will post that output in an immediate follow-up.

I've researched the problem myself for a while. The thread starter (TS) here had the same problem resulting from a power failure, and the same zpool status. The advice given was that it was likely an error with "labeling, dangling device links, or overlapping partitions." Based on that info, the TS reported that he fixed the problem:
"I relabeled the disks using the partition information I was able to get from the FreeBSD Live CD, and then was able to import/repair the zpool using the latest OpenSolaris Live CD."

I am writing here because I am still a novice at FreeBSD. I know only enough about labels to know they should not be messed with lightly. I know nothing about "dangling device links" and "overlapping partitions".

Any advice you can offer about how to proceed in a manner that won't destroy my 14TB pool is greatly, greatly appreciated.

Thank you very much in advance.
 
As mentioned, here is the output of gpart list

Code:
#gpart list

Geom name: da0
state: OK
fwheads: 255
fwsectors: 63
last: 16777214
first: 63
entries: 4
scheme: MBR
Providers:
1. Name: da0s1
   Mediasize: 988291584 (943M)
   Sectorsize: 512
   Mode: r1w0e2
   attrib: active
   rawtype: 165
   length: 988291584
   offset: 32256
   type: freebsd
   index: 1
   end: 1930319
   start: 63
2. Name: da0s2
   Mediasize: 988291584 (943M)
   Sectorsize: 512
   Mode: r0w0e0
   rawtype: 165
   length: 988291584
   offset: 988356096
   type: freebsd
   index: 2
   end: 3860639
   start: 1930383
3. Name: da0s3
   Mediasize: 1548288 (1.5M)
   Sectorsize: 512
   Mode: r0w0e0
   rawtype: 165
   length: 1548288
   offset: 1976647680
   type: freebsd
   index: 3
   end: 3863663
   start: 3860640
4. Name: da0s4
   Mediasize: 21159936 (20M)
   Sectorsize: 512
   Mode: r1w1e2
   rawtype: 165
   length: 21159936
   offset: 1978195968
   type: freebsd
   index: 4
   end: 3904991
   start: 3863664
Consumers:
1. Name: da0
   Mediasize: 8589934592 (8.0G)
   Sectorsize: 512
   Mode: r2w1e5

Geom name: da0s1
state: OK
fwheads: 255
fwsectors: 63
last: 1930256
first: 0
entries: 8
scheme: BSD
Providers:
1. Name: da0s1a
   Mediasize: 988283392 (943M)
   Sectorsize: 512
   Mode: r1w0e2
   rawtype: 0
   length: 988283392
   offset: 8192
   type: !0
   index: 1
   end: 1930256
   start: 16
Consumers:
1. Name: da0s1
   Mediasize: 988291584 (943M)
   Sectorsize: 512
   Mode: r1w0e2

Geom name: da1
state: OK
fwheads: 255
fwsectors: 63
last: 3907029134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: da1p1
   Mediasize: 2147483648 (2.0G)
   Sectorsize: 512
   Mode: r0w0e0
   rawuuid: d7266a00-fdc4-11e0-bcfc-000c291d584e
   rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 2147483648
   offset: 65536
   type: freebsd-swap
   index: 1
   end: 4194431
   start: 128
2. Name: da1p2
   Mediasize: 1998251367936 (1.8T)
   Sectorsize: 512
   Mode: r0w0e0
   rawuuid: d72e4077-fdc4-11e0-bcfc-000c291d584e
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 1998251367936
   offset: 2147549184
   type: freebsd-zfs
   index: 2
   end: 3907029134
   start: 4194432
Consumers:
1. Name: da1
   Mediasize: 2000398934016 (1.8T)
   Sectorsize: 512
   Mode: r0w0e0

Geom name: da2
state: OK
fwheads: 255
fwsectors: 63
last: 3907029134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: da2p1
   Mediasize: 2147483648 (2.0G)
   Sectorsize: 512
   Mode: r0w0e0
   rawuuid: d751cfdc-fdc4-11e0-bcfc-000c291d584e
   rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 2147483648
   offset: 65536
   type: freebsd-swap
   index: 1
   end: 4194431
   start: 128
2. Name: da2p2
   Mediasize: 1998251367936 (1.8T)
   Sectorsize: 512
   Mode: r0w0e0
   rawuuid: d75a6e63-fdc4-11e0-bcfc-000c291d584e
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 1998251367936
   offset: 2147549184
   type: freebsd-zfs
   index: 2
   end: 3907029134
   start: 4194432
Consumers:
1. Name: da2
   Mediasize: 2000398934016 (1.8T)
   Sectorsize: 512
   Mode: r0w0e0

Geom name: da3
state: OK
fwheads: 255
fwsectors: 63
last: 3907029134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: da3p1
   Mediasize: 2147483648 (2.0G)
   Sectorsize: 512
   Mode: r0w0e0
   rawuuid: d77bd47a-fdc4-11e0-bcfc-000c291d584e
   rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 2147483648
   offset: 65536
   type: freebsd-swap
   index: 1
   end: 4194431
   start: 128
2. Name: da3p2
   Mediasize: 1998251367936 (1.8T)
   Sectorsize: 512
   Mode: r0w0e0
   rawuuid: d78335e1-fdc4-11e0-bcfc-000c291d584e
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 1998251367936
   offset: 2147549184
   type: freebsd-zfs
   index: 2
   end: 3907029134
   start: 4194432
Consumers:
1. Name: da3
   Mediasize: 2000398934016 (1.8T)
   Sectorsize: 512
   Mode: r0w0e0

Geom name: da4
state: OK
fwheads: 255
fwsectors: 63
last: 3907029134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: da4p1
   Mediasize: 2147483648 (2.0G)
   Sectorsize: 512
   Mode: r0w0e0
   rawuuid: d7a67ec2-fdc4-11e0-bcfc-000c291d584e
   rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 2147483648
   offset: 65536
   type: freebsd-swap
   index: 1
   end: 4194431
   start: 128
2. Name: da4p2
   Mediasize: 1998251367936 (1.8T)
   Sectorsize: 512
   Mode: r0w0e0
   rawuuid: d7aeb9c8-fdc4-11e0-bcfc-000c291d584e
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 1998251367936
   offset: 2147549184
   type: freebsd-zfs
   index: 2
   end: 3907029134
   start: 4194432
Consumers:
1. Name: da4
   Mediasize: 2000398934016 (1.8T)
   Sectorsize: 512
   Mode: r0w0e0

Geom name: da5
state: OK
fwheads: 255
fwsectors: 63
last: 3907029134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: da5p1
   Mediasize: 2147483648 (2.0G)
   Sectorsize: 512
   Mode: r0w0e0
   rawuuid: d7dbfbe1-fdc4-11e0-bcfc-000c291d584e
   rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 2147483648
   offset: 65536
   type: freebsd-swap
   index: 1
   end: 4194431
   start: 128
2. Name: da5p2
   Mediasize: 1998251367936 (1.8T)
   Sectorsize: 512
   Mode: r0w0e0
   rawuuid: d7eb72ae-fdc4-11e0-bcfc-000c291d584e
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 1998251367936
   offset: 2147549184
   type: freebsd-zfs
   index: 2
   end: 3907029134
   start: 4194432
Consumers:
1. Name: da5
   Mediasize: 2000398934016 (1.8T)
   Sectorsize: 512
   Mode: r0w0e0

Geom name: da6
state: OK
fwheads: 255
fwsectors: 63
last: 3907029134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: da6p1
   Mediasize: 2147483648 (2.0G)
   Sectorsize: 512
   Mode: r0w0e0
   rawuuid: d83e57e1-fdc4-11e0-bcfc-000c291d584e
   rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 2147483648
   offset: 65536
   type: freebsd-swap
   index: 1
   end: 4194431
   start: 128
2. Name: da6p2
   Mediasize: 1998251367936 (1.8T)
   Sectorsize: 512
   Mode: r0w0e0
   rawuuid: d84eb60c-fdc4-11e0-bcfc-000c291d584e
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 1998251367936
   offset: 2147549184
   type: freebsd-zfs
   index: 2
   end: 3907029134
   start: 4194432
Consumers:
1. Name: da6
   Mediasize: 2000398934016 (1.8T)
   Sectorsize: 512
   Mode: r0w0e0

Geom name: da7
state: OK
fwheads: 255
fwsectors: 63
last: 3907029134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: da7p1
   Mediasize: 2147483648 (2.0G)
   Sectorsize: 512
   Mode: r0w0e0
   rawuuid: d8878434-fdc4-11e0-bcfc-000c291d584e
   rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 2147483648
   offset: 65536
   type: freebsd-swap
   index: 1
   end: 4194431
   start: 128
2. Name: da7p2
   Mediasize: 1998251367936 (1.8T)
   Sectorsize: 512
   Mode: r0w0e0
   rawuuid: d8964ea3-fdc4-11e0-bcfc-000c291d584e
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 1998251367936
   offset: 2147549184
   type: freebsd-zfs
   index: 2
   end: 3907029134
   start: 4194432
Consumers:
1. Name: da7
   Mediasize: 2000398934016 (1.8T)
   Sectorsize: 512
   Mode: r0w0e0

Geom name: da8
state: OK
fwheads: 255
fwsectors: 63
last: 3907029134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: da8p1
   Mediasize: 2147483648 (2.0G)
   Sectorsize: 512
   Mode: r0w0e0
   rawuuid: d8d26302-fdc4-11e0-bcfc-000c291d584e
   rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 2147483648
   offset: 65536
   type: freebsd-swap
   index: 1
   end: 4194431
   start: 128
2. Name: da8p2
   Mediasize: 1998251367936 (1.8T)
   Sectorsize: 512
   Mode: r0w0e0
   rawuuid: d8e25e65-fdc4-11e0-bcfc-000c291d584e
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 1998251367936
   offset: 2147549184
   type: freebsd-zfs
   index: 2
   end: 3907029134
   start: 4194432
Consumers:
1. Name: da8
   Mediasize: 2000398934016 (1.8T)
   Sectorsize: 512
   Mode: r0w0e0

Thanks again.
 
LSDave said:
The output of "#gpart list" is too long for this post. I will post that output in an immediate follow-up.
If it's that long, please use pastebin or a similar service and post the link to it.
 
gpart(8) show output would probably be better. If you used GPT labels, use the -l option.
% gpart show -l

And please get that array on a good UPS.
 
Thanks wblock for your prompt reply. Please find the corrected* output below.

Code:
#gpart show
=>      63  16777152  da0  MBR  (8.0G)
        63   1930257    1  freebsd  [active]  (943M)
   1930320        63       - free -  (32K)
   1930383   1930257    2  freebsd  (943M)
   3860640      3024    3  freebsd  (1.5M)
   3863664     41328    4  freebsd  (20M)
   3904992  12872223       - free -  (6.1G)

=>      0  1930257  da0s1  BSD  (943M)
        0       16         - free -  (8.0K)
       16  1930241      1  !0  (943M)

=>        34  3907029101  da1  GPT  (1.8T)
          34          94       - free -  (47K)
         128     4194304    1  freebsd-swap  (2.0G)
     4194432  3902834703    2  freebsd-zfs  (1.8T)

=>        34  3907029101  da2  GPT  (1.8T)
          34          94       - free -  (47K)
         128     4194304    1  freebsd-swap  (2.0G)
     4194432  3902834703    2  freebsd-zfs  (1.8T)

=>        34  3907029101  da3  GPT  (1.8T)
          34          94       - free -  (47K)
         128     4194304    1  freebsd-swap  (2.0G)
     4194432  3902834703    2  freebsd-zfs  (1.8T)

=>        34  3907029101  da4  GPT  (1.8T)
          34          94       - free -  (47K)
         128     4194304    1  freebsd-swap  (2.0G)
     4194432  3902834703    2  freebsd-zfs  (1.8T)

=>        34  3907029101  da5  GPT  (1.8T)
          34          94       - free -  (47K)
         128     4194304    1  freebsd-swap  (2.0G)
     4194432  3902834703    2  freebsd-zfs  (1.8T)

=>        34  3907029101  da6  GPT  (1.8T)
          34          94       - free -  (47K)
         128     4194304    1  freebsd-swap  (2.0G)
     4194432  3902834703    2  freebsd-zfs  (1.8T)

=>        34  3907029101  da7  GPT  (1.8T)
          34          94       - free -  (47K)
         128     4194304    1  freebsd-swap  (2.0G)
     4194432  3902834703    2  freebsd-zfs  (1.8T)

=>        34  3907029101  da8  GPT  (1.8T)
          34          94       - free -  (47K)
         128     4194304    1  freebsd-swap  (2.0G)
     4194432  3902834703    2  freebsd-zfs  (1.8T)

/EDIT
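
The earlier advice named overlapping partitions as one possible cause. A minimal sketch of how that could be checked from the gpart show numbers above, assuming the partitions are supplied in ascending start order as "start size" pairs in sectors. The function name check_overlap is hypothetical and not part of gpart.

```shell
#!/bin/sh
# check_overlap (hypothetical helper): read "start size" pairs in sectors,
# one partition per line, sorted by start sector, and report any partition
# that begins before the previous one has ended.
check_overlap() {
    prev_end=0
    status=0
    while read -r start size; do
        if [ "$start" -lt "$prev_end" ]; then
            echo "overlap: partition at $start begins before sector $prev_end"
            status=1
        fi
        # First sector after this partition
        prev_end=$((start + size))
    done
    [ "$status" -eq 0 ] && echo "no overlap"
    return "$status"
}

# Values copied from the gpart show output for da1 (da2 to da8 are identical).
printf '%s\n' '128 4194304' '4194432 3902834703' | check_overlap
```

On da1 through da8 the swap partition covers sectors 128 to 4194431 and the ZFS partition starts at 4194432, so this layout shows no overlap.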

As for a UPS, I am embarrassed to say it was already connected to one! It's a long story... If I get my pool back online, I will be in a sufficiently self-deprecating mood to share it with you. :)
 
EDIT: I responded too quickly originally. The same command with the -l switch shows the labels only as "(null)".

Thanks for your assistance and patience.

Dave
 
LSDave said:
CONFIGURATION: I am running FreeBSD 8.2-RELEASE-p3 as an ESXi virtual machine. I am ready and able to switch to a different FreeBSD or OpenSolaris version for the purpose of this troubleshooting.

Try installing FreeBSD 9.1-RELEASE in a different vm and make the disks available. Then you can import the pool. If this still fails you can try with the -F (capital) switch.

If you manage to import this with -F then it will probably not work again with FreeBSD 8.2 which is EOL anyway.
 
gkontos said:
Try installing FreeBSD 9.1-RELEASE in a different vm and make the disks available. Then you can import the pool. If this still fails you can try with the -F (capital) switch.

If you manage to import this with -F then it will probably not work again with FreeBSD 8.2 which is EOL anyway.

Thanks for your reply and help. I attempted the following with FreeBSD 9.1-Release Live CD, running in a VM:

Code:
#zpool import -F ZFSstore
cannot import 'ZFSstore': pool may be in use from other system, it was last accessed by [a hostname] on Wed Jan 2 07:28:53 2013
use '-f' to import anyway

#zpool import -f ZFSstore
cannot import 'ZFSstore': I/O error
        Destroy and re-create the pool from
        a backup source

I also tried:

Code:
#zpool clear -F ZFSstore
cannot open 'ZFSstore': no such pool

Given this thread, and the fact that that poster's accident and outcome are identical to mine, I believe the solution may lie in re-labeling. Anyone have any ideas?
 
According to the man page,


Code:
         -F      Recovery mode for a non-importable pool. Attempt to return
                 the pool to an importable state by discarding the last few
                 transactions. Not all damaged pools can be recovered by using
                 this option. If successful, the data from the discarded
                 transactions is irretrievably lost. This option is ignored if
                 the pool is importable or already imported.

         -n      Used with the -F recovery option. Determines whether a non-
                 importable pool can be made importable again, but does not
                 actually perform the pool recovery. For more details about
                 pool recovery mode, see the -F option, above.

so you can try

# zpool import -Fn ZFSstore

to determine whether the pool can be made importable or not. If this is possible, you can then run

# zpool import -fF ZFSstore
 
t1066 said:
According to the man page,


Code:
         -F      Recovery mode for a non-importable pool. Attempt to return
                 the pool to an importable state by discarding the last few
                 transactions. Not all damaged pools can be recovered by using
                 this option. If successful, the data from the discarded
                 transactions is irretrievably lost. This option is ignored if
                 the pool is importable or already imported.

         -n      Used with the -F recovery option. Determines whether a non-
                 importable pool can be made importable again, but does not
                 actually perform the pool recovery. For more details about
                 pool recovery mode, see the -F option, above.

so you can try

# zpool import -Fn ZFSstore

to determine whether the pool can be made importable or not. If this is possible, you can then run

# zpool import -fF ZFSstore

Thank you for that. Those commands looked very promising, but I found that both fail. I don't have the exact output right now, but I recall that with "-fF", the command refused and invited me to try just "import -f". Trying with just "-f" failed with the usual error:
Code:
cannot import 'ZFSstore': I/O error
    Destroy and re-create the pool from a backup source

I am presently running "zpool import -fFXn ZFSstore" on the pool.

Apparently it takes several hours, but eventually reports whether the pool can be rolled back to an earlier state. I understand that the same command without the "-n" actually performs the rollback. The pool has not been written to since the power failure a week ago, and I still have backups on my network of the newest data written before that. [fingers crossed]
 
"zpool import -fFXn ZFSstore" ran for 15 hours, then just concluded with no report, like this:

Code:
#zpool import -fFXn ZFSstore
#

I've read more, and now believe that I need to find a working uberblock (using zdb) and attempt a rollback specifically naming that uberblock. I don't really know how to do that... still reading.

The more I read, the more I believe that my data is still intact and can be recovered. It's about knowing how to sniff around the pool and how to use the undocumented ZFS recovery tools to force zpool import to look past the latest corruption to an earlier working state. Any tips would be appreciated.
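
One possible way to approach the uberblock hunt described above, sketched under assumptions: zdb -l prints the ZFS labels of a device, and in current OpenZFS adding -u also dumps the uberblocks stored there, each with a "txg = N" line. Whether the zdb shipped with FreeBSD 9.1 accepts -lu, and whether its zpool import accepts the -T txg rewind flag and -o readonly=on, are assumptions to verify against the installed man pages. The helper name list_txgs is hypothetical.

```shell
#!/bin/sh
# list_txgs (hypothetical helper): read uberblock dump text on stdin and
# print each distinct transaction group number once, newest first.
# Assumes each uberblock is printed with a line of the form "txg = <number>".
list_txgs() {
    awk '$1 == "txg" && $2 == "=" { print $3 }' | sort -n -r -u
}

# Possible usage (not run here; the flags are assumptions, check your man pages):
#   zdb -lu /dev/da1p2 | list_txgs
#   zpool import -o readonly=on -f -T <txg> ZFSstore
```

A read-only import, where supported, avoids writing anything to the pool while an older txg is being tested.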
 
There is an article HERE that seems to cover what you are talking about (restoring ZFS from a specific block, etc.), but it is in Russian, so try a translation service :)
 
G_Nerc said:
There is an article HERE that seems to cover what you are talking about (restoring ZFS from a specific block, etc.), but it is in Russian, so try a translation service :)

Towards the end of that article, the author refers to the -F option of zpool import and recommends it over his own approach of reverting the transactions that corrupted the metadata (the approach is fairly self-explanatory from the code excerpts).
 