Recover corrupted ZFS pool

sir_dog · Feb 16, 2014

Hello.
I've got a problem with the root ZFS pool on my 10.0-STABLE AMD64 system (built from sources 3 or 4 days ago; I can't check right now). Yesterday I had a kernel panic during a scrub of this pool, and since then I can neither boot from it nor import it from another system: it shows the error message "zfs: allocating allocated segment" and then the system panics. I can't write down the complete error message because it scrolls off the screen too fast, but I can try to take a photo.

I checked my RAM and found that one of the modules has errors, so probably bad data was written from memory to the pool metadata and corrupted the pool.
I tried booting from the latest STABLE FreeBSD Live-CD, setting vfs.zfs.recovery=1 in the boot console, and importing the pool with the following options: zpool import -f -N -o readonly=on rpool, but it causes the same panic.

After that I tried booting Oracle Solaris 11.1, setting zfs_recover to 1 via mdb -kw, and importing the pool with the same command, but recent Solaris is unable to work with the ZFS pool version from FreeBSD. Recent OpenIndiana recognized the pool but panicked when I tried to import it, just as FreeBSD did.

I have a lot of very important information on this disk and most of it isn't backed up (my fault!). Is there any way to repair this pool, or at least grab some data from it?

devil_devil · Feb 16, 2014

Hi @sir_dog,

Try booting from OpenSolaris snv_111b and run these commands:

Code:

echo "aok/W 1" | mdb -kw
echo "zfs_recover/W 1" | mdb -kw

and then you can try to perform import.
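For reference, the whole sequence might look like the sketch below. It only prints the commands unless DRY_RUN is set to 0. The `run` and `recover_import` helpers and the DRY_RUN switch are illustrative names, not part of any tool, and the import options are simply the ones quoted in the first post. As far as I know, aok makes failed kernel assertions non-fatal and zfs_recover lets ZFS continue past some otherwise fatal errors, so treat this as a last resort and keep the import read-only.

```shell
#!/bin/sh
# Sketch only: prints the recovery sequence; set DRY_RUN=0 to actually run it.
# POOL defaults to a placeholder; replace it with the real pool name.
POOL="${POOL:-zroot}"
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command line in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        sh -c "$*"
    fi
}

# Hypothetical helper: relax the kernel checks, then import the pool read-only.
recover_import() {
    # Make failed assertions non-fatal in the running kernel.
    run 'echo "aok/W 1" | mdb -kw'
    # Enable ZFS recovery mode in the running kernel.
    run 'echo "zfs_recover/W 1" | mdb -kw'
    # Forced read-only import without mounting any datasets.
    run "zpool import -f -N -o readonly=on $1"
}

recover_import "$POOL"
```

Running it as-is just shows the three commands, which makes it easy to check the pool name before doing anything for real.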

sir_dog · Feb 16, 2014

Thank you, @devil_devil,
I executed the commands you advised and this allowed me to import my pool, but the system crashed a few minutes later. I tried to run zdb -e -bcsvL zroot (zroot is the name of my pool), but the command aborted:

Code:

root@openindiana:~# zdb -e -bcsvL zroot

Traversing all blocks to verify checksums ...

assertion failed for thread 0xfffffd7fff162a40, thread-id 1: 0 == bptree_iterate(spa->spa_meta_objset, spa->spa_dsl_pool->dp_bptree_obj, B_FALSE, count_block_cb, &zcb, 0L) (0x0 == 0x32), file ../zdb.c, line 2400
Abort (core dumped)

devil_devil · Feb 16, 2014

OK, did you try to scrub your imported pool? Try to identify the corrupted hard drive.

Code:

zpool scrub tank_name

sir_dog · Feb 16, 2014

Unfortunately I can't do that. I can import my pool only in read-only mode, which doesn't allow me to run zpool scrub. The system panics after every attempt to import this pool in normal mode. And I can't even debug it, because the only hard drive I have contains this unlucky pool, so I can boot only from a Live-CD or Live-USB, and after every panic the system logs disappear.
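Since the read-only import does work, one possible way to grab the data is to import under an alternate root and copy the files to a second disk. The sketch below only prints a candidate command sequence and executes nothing. The `copy_plan` helper, the /mnt alternate root and the /backup destination are assumptions for illustration; the destination has to be on a different disk, which I would still need to attach.

```shell
#!/bin/sh
# Sketch only: prints a possible copy-out sequence; nothing is executed.
# POOL, ALTROOT and DEST are illustrative; DEST must live on a different disk.
POOL="${POOL:-zroot}"
ALTROOT="${ALTROOT:-/mnt}"
DEST="${DEST:-/backup}"

# Hypothetical helper: print the commands for a read-only copy-out.
copy_plan() {
    pool=$1; altroot=$2; dest=$3
    # Import read-only under an alternate root so nothing is written to the pool.
    echo "zpool import -f -o readonly=on -R $altroot $pool"
    # See which datasets exist and whether they were mounted.
    echo "zfs list -r -o name,mountpoint,mounted $pool"
    # Copy the files to the other disk, preserving modes and timestamps.
    echo "cp -Rp $altroot/. $dest/"
}

copy_plan "$POOL" "$ALTROOT" "$DEST"
```

Datasets that are not mounted automatically would probably need an explicit zfs mount before the copy step.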

devil_devil · Feb 16, 2014

Try scanning this disk for bad blocks. You can use third-party software such as Hiren's Boot CD; I am almost sure it includes tools for that. If I am right, you will be able to fix the bad blocks from Hiren's Boot CD as well.

sir_dog · Feb 17, 2014

I scanned the disk both with Victoria for Windows (a general-purpose third-party HDD diagnostic tool) and with the utility from WD (my HDD vendor) and found no physical errors on the drive. So I still tend to think that a faulty RAM module caused the corruption of the pool's metadata.