ZFS zpool import panic - assert on zap_lookup ddt.c

Hi everybody,

After two weeks of fighting with this, I am asking for help. I have a ZFS RAID-Z pool on 3x3TB HDDs that was badly corrupted by a power failure. With the default FreeBSD 10.1 boot kernel the machine is stuck in a panic/reboot loop; it actually panics on CURRENT as well. Here is what I have done:

Everything below was tested on both 10.1 and CURRENT. Some of it is probably meaningless, but when you are desperate ...

zpool import with -fNFX -R ... -o readonly -> panics in every combination, even with vfs.zfs.recover=1 set.
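For reference, this is roughly the kind of invocation I mean (the mountpoint /mnt is just an example, and the exact flag combinations varied between attempts):
Code:
# vfs.zfs.recover is (as far as I can tell) a boot-time tunable on 10.1,
# so it goes into /boot/loader.conf before rebooting:
vfs.zfs.recover="1"

# then the read-only import with rewind/extreme rewind:
zpool import -f -N -F -X -o readonly=on -R /mnt tank0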

# zdb -ul /dev/ada0p2 | grep 'txg =' | sort | uniq | tail -20
Code:
txg = 8988462
txg = 8988479
txg = 8988483
txg = 8988487
txg = 8988491
txg = 8988495
txg = 8988499
txg = 8988503
txg = 8988507
txg = 8988508
The other disks show the same. I found the latest TXG that is shared by all 3 disks.
Again: zpool import with -T <txg> included = panic.
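For example, with one of the TXG values from the zdb output above (the exact value shown here is just illustrative):
Code:
zpool import -f -N -o readonly=on -T 8988462 -R /mnt tank0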

OK... Let's try something different.
# zdb -AAA -e tank0
Code:
Configuration for import:
        vdev_children: 1
        version: 5000
        pool_guid: 13817282546572470543
        name: 'tank0'
        state: 0
        hostid: 3383514280
        hostname: 'ZHTPC'
        vdev_tree:
            type: 'root'
            id: 0
            guid: 13817282546572470543
            children[0]:
                type: 'raidz'
                id: 0
                guid: 4655771131126618685
                nparity: 1
                metaslab_array: 30
                metaslab_shift: 34
                ashift: 12
                asize: 9001764126720
                is_log: 0
                create_txg: 4
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 7170665694161086550
                    phys_path: '/dev/gptid/cca1c272-5904-11e4-ba53-94de806e9451'
                    whole_disk: 1
                    DTL: 669
                    create_txg: 4
                    path: '/dev/gptid/cca1c272-5904-11e4-ba53-94de806e9451'
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 14199800870742853108
                    phys_path: '/dev/gptid/dfad89f0-5878-11e4-893f-94de806e9451'
                    whole_disk: 1
                    DTL: 668
                    create_txg: 4
                    removed: 1
                    path: '/dev/gptid/dfad89f0-5878-11e4-893f-94de806e9451'
                children[2]:
                    type: 'disk'
                    id: 2
                    guid: 8242149665652459698
                    phys_path: '/dev/gptid/85bd104f-58c0-11e4-9fbe-94de806e9451'
                    whole_disk: 1
                    DTL: 667
                    create_txg: 4
                    path: '/dev/gptid/85bd104f-58c0-11e4-9fbe-94de806e9451'
Assertion failed: zap_lookup(ddt->ddt_os, ddt->ddt_spa->spa_ddt_stat_object, name, sizeof (uint64_t), sizeof (ddt_histogram_t) / sizeof (uint64_t), &ddt->ddt_histogram[type][class]) == 0 (0x6 == 0x0), file /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/ddt.c, line 126.
Assertion failed: (ddt_object_info(ddt, type, class, &doi) == 0), file /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/ddt.c, line 131.
Assertion failed: zap_lookup(ddt->ddt_os, ddt->ddt_spa->spa_ddt_stat_object, name, sizeof (uint64_t), sizeof (ddt_histogram_t) / sizeof (uint64_t), &ddt->ddt_histogram[type][class]) == 0 (0x6 == 0x0), file /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/ddt.c, line 126.
Assertion failed: (ddt_object_info(ddt, type, class, &doi) == 0), file /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/ddt.c, line 131.
zdb: can't open 'tank0': Input/output error
root@:~ # uname -a
FreeBSD 11.0-CURRENT FreeBSD 11.0-CURRENT #0 r278908: Tue Feb 17 19:29:12 UTC 2015 root@releng2.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64

(On 10.1 the first assertion is at line 127.)
After some research:

zdb -AAA -F -X [-L] -e tank0 (tried both with and without -L)

For some reason it only runs randomly: a few times you see the assertion above, and then suddenly it starts. A long wait ... and after it finishes: sync, zpool import, and panic again.
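The sequence looks roughly like this (again, /mnt is just an example mountpoint):
Code:
zdb -AAA -F -X -e tank0    # sometimes asserts as above, sometimes runs to completion
sync
zpool import -f -N -o readonly=on -R /mnt tank0    # panics again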

I also tried removing one disk and importing from the remaining 2-disk combinations. Same result.

Now I have no more ideas. Because the pool is version 5000, trying Solaris is not an option.

Thanks for any help.

P.S.

I'm amazed, because this pool started on 9.0 and was upgraded many times, now running on 10.1. It survived two rounds of disk upgrades, from 500 GB to 1 TB to 3 TB drives, and now it dies just like that.
 
I'm also attaching the backtrace:

[attached photo of the panic backtrace: IMAG0393_3.jpg]


I'm asking myself whether I should report this to the mailing list as a bug. Even if it is abnormal to have metadata in this state, zpool import should still work with an older transaction, or zdb -F should fix it. After zdb finishes its job it reports that all the data, snapshots, and files are there ... but it does not fix the issue.

Thanks
 