ZFS: Need help to recover data from a damaged ZFS pool

Hello. I am facing an issue like this:
I lost my ZFS pool from TrueNAS 12, then attached the disk to a fresh FreeBSD 13 install.

I am able to see the pool using zpool import -f, but I can't recover it; I get this error:
Code:
Mar 1 01:15:02 Cofre syslogd: last message repeated 4 times
Mar 1 01:15:21 Cofre login[82183]: ROOT LOGIN (root) ON ttyv0
Mar 1 01:24:10 Cofre syslogd: kernel boot file is /boot/kernel/kernel
Mar 1 01:24:10 Cofre kernel: [558] panic: VERIFY3(0 == dmu_buf_hold_array(os, object, offset, size, FALSE, FTAG, &numbufs, &dbp)) failed (0 == 5)
Mar 1 01:24:10 Cofre kernel: [558]
Mar 1 01:24:10 Cofre kernel: [558] cpuid = 3
Mar 1 01:24:10 Cofre kernel: [558] time = 1646108641
Mar 1 01:24:10 Cofre kernel: [558] KDB: stack backtrace:
Mar 1 01:24:10 Cofre kernel: [558] #0 0xffffffff8099fd15 at kdb_backtrace+0x65
Mar 1 01:24:10 Cofre kernel: [558] #1 0xffffffff80951d81 at vpanic+0x181
Mar 1 01:24:10 Cofre kernel: [558] #2 0xffffffff81aa904a at spl_panic+0x3a
Mar 1 01:24:10 Cofre kernel: [558] #3 0xffffffff81b0cac2 at dmu_write+0x62
Mar 1 01:24:10 Cofre kernel: [558] #4 0xffffffff81b98364 at space_map_write+0x194
Mar 1 01:24:10 Cofre kernel: [558] #5 0xffffffff81b65667 at metaslab_flush+0x3b7
Mar 1 01:24:10 Cofre kernel: [558] #6 0xffffffff81b8f759 at spa_flush_metaslabs+0x1a9
Mar 1 01:24:10 Cofre kernel: [558] #7 0xffffffff81b85fed at spa_sync+0xd6d
Mar 1 01:24:10 Cofre kernel: [558] #8 0xffffffff81b9a483 at txg_sync_thread+0x3b3
Mar 1 01:24:10 Cofre kernel: [558] #9 0xffffffff8090f5ce at fork_exit+0x7e
Mar 1 01:24:10 Cofre kernel: [558] #10 0xffffffff80cdd40e at fork_trampoline+0xe
Mar 1 01:24:10 Cofre kernel: [558] Uptime: 9m18s
Mar 1 01:24:10 Cofre kernel: ---<<BOOT>>---


root@Cofre:/var/log # zdb -l /dev/da1p2
------------------------------------
LABEL 0
------------------------------------
version: 5000
name: 'boot-pool'
state: 0
txg: 7895779
pool_guid: 1525392888318996755
errata: 0
hostname: ''
top_guid: 1561236375997801541
guid: 1561236375997801541
vdev_children: 1
vdev_tree:
    type: 'disk'
    id: 0
    guid: 1561236375997801541
    path: '/dev/da0p2'
    whole_disk: 1
    metaslab_array: 64
    metaslab_shift: 30
    ashift: 12
    asize: 136075280384
    is_log: 0
    DTL: 150
    create_txg: 4
features_for_read:
    com.delphix:hole_birth
    com.delphix:embedded_data
labels = 0 1 2 3
root@Cofre:/var/log #

root@Cofre:/mnt # zpool import -R /mnt/truenas/ boot-pool
cannot import 'boot-pool': pool was previously in use from another system.
Last accessed by <unknown> (hostid=0) at Sat Feb 26 16:34:28 2022
The pool can be imported, use 'zpool import -f' to import the pool.
root@Cofre:/mnt #

root@Cofre:/mnt # smartctl -a -q noserial /dev/da1 -T permissive
smartctl 7.2 2021-09-14 r5236 [FreeBSD 13.0-STABLE amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor: Msft
Product: Virtual Disk
Revision: 1.0
Compliance: SPC-3
User Capacity: 136,365,211,648 bytes [136 GB]
Logical block size: 512 bytes
Physical block size: 4096 bytes
LU is thin provisioned, LBPRZ=0
>> Terminate command early due to bad response to IEC mode page

=== START OF READ SMART DATA SECTION ===
Current Drive Temperature: 0 C
Drive Trip Temperature: 0 C

Error Counter logging not supported

Device does not support Self Test logging
root@Cofre:/mnt #

Any suggestions? Thanks
 
Which version of FreeBSD, exactly?

freebsd-version -kru ; uname -aKU

What exactly happens when, or after, you apply force? The panic?

E.g. (not in the opening post, but based upon what's there) with option -f,

zpool import -f -R /mnt/truenas/ boot-pool

<https://openzfs.github.io/openzfs-docs/man/8/zpool-import.8.html>

Have you tried a forced import without mounting?
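For example (just a sketch; per zpool-import(8), -N imports the pool without mounting any of its file systems):
Code:
zpool import -f -N -R /mnt/truenas/ boot-pool
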
root@Cofre:/mnt # freebsd-version -kru ; uname -aKU
13.0-STABLE
13.0-STABLE
13.0-RELEASE-p7
FreeBSD Cofre 13.0-STABLE FreeBSD 13.0-STABLE #0 stable/13-n249459-9ddf1ab1b9c: Mon Feb 14 14:32:25 -03 2022 root@Cofre.mydomain.com.br:/usr/obj/usr/src/amd64.amd64/sys/TSI amd64 1300525 1300139
root@Cofre:/mnt #


I get the panic when I run: zpool import -f boot-pool

root@Cofre:/mnt # zpool import -f
pool: boot-pool
id: 1525392888318996755
state: ONLINE
status: Some supported features are not enabled on the pool.
action: The pool can be imported using its name or numeric identifier, though
some features will not be available without an explicit 'zpool upgrade'.
config:

boot-pool ONLINE
da1p2 ONLINE
root@Cofre:/mnt #
 
root@Cofre:~ # zpool import -o readonly=on
pool: boot-pool
id: 1525392888318996755
state: ONLINE
status: Some supported features are not enabled on the pool.
action: The pool can be imported using its name or numeric identifier, though
some features will not be available without an explicit 'zpool upgrade'.
config:

boot-pool ONLINE
da1p2 ONLINE
root@Cofre:~ #


Do you mean: zpool import -o readonly=on boot-pool ?
 
Avoid clobber. Don't forget the altroot.
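For example, something along these lines (forced, read-only, with the altroot):
Code:
zpool import -f -o readonly=on -R /mnt/truenas/ boot-pool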

13.0-STABLE
13.0-STABLE
13.0-RELEASE-p7

Is that a weird combination?

Here (not recently updated, but good enough for a comparison):

[attached screenshot for comparison]

More generally: for a recovery situation, I'd aim for a RELEASE kernel, unless there's a good reason to avoid it (e.g. RELEASE not booting the computer).
 
root@Cofre:~ # zpool import -o readonly=on -f -R /mnt/truenas/ boot-pool
root@Cofre:~ # ls /mnt/truenas/
root@Cofre:~ #
root@Cofre:~ # zpool status
pool: boot-pool
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
scan: scrub repaired 0B in 00:00:10 with 1 errors on Thu Feb 24 03:45:10 2022
config:

NAME STATE READ WRITE CKSUM
boot-pool ONLINE 0 0 0
da1p2 ONLINE 0 0 0

errors: 1 data errors, use '-v' for a list

pool: zroot
state: ONLINE
config:

NAME STATE READ WRITE CKSUM
zroot ONLINE 0 0 0
da0p4 ONLINE 0 0 0

errors: No known data errors
root@Cofre:~ #
 
root@Cofre:~ # zpool status -xv
pool: boot-pool
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
scan: scrub repaired 0B in 00:00:10 with 1 errors on Thu Feb 24 03:45:10 2022
config:

NAME STATE READ WRITE CKSUM
boot-pool ONLINE 0 0 0
da1p2 ONLINE 0 0 0

errors: Permanent errors have been detected in the following files:

boot-pool/.system/rrd-17a88b624aa64e01b68939563b5fcb8c:/localhost/df-mnt-tank-data-volumes-mycompany-stable-rollbacks-stable-d587c/df_complex-free.rrd
root@Cofre:~
 
root@Cofre:~ # zpool import -o readonly=on
...
boot-pool ONLINE
It seems to me that you succeeded in importing it. At this point, it seems that the pool is readable (at least you got to online). So I would go with SirDice's advice, but instead of restoring your data from a backup, I would read the data from the damaged pool, and copy what you want to a new place.
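For example, a rough sketch of copying things out once a read-only import is in place (the destination path below is just an illustration, not something from your system):
Code:
# stage a destination on the healthy pool (hypothetical path)
mkdir -p /zroot/recovered
# see what the damaged pool actually contains
zfs list -r boot-pool
# copy the contents of the mounted pool out; rsync would do equally well
cp -Rpv /mnt/truenas/ /zroot/recovered/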

Obviously, this immediately leads to a followup question: What damaged the original? We have no idea, but ...

I lost my ZFS pool from TrueNAS 12, then attached the disk to a fresh FreeBSD 13 install.
Remember, this is a FreeBSD forum. We don't like to answer questions about TrueNAS here, because most of us have zero experience with TrueNAS, and don't know what it does. I would suspect that TrueNAS 12 runs a ZFS version that is older than FreeBSD 13; that's because OpenZFS in FreeBSD 13 is a relatively recent addition. Also, the line "Some supported features are not enabled on the pool" indicates that.
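If you want to see which feature flags the pool carries without changing anything on it, a read-only query along these lines should be enough:
Code:
zpool get all boot-pool | grep feature@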

Asking here (in this forum) what caused TrueNAS to damage this ZFS pool is sort of a silly question.

# smartctl -a -q noserial /dev/da1 -T permissive
... leads to errors ...
This is not a real disk. It is some sort of emulated disk. For lack of experience with Microsoft products, I don't know whether you are running this on Azure, or using a VM on a Windows server. But it makes sense that you can't diagnose an emulated, virtual disk. If you have a reason to suspect that the physical hardware disk is defective, run smartctl (or the Windows equivalent) on the OS the physical disk is actually attached to.
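For instance, with smartmontools installed on whatever system the physical disk is actually attached to (the device name below is only a placeholder):
Code:
smartctl --scan          # list the devices smartctl can address
smartctl -a /dev/sda     # replace /dev/sda with the real physical device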
 
Hello ralphbsz. Thanks for your reply.
Unfortunately I had no backup of this pool; we only configured backups for the data pool.

I want to access this pool to get some config files from our TrueNAS.
 
root@Cofre:/mnt # zfs set mountpoint=/mnt/truenas boot-pool
cannot set property for 'boot-pool': pool is read-only
root@Cofre:/mnt #
 
root@Cofre:/mnt # mount -t zfs -o zfsutil boot-pool /mnt/truenas/
root@Cofre:/mnt # cd /mnt/truenas/
root@Cofre:/mnt/truenas # ls
root@Cofre:/mnt/truenas # mount
zroot/ROOT/default on / (zfs, local, noatime, nfsv4acls)
devfs on /dev (devfs)
/dev/da0p1 on /boot/efi (msdosfs, local)
zroot/tmp on /tmp (zfs, local, noatime, nosuid, nfsv4acls)
zroot/var/log on /var/log (zfs, local, noatime, noexec, nosuid, nfsv4acls)
zroot/usr/ports on /usr/ports (zfs, local, noatime, nosuid, nfsv4acls)
zroot/var/crash on /var/crash (zfs, local, noatime, noexec, nosuid, nfsv4acls)
zroot/var/audit on /var/audit (zfs, local, noatime, noexec, nosuid, nfsv4acls)
zroot on /zroot (zfs, local, noatime, nfsv4acls)
zroot/usr/home on /usr/home (zfs, local, noatime, nfsv4acls)
zroot/usr/src on /usr/src (zfs, local, noatime, nfsv4acls)
zroot/var/tmp on /var/tmp (zfs, local, noatime, nosuid, nfsv4acls)
zroot/var/mail on /var/mail (zfs, local, nfsv4acls)
boot-pool on /mnt/truenas (zfs, local, noatime, read-only, nfsv4acls)
root@Cofre:/mnt/truenas #
 
root@Cofre:/mnt # mount -t zfs -o zfsutil boot-pool /mnt/truenas/
That should probably have done it. Personally, I always use "zfs mount" instead of "mount -t zfs", and I actually don't know what the difference is. Try the equivalent "zfs mount" command.
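For example (a sketch; zfs mount takes no target path, it uses the dataset's mountpoint property with the altroot from -R prepended):
Code:
zfs mount boot-pool
zfs mount -a    # or mount every dataset that has a regular mountpoint

If the dataset's mountpoint property is set to legacy or none, zfs mount will refuse, and mount -t zfs (as you already used) is the right tool anyway.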

root@Cofre:/mnt/truenas # ls
That tells me that there is nothing VISIBLE here. You should try "ls -a", in case all the files are in invisible subdirectories (those whose names start with a "."). But if there is nothing invisible here either, then it seems that this ZFS file system was damaged to the point that no files remain in it.
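Concretely, a quick check along these lines would show both hidden entries and, more importantly, whether the data sits in child datasets or snapshots that were simply never mounted; the zpool status -xv output above already named boot-pool/.system/..., so at least one child dataset exists:
Code:
ls -a /mnt/truenas/
zfs list -r -t all boot-pool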
 
I couldn't mount it using zfs mount because it is read-only.
Yes, it seems I lost all data from that disk.

root@Cofre:/mnt/truenas # df -h
Filesystem Size Used Avail Capacity Mounted on
zroot/ROOT/default 116G 3.8G 112G 3% /
devfs 1.0K 1.0K 0B 100% /dev
/dev/da0p1 260M 1.8M 258M 1% /boot/efi
zroot/tmp 112G 104K 112G 0% /tmp
zroot/var/log 112G 468K 112G 0% /var/log
zroot/usr/ports 112G 96K 112G 0% /usr/ports
zroot/var/crash 112G 96K 112G 0% /var/crash
zroot/var/audit 112G 96K 112G 0% /var/audit
zroot 112G 96K 112G 0% /zroot
zroot/usr/home 112G 30M 112G 0% /usr/home
zroot/usr/src 114G 2.2G 112G 2% /usr/src
zroot/var/tmp 112G 104K 112G 0% /var/tmp
zroot/var/mail 112G 140K 112G 0% /var/mail
boot-pool 119G 96K 119G 0% /mnt/truenas
root@Cofre:/mnt/truenas #
 
root@Cofre:/mnt/truenas # ls -la
total 1
drwxr-xr-x 2 root wheel 2 Mar 1 13:29 .
drwxr-xr-x 3 root wheel 3 Mar 1 13:29 ..
root@Cofre:/mnt/truenas #
 
When last used normally (with TrueNAS) did it include all, or part, of what's normally in a TrueNAS boot pool?
My TrueNAS had:
- disk1: 127 GB for the system;
- disk2: 500 GB for my data.

I had no backup of the TrueNAS configuration, and because of this disk1 error I had to reconfigure my environment on a brand-new installation.
 
… disk1: 127 GB for the system …

Thanks. What type of storage (what medium)?

Can we triple-check that your zpool(8) command for the most recent import included the -R option?

Don't forget the altroot.

My gut feeling is that you should:
  1. export the pool
  2. restart the 13.0-STABLE host
  3. hope that the 13.0-RELEASE-p7 userland (a mismatch?) does not have an adverse impact
  4. zpool import -f -o readonly=on -R /mnt/truenas/ boot-pool
  5. zpool list boot-pool
  6. zfs list -r boot-pool
  7. cd /mnt/truenas/ && du -h .
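Put together, roughly (same pool name and altroot as above):
Code:
zpool export boot-pool
# reboot the host, then:
zpool import -f -o readonly=on -R /mnt/truenas/ boot-pool
zpool list boot-pool
zfs list -r boot-pool
cd /mnt/truenas/ && du -h .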
 