ZFS errors: Permanent errors have been detected in the following files: <metadata>:<0x1e6>

Hello,
Even after a scrub this remains. Is it bad, and if so how bad? Is it fixable?


Code:
root@vmbsd:/usr/home/pete # zpool status zroot
  pool: zroot
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 0 days 00:00:26 with 1 errors on Sun Mar 17 05:47:55 2019
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          nvd0p2    ONLINE       0     0     0

errors: 1 data errors, use '-v' for a list
root@vmbsd:/usr/home/pete # zpool status -v zroot
  pool: zroot
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 0 days 00:00:26 with 1 errors on Sun Mar 17 05:47:55 2019
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          nvd0p2    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x1e6>

Thanks
 
You can scrub all you want, but your data will never be corrected. ZFS's error correction only works if there's redundancy of the data, e.g. RAID-Z, mirrors, etc. With a single disk or a striped set there is no redundancy, so you only get error detection. You can get data redundancy on single disks or striped sets by setting the copies property to 2 or higher (this only applies to newly written data; existing data will not be updated).
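For example, to turn that on for everything under the pool's root dataset (dataset name taken from your zpool output; the property is inherited, it only affects blocks written from that point on, and it roughly doubles the space those blocks use):
Code:
zfs set copies=2 zroot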
 
SirDice answered your second question: Is it fixable? No, not automatically.

Your first question was: Is it bad, and how bad? I'll give you two answers on how to find out what is missing. If you have ample experience, you can use the ZFS debugger zdb(8) to find out exactly what "metadata 0x1e6" refers to, which might give you a hint of what is damaged: a directory, or inode-like data that describes file content. It might also just tell you that a metadata block is damaged, and you'll never find out what was in there before it was damaged, so you know that something is missing but you don't know what. This so far requires a considerable amount of ZFS expertise, way more than I have, and likely also more than you have. It is theoretically possible that you could find information in a partially damaged metadata block, but that's unlikely: the metadata block is a complicated data structure, and to manually decode it is hard. Manually fixing it is insanely hard (yes, most file system developers have done that at times, and it is insane).

The second answer is: look at what remains and inventory it against what you know or remember should be there. This would be a good time to look at your most recent backup (you religiously take backups regularly, right?) and compare the live disk to the backup, to get a hint of what is damaged.
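A crude way to do that comparison, assuming the backup has been restored or mounted somewhere readable (the /mnt/backup path below is made up):
Code:
diff -qr /mnt/backup/usr/home /usr/home
mtree(8) can do the same job more thoroughly if you keep specification files around.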

By the way, that throwaway line about backups was not a joke. There are two kinds of computer users: those who always back up, and those who have not lost data YET.
 
Thanks for the reply.
What happened was, I had restored an image using Clonezilla. I haven't figured out how to restore using:
Code:
zfs snapshot -r zroot@backup
zfs send -Rv zroot@backup | gzip > /mnt/bhyve/vmbsdwdimg_zfs.gz
zfs destroy -r zroot@backup
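Presumably the restore would be something like the following, run from separate boot media against a freshly created pool rather than the running root, but I haven't actually tried it:
Code:
gunzip -c /mnt/bhyve/vmbsdwdimg_zfs.gz | zfs receive -Fv zroot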
Which brings me to the initial errors when I did "zpool status zroot" after my Clonezilla restore:
It showed 17 file errors that seemed to be in the zroot@backup snapshot, which I had neglected to destroy with "zfs destroy -r zroot@backup" after backing up.
I then ran "zfs destroy -r zroot@backup" and "zpool status zroot" and was left with:
Code:
root@vmbsd:/usr/home/pete # zpool status zroot
  pool: zroot
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 0 days 00:00:26 with 1 errors on Sun Mar 17 05:47:55 2019
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          nvd0p2    ONLINE       0     0     0

errors: 1 data errors, use '-v' for a list
root@vmbsd:/usr/home/pete # zpool status -v zroot
  pool: zroot
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 0 days 00:00:26 with 1 errors on Sun Mar 17 05:47:55 2019
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          nvd0p2    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x1e6>

I do not think anything is actually missing, at least not that I am aware of.
Will the damage spread, or was it confined to the zroot@backup snapshot that I destroyed?
 
Removing the snapshot would also have removed the corrupted metadata, if the metadata was related to the snapshot.
 
A: As SirDice said, if you are lucky the damaged metadata block was only used by the snapshot, and removing the snapshot makes the problem irrelevant. Actually, you might be lucky in a more general way: the corrupted metadata block might not be in use at all (it might only describe already-deleted files, but we don't know that because we can't read it), in which case you don't actually have a problem.
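An easy way to check which of those it is: now that the snapshot is gone, run another scrub and look at the status again. If I remember right, an error entry for data that has since been deleted only drops off the list once a scrub has completed (sometimes it takes two, because the log covers the last two scrubs):
Code:
zpool scrub zroot
zpool status -v zroot   # once the scrub has finished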

B: No, there is no historical log of "zpool status" that I know of. In theory, an expert could figure out much of the history of the internals using zdb(8) and examining disk blocks. That's because ZFS uses a CoW (Copy on Write) method to put data structures (such as metadata) on disk, which means that older copies of metadata tend to remain on disk, but are unreferenced. By doing an exhaustive search of the disk and trying to manually link unreferenced blocks, one can sometimes (often? occasionally?) reconstruct the history. Doing this would be very hard and tedious.

C: You used Clonezilla. Did you quiesce ZFS first (zpool export before imaging, zpool import afterwards)? Copying a live file system while it is being modified is a "foot-shaped gun": something that will nearly always result in self-inflicted injury. In general, copying at the block device layer underneath a file system should be avoided unless you have a really good understanding of what the file system is doing, or you have it completely quiesced and the cache flushed.
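In command terms, roughly this (and since zroot is the pool you boot from, it would have to be done from separate boot media such as an installer/live image):
Code:
zpool export zroot
# ... image the disk with Clonezilla ...
zpool import zroot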
 