ZFS Zpool status shows data corruption on non-existent file

zpool status -xv reports a corrupted file in zroot/ROOT/default, but zroot itself contains nothing. I don't know what that means or what I should do.

Code:
zpool status -xv
  pool: zroot
 state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
config:

    NAME        STATE     READ WRITE CKSUM
    zroot       ONLINE       0     0     0
      ada0p3    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        zroot/ROOT/default:<0x38030>

And here's the disk layout:

Code:
df -h
Filesystem            Size    Used   Avail Capacity  Mounted on
zroot/ROOT/default    124G    5.7G    119G     5%    /
devfs                 1.0K    1.0K      0B   100%    /dev
fdescfs               1.0K    1.0K      0B   100%    /dev/fd
procfs                4.0K    4.0K      0B   100%    /proc
zroot/tmp             119G    9.7M    119G     0%    /tmp
zroot/var/log         119G    1.1M    119G     0%    /var/log
zroot/usr/home        203G     84G    119G    42%    /usr/home
zroot                 119G     88K    119G     0%    /zroot
zroot/var/tmp         119G    196K    119G     0%    /var/tmp
zroot/var/crash       119G     88K    119G     0%    /var/crash
zroot/var/audit       119G     88K    119G     0%    /var/audit
zroot/usr/ports       121G    2.2G    119G     2%    /usr/ports
zroot/var/mail        119G    132K    119G     0%    /var/mail
zroot/usr/src         119G    770M    119G     1%    /usr/src
 
According to the link Minbari provided,
"If the object number to a file path cannot be successfully translated, either due to an error or because the object doesn't have a real file path associated with it, as is the case for a dnode_t, then the dataset name followed by the object's number is displayed."

I'm guessing that's a BE, based on the path. Is it the one you are currently booted into? If not, that may explain why it's just a number. Maybe if you use bectl/beadm to temporarily mount the BE named "default" and then re-run zpool status -v, you'll get the information you actually need.
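Something along these lines, as a rough sketch (the BE name is a placeholder to take from bectl list; without a path argument bectl uses a temporary mount point):
Code:
# mount the boot environment at a temporary mount point (BE name is a placeholder)
bectl mount someBE
# re-check which file the error maps to, then unmount the BE again
zpool status -v zroot
bectl umount someBE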
 
I've not taken the time to study zfs in detail or learn bectl/beadm so I'm lost. I just did a basic install on this system a while back and haven't bothered with any of that.
 
OK. Post the output of bectl list.
Basically I'm looking for more than one Boot Environment (BE); if there is more than one, see which one is the active one.
 
Code:
bectl list
BE                                Active Mountpoint Space Created
13.1-RELEASE-p1_2022-09-01_042939 -      -          1.03G 2022-09-01 04:29
13.1-RELEASE_2022-08-10_142208    -      -          1.59G 2022-08-10 14:22
default                           NR     /          9.12G 2020-10-17 19:05

So it looks like I have two? Maybe I messed up an upgrade?
 
This is not an error due to anything you did. It could be either HW related or even bug related; with non-ECC RAM, most likely HW related.
In the current BE ("default") run this as root: find / -type f -exec ls -al {} \; > /dev/null and find / -type d -exec ls -lad {} \; > /dev/null. It's a poor man's approach to reading every inode, to see if you can trigger whatever is corrupted. The damage could still be data (and not metadata), but it's a start.

Someone sharp with zdb(8) could probably share a better approach here.
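For the record, the zdb(8) incantation usually suggested for translating an object number is something like the following; this is only a sketch, not verified against this pool, and the object id is just the hex value from the zpool status output converted to decimal:
Code:
# 0x38030 hex = 229424 decimal; dump that dnode from the affected dataset
zdb -dddd zroot/ROOT/default 229424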
 
_martin The second find immediately popped up with this (but the first find has not completed yet):
ls: /proc/66844: No such file or directory
find: /proc/66844: No such file or directory


EDIT: Some time back, I was helping someone with their Windows computer and got into the habit of turning the computer off with the power switch. One file was corrupted, but I was able to fix it using ZFS tools; I'd have to find the post on this forum where I did that.

EDIT2: The first 'find' returned nothing. Re-running the second 'find' returns the same error, but with a different number.
 
You can discard any errors regarding /proc; that's just a pseudo file system (most likely PID 66844 terminated between the time find built its list and the time it actually accessed the entry). So the search returned nothing. Maybe this error refers to a zombie file, i.e. it's a stale entry.
Yeah, massaging the FS with a sudden power-off is not ideal. Btw, did you run zpool scrub zroot on it already?
 
When you run the find, start it at the root of the afflicted ZFS file system, and use the command-line switch that keeps it from descending into different file systems (which is something like -xdev, but please check). That way you avoid pseudo file systems like /proc and /dev, and file systems of other types, such as probably /tmp.
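On FreeBSD's find(1) that switch is -xdev, so the earlier commands would become roughly this (untested sketch):
Code:
# -xdev keeps find from crossing into other file systems (procfs, devfs, other datasets)
find / -xdev -type f -exec ls -al {} \; > /dev/null
find / -xdev -type d -exec ls -lad {} \; > /dev/null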
 
ralphbsz That's why I called it a poor man's approach. The issue is on /. From the output he shared we can see it's not that big a disk, so it's easier to run it this way and discard the false positives. I think I'll never memorize find's prune option; I always waste way too much time reading the man page and googling it, and then I forget it again. Same goes for cpio, I just can't get that into my head.
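For completeness, the prune idiom I can never remember goes roughly like this (illustrative only; the pruned paths are just the pseudo file systems mentioned above):
Code:
# skip /proc and /dev entirely, stat everything else
find / \( -path /proc -o -path /dev \) -prune -o -type f -exec ls -al {} \; > /dev/null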

drhowarddrfine Yes. Some transient errors are even fixed by a second pass of the scrub (wait for the first one to finish, then run it again).
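Roughly like this, assuming the pool name zroot from the output above; the grep string matches the "scrub in progress" line zpool status prints while a scrub is running:
Code:
zpool scrub zroot
# wait for the first pass to finish
while zpool status zroot | grep -q "scrub in progress"; do sleep 60; done
# second pass; check the error list again once it completes
zpool scrub zroot
zpool status -v zroot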
 
I'm away from my office at the moment, so all I had to investigate with was a VirtualBox VM running FreeBSD 13.1-RELEASE-p2 GENERIC on my notebook:
Code:
[f13.172] $ sudo zpool status -xv
  pool: zroot
 state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 00:04:39 with 0 errors on Wed Apr 20 17:24:06 2022
config:

    NAME        STATE     READ WRITE CKSUM
    zroot       ONLINE       0     0     0
      ada0p3    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        //usr/local/share/locale/uk/LC_MESSAGES/gtk40.mo
        //usr/local/bin/gtk4-encode-symbolic-svg
        //usr/local/lib/girepository-1.0/Gtk-4.0.typelib
        //usr/local/share/gtk-4.0/emoji/ru.gresource
        //usr/local/share/gtk-4.0/emoji/sv.gresource
        //usr/local/share/gtk-4.0/emoji/da.gresource
        //usr/local/lib/libwireshark.so.15.0.8
        //usr/local/share/locale/bg/LC_MESSAGES/gtk40.mo
Oops! Maybe we have a bug. Running a scrub right now...
 
The scrub didn't help.

This is a disposable system (just a VM with a copy of my regular desktop rsync'd to it).

I can repair it by copying files from other identical FreeBSD systems -- providing they are not also corrupted.

I'll report back when I get home and check on the other FreeBSD systems...
 
My problem was cured by identifying the packages that owned the corrupt files with pkg provides, deleting them, running a pool scrub, and reinstalling the packages.
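The sequence was essentially this; a sketch using one of the corrupted files from my earlier output, with pkg which to map the file to its owning package, and the package name is only a guess from the path:
Code:
# find which package installed the corrupted file
pkg which /usr/local/lib/libwireshark.so.15.0.8
# delete the affected package(s), scrub, then reinstall once the scrub finishes
pkg delete -y wireshark
zpool scrub zroot
pkg install -y wireshark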

I guess I have to blame the underlying hardware or software (Windows 8.1) for the file system corruption on this VM.

Applause to ZFS, as this would have been just an unexplained mystery with almost any other file system.
 
My hardware is a bit flaky and I get occasional errors reported by zpool status, but I find that they are almost always false positives where the data can still be accessed on a terminal without an error message. In this case I just clear the error with:

Code:
zpool scrub zroot; sleep 200 ; zpool scrub -s zroot

It may need to be run more than once, but I've never needed to run a full scrub and I've never seen anything reported by the next full scrub.
 