ZFS There's an error on my pool, what should I do?

DaLynX · Dec 20, 2020

Hello,

I am not familiar with ZFS, but am trying to learn.
I have the following error when I run zpool status -v:

Code:

  pool: zroot
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 0 days 03:09:55 with 0 errors on Mon Dec  7 00:40:14 2020
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          ada0p3    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        <0x189>:<0x27057>

As you can see I tried a scrub but it did not help. What should I do?

Argentum · Dec 21, 2020

DaLynX said:

Hello,

I am not familiar with ZFS, but am trying to learn.
I have the following error when I run zpool status -v:

Code:

  pool: zroot
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 0 days 03:09:55 with 0 errors on Mon Dec  7 00:40:14 2020
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          ada0p3    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        <0x189>:<0x27057>

As you can see I tried a scrub but it did not help. What should I do?

If it is still readable, try to send it to a new location using zfs send.
See my post:
https://forums.freebsd.org/threads/...ystem-to-a-different-zpool.78031/#post-486801
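Another alternative in this case is to attach a second device as a mirror, let it resilver, and then detach the old device. Use zpool attach and zpool detach for that (zpool add would create a second top-level stripe instead of a mirror) - see zpool(8).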
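
For reference, the mirror route could look roughly like the sketch below. It assumes a second partition is available, here called ada1p3 (a hypothetical name, not from this thread), and it only prints the commands unless DRY_RUN=0 is set. Resilvering copies the pool as it is, so it would not by itself repair an object that is already damaged.

```shell
#!/bin/sh
# Sketch: turn the single-disk pool into a mirror, then retire the old disk.
# ASSUMPTION: ada1p3 is a hypothetical partition at least as large as ada0p3.
POOL=zroot
OLD=ada0p3
NEW=ada1p3

# Print each command instead of running it, unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

mirror_plan() {
    run zpool attach "$POOL" "$OLD" "$NEW"   # starts resilvering onto the new device
    run zpool status -v "$POOL"              # repeat until the resilver has finished
    run zpool detach "$POOL" "$OLD"          # only then drop the old device
}

mirror_plan
```

On a root pool the new disk would most likely also need its own partition table and boot code before the old one is detached.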

DaLynX · Dec 21, 2020

Well everything seems to be working fine. And it's just a standalone server (private dedicated), so I don't have anything ready to duplicate it. Except renting another one I guess. Is there a way to repair in situ?

Argentum · Dec 21, 2020

DaLynX said:
Well everything seems to be working fine. And it's just a standalone server (private dedicated), so I don't have anything ready to duplicate it. Except renting another one I guess. Is there a way to repair in situ?

Have you tried zpool clear zroot?

But you have only one drive, and this error shows that it misbehaved. That may mean it is near the end of its life. Check whether there are any S.M.A.R.T. errors; you can use sysutils/smartmontools for that.
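
A sketch of that check, assuming smartmontools is installed and the disk is /dev/ada0 (inferred from ada0p3 in the pool layout). The helper name smart_suspects is made up for this example, and the three attributes it watches are only a common starting point, not a complete health test.

```shell
#!/bin/sh
# Usage (as root):
#   smartctl -H /dev/ada0                    # overall health verdict
#   smartctl -A /dev/ada0 | smart_suspects   # list suspicious attributes

# Reads `smartctl -A` output on stdin and prints NAME=RAW for a few
# attributes that often point at failing media when their raw value is
# non-zero. Column 2 is the attribute name, column 10 the raw value.
smart_suspects() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ &&
         $10 + 0 > 0 { print $2 "=" $10 }'
}
```

No output from smart_suspects means none of the three watched attributes has a non-zero raw value. On a virtual disk the attribute table is usually absent altogether.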

ralphbsz · Dec 21, 2020

The fact that the error is reported as a pair of hex numbers instead of a file name tells us that ZFS could not map the damaged object to a path, so the problem is probably in some metadata. That could be directory content, it could be allocation structures, it could be inode-like structures. Really hard to tell without ZFS expertise. You could be missing files (if a directory was damaged or destroyed), and then you would never notice (because you can't read a file that isn't there, and therefore never get a read error on it).

DaLynX said:
As you can see I tried a scrub but it did not help. What should I do?

You have only one disk, and probably copies=1 (most people do), so scrub can't do very much: with no redundancy at the ZFS layer, it can detect damaged data but cannot recreate it. As Argentum already said, the underlying problem might be errors on your disk drive, but it could also be anything else.

You seem to imply that this is a rented server. Is this an Amazon/Azure/Google type cloud machine? Those typically present highly reliable virtual disk drives (what you see as /dev/da or /dev/ada), backed by RAID-like technology. To find out, run "camcontrol identify ada0" (for an ATA disk) or "camcontrol inquiry da0" (for a SCSI disk). If it says "Seagate" or "Hitachi", it is a physical disk; if it says "Microsoft" or "Google", it is a virtual disk. In the virtual case you can pretty much exclude actual disk errors, and SMART won't report anything.

On the other hand, if this machine is running on a single physical disk drive, you might consider whether the drive's age and reliability are up to your needs. Personally, I would never store any data that requires lots of work to restore on a single drive, but YMMV.
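
As a starting point for debugging, the hex pair from the error listing can be turned into decimal dataset and object IDs. This is a minimal sketch; the helper name decode_zfs_error is made up for this example.

```shell
#!/bin/sh
# Convert a "<0xDATASET>:<0xOBJECT>" entry from `zpool status -v`
# into decimal IDs. Returns 1 if the argument is not in that form.
decode_zfs_error() {
    ids=$(printf '%s\n' "$1" |
        sed -n 's/^<\(0x[0-9a-fA-F][0-9a-fA-F]*\)>:<\(0x[0-9a-fA-F][0-9a-fA-F]*\)>$/\1 \2/p')
    [ -n "$ids" ] || return 1
    # printf accepts C-style hex constants for %d; $ids is split on purpose.
    printf 'dataset=%d object=%d\n' $ids
}

decode_zfs_error '<0x189>:<0x27057>'   # prints: dataset=393 object=159831
```

With the decimal IDs, "zdb -d zroot" should list each dataset together with its ID, and "zdb -dddd <dataset> 159831" should dump that object if it still exists. If no dataset has ID 393, the damaged object may belong to a snapshot or dataset that has since been destroyed.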

DaLynX · Dec 30, 2020

Argentum said:
Have you tried zpool clear zroot?

Yes, I tried, but to no avail.

ralphbsz said:
You seem to imply that this is a rented server. Is this an Amazon/Azure/Google type cloud machine?

It's actually a VM on a single Proxmox server. No cloud or cluster storage.

Could I maybe make a ZFS snapshot, save it as a file in a personal cloud storage space (like OneDrive), and restore it after reinstalling my VM from scratch? Would that be a proper way of doing it?

ralphbsz · Dec 30, 2020

Doing a complete backup and wiping the system is a good idea. The catch is that you can't be sure the backup will include all the files if some have become unfindable. I think this is the best idea available, other than debugging what actually went wrong with either ZFS or the underlying storage.
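
A minimal sketch of the snapshot-to-file approach discussed above. The function names, the snapshot name and the file path are all made up for this example, and uploading the resulting file to cloud storage is left out. If the send stream runs into the damaged object it may abort with an I/O error, so treat this as something to try rather than a guaranteed backup.

```shell
#!/bin/sh
# backup_pool POOL SNAPNAME OUTFILE
# Take a recursive snapshot and write a compressed replication stream to a file.
backup_pool() {
    pool=$1; snap=$2; out=$3
    zfs snapshot -r "${pool}@${snap}" || return 1
    # -R includes all descendant datasets, their snapshots and properties.
    zfs send -R "${pool}@${snap}" | gzip > "$out"
}

# restore_pool INFILE TARGET
# Feed a saved stream into an existing (freshly created) pool or dataset.
restore_pool() {
    gunzip -c "$1" | zfs receive -F "$2"
}

# Example (hypothetical paths; run as root on the real system):
#   backup_pool zroot backup /mnt/external/zroot-backup.zfs.gz
#   restore_pool /mnt/external/zroot-backup.zfs.gz zroot
```

Running "ls -l" on the file and a test receive into a scratch pool are cheap ways to check the stream before wiping anything. A restored root pool would most likely still need boot code and a bootfs setting before the VM boots from it.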