ZFS: Fixing metadata errors after zpool clear && zpool scrub?

Hello all,

I was trying to make the ath10k driver work for my WiFi card, and after some attempts to compile and run the driver (and crash the system), I noticed that some directories were slow to access.
When I ran "zpool status", it showed some files as corrupted. I removed the files manually and took note of which packages they came from so I could restore them.
Now the only errors left are some invalid metadata entries (<0x123>) that I can't fix by running "zpool clear && zpool scrub".
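The restore I have in mind is roughly this (the path is one of the corrupted files from the zpool status output below; the package name is a placeholder for whatever pkg which reports):

Code:
# find which package owns a corrupted file
pkg which /usr/local/share/PySide2/glue/qtqml.cpp
# force-reinstall the owning package to get a clean copy of the file
pkg install -f <package-name>
# then reset the error counters and re-check the pool
zpool clear zroot
zpool scrub zroot
zpool status -v zroot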

My questions are:

1 - I've seen some places suggesting running zpool export and zpool import, but I can't do that on the same disk the system is running from, right?

2 - My SSD is almost new (3-4 months of use, dual-booting Gentoo/Win10 before and FreeBSD/Win10 now) and I never had any issues with file errors before.
ZFS says that I have checksum errors; could they be caused by the invalid metadata?

3 - Is the invalid metadata a problem? I mean, will it impact performance, reuse of blocks, or anything else harmful in the long term?
I noticed that if I remove the invalid files and then try to recreate them, they show up again in the status as invalid.
For example, if I add a lot of files with "npm install" in one of my projects, the chances are high that some file will end up corrupted.

The output of smartctl is:
Code:
=== START OF INFORMATION SECTION ===
Model Family:     WD Blue and Green SSDs
Device Model:     WDC WDS480G2G0B-00EPW0
Serial Number:    183541800480
LU WWN Device Id: 5 001b44 8b9628e44
Firmware Version: UK450000
User Capacity:    480.113.590.272 bytes [480 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      M.2
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Sep  2 12:34:29 2019 -03
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (  32) The self-test routine was interrupted
                                        by the host with a hard or soft reset.
Total time to complete Offline
data collection:                (  120) seconds.
Offline data collection
capabilities:                    (0x15) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Abort Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  85) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       3281
12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       570
165 Block_Erase_Count       0x0032   100   100   000    Old_age   Always       -       1025
166 Minimum_PE_Cycles_TLC   0x0032   100   100   ---    Old_age   Always       -       7
167 Max_Bad_Blocks_per_Die  0x0032   100   100   ---    Old_age   Always       -       0
168 Maximum_PE_Cycles_TLC   0x0032   100   100   ---    Old_age   Always       -       15
169 Total_Bad_Blocks        0x0032   100   100   ---    Old_age   Always       -       204
170 Grown_Bad_Blocks        0x0032   100   100   ---    Old_age   Always       -       0
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 Average_PE_Cycles_TLC   0x0032   100   100   000    Old_age   Always       -       7
174 Unexpected_Power_Loss   0x0032   100   100   000    Old_age   Always       -       112
184 End-to-End_Error        0x0032   100   100   ---    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       2
188 Command_Timeout         0x0032   100   100   ---    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   063   053   000    Old_age   Always       -       37 (Min/Max 6/53)
199 UDMA_CRC_Error_Count    0x0032   100   100   ---    Old_age   Always       -       0
230 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0x025c0128025c
232 Available_Reservd_Space 0x0033   100   100   005    Pre-fail  Always       -       100
233 NAND_GB_Written_TLC     0x0032   100   100   ---    Old_age   Always       -       3227
234 NAND_GB_Written_SLC     0x0032   100   100   000    Old_age   Always       -       11545
241 Total_Host_GB_Written   0x0030   100   100   000    Old_age   Offline      -       4632
242 Total_Host_GB_Read      0x0030   100   100   000    Old_age   Offline      -       4956
244 Temp_Throttle_Status    0x0032   000   100   ---    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Aborted by host               90%      3166         -
# 2  Short offline       Completed without error       00%      2522         -
# 3  Short offline       Interrupted (host reset)      90%      2070         -
# 4  Short offline       Interrupted (host reset)      90%      2040         -
# 5  Short offline       Interrupted (host reset)      90%      2011         -
# 6  Short offline       Interrupted (host reset)      90%      2011         -
# 7  Short offline       Aborted by host               80%      2011         -
# 8  Short offline       Completed without error       00%       624         -
# 9  Short offline       Aborted by host               30%       433         -
#10  Short offline       Aborted by host               90%       401         -
#11  Short offline       Completed without error       00%       400         -
#12  Short offline       Completed without error       00%       321         -
#13  Short offline       Completed without error       00%       106         -
#14  Short offline       Self-test routine in progress 20%       106         -
#15  Short offline       Aborted by host               90%        11         -

Selective Self-tests/Logging not supported

The output of "zpool status -v" is:

Code:
 mario@freebsd-g3  ~  sudo zpool status -v
  pool: zroot
state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 0 days 00:00:28 with 37 errors on Mon Sep  2 12:30:27 2019
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0   226
          ada1p6    ONLINE       0     0   482

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x6>
        <metadata>:<0x9>
        <metadata>:<0x10a>
        <metadata>:<0xb>
        <metadata>:<0x10c>
        <metadata>:<0x10>
        <metadata>:<0x16>
        <metadata>:<0x117>
        <metadata>:<0x12b>
        <metadata>:<0x46>
        <metadata>:<0x4b>
        <metadata>:<0xac>
        <metadata>:<0xb3>
        <metadata>:<0xde>
        <metadata>:<0xec>
        <metadata>:<0xee>
        zroot/ROOT/default:<0x0>
        //usr/local/share/PySide2/glue/qtqml.cpp
        zroot/ROOT/default:<0x3bbe>

Thank you
 
I'm not a ZFS specialist, but I once ran into a checksum and metadata problem after a power outage.
The only way I could fix it was by removing the entire zpool.
My 5 cents: a bug in the filesystem?
 
1 - I've seen some places suggesting running zpool export and zpool import, but I can't do that on the same disk the system is running from, right?

2 - My SSD is almost new (3-4 months of use, dual-booting Gentoo/Win10 before and FreeBSD/Win10 now) and I never had any issues with file errors before.
ZFS says that I have checksum errors; could they be caused by the invalid metadata?

3 - Is the invalid metadata a problem? I mean, will it impact performance, reuse of blocks, or anything else harmful in the long term?
I noticed that if I remove the invalid files and then try to recreate them, they show up again in the status as invalid.
For example, if I add a lot of files with "npm install" in one of my projects, the chances are high that some file will end up corrupted.
  1. I'm doubtful that this would fix the issue, but it might be worth a shot. I'd boot from a USB stick to perform it; you can simply use the installer in its "LiveCD"/shell mode (see the sketch after this list).
  2. I believe a panic during a write could cause the issue, but it's worth checking that your RAM is good at the very least. You can check the SSD as well, but fully testing it involves more writing to it, which should be minimized when possible.
  3. Are you sure it's not trying to reuse the corrupted files that are already there? It seems odd to me that it would again have corrupted metadata, but someone else surely knows more about this than I do.
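For 1, a rough sketch of what that round trip could look like from the installer's live shell (pool name taken from your zpool status; adjust as needed):

Code:
zpool import                    # list pools the live system can see
zpool import -f -R /mnt zroot   # import under an alternate root so it doesn't shadow /
zpool scrub zroot
zpool status -v zroot
zpool export zroot              # export cleanly before rebooting into the installed system

For 2, a memtest86+ run from a USB stick plus a long SMART self-test (smartctl -t long /dev/ada1, assuming ada1 is the SSD as in your pool layout) would cover the obvious hardware suspects without much extra writing.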
 
Hello Alain De Vos and Orum, thank you for your replies.

Indeed, I wasn't sure about the new files, but my guess is that because they're added under the same path, ZFS reuses the same (corrupted) metadata nodes; it's not always the same files that end up corrupted.
Maybe there's a "bug" in ZFS, but it only shows once the pool is already corrupted and it can't index 100% of the files correctly because it can't trust the metadata anymore.

The notebook is a Dell G3 with 6 months of use now, but I ran the RAM test anyway and it didn't find any problems.
I should mention that I installed FreeBSD on an M.2 WD SSD; I don't know if there are known problems with this model (I saw something about Samsung M.2 SSDs, not WD).
I used the FreeBSD Live CD to back up the zpool, but after transferring 500 MB to a file on an external HD the speed dropped to ~0.2 MB/s =/
It'll take forever; I think I'm better off creating a tar.gz of the data and recreating the pool...
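Something along these lines (the backup mountpoint and directory list are just examples):

Code:
# external disk mounted at /mnt/backup (example mountpoint)
tar -czpf /mnt/backup/home-mario.tar.gz -C / usr/home/mario
tar -czpf /mnt/backup/etc.tar.gz -C / etc usr/local/etc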

The situation now, a couple of days after the initial post:

Code:
  pool: zroot
state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 0 days 00:00:32 with 41 errors on Tue Sep  3 20:59:33 2019
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0    88
          ada1p6    ONLINE       0     0   200

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x6>
        <metadata>:<0x9>
        <metadata>:<0x10a>
        <metadata>:<0xb>
        <metadata>:<0x10c>
        <metadata>:<0x10>
        <metadata>:<0x16>
        <metadata>:<0x46>
        <metadata>:<0x4b>
        <metadata>:<0xac>
        <metadata>:<0xb3>
        <metadata>:<0xc0>
        <metadata>:<0xde>
        <metadata>:<0xee>
        <metadata>:<0xf2>
        <metadata>:<0xf6>
        <metadata>:<0xfa>
        zroot/ROOT/default:<0x0>
        //usr/home/mario/Desenvolvimento/proj/src/app/mixins
        //usr/home/mario/Desenvolvimento/proj/.git/objects/00/a1476dfe4116ce527a3c2b82ba56d0dd7a3532
        //usr/home/mario/.npm/_cacache/index-v5/9b/8e
        //usr/home/mario/Desenvolvimento/proj/src/styles
        //usr/home/mario/.config/chromium/Default/IndexedDB/https_www.petz.com.br_0.indexeddb.leveldb
        zroot/ROOT/default:<0x3bbe>
        //usr/home/mario/.cache/chromium/Default/Code Cache/js/91bb50ca72148f99_0
        //usr/home/mario/Desenvolvimento/proj/node_modules/@angular-devkit/build-angular/node_modules/rxjs/src/util
        //usr/home/mario/Desenvolvimento/proj/.git/modules/builds/objects/dd/74145d30f7b7268dfc97b17fd437fc05b6a90f
 
Honestly at this point I'd just restore from backup (you have a backup, right? ;)). You can continue trying to tar up what you have, but since it looks like we may never know what was corrupted, I'd be hesitant to restore anything that was a system file. You can hopefully restore most of your home directory at least.

As you're limited in the amount of hardware you can cram into a laptop, I wonder if setting copies=2 would help you avoid future problems? Of course it halves the usable space and doubles all writes to the SSD, but it might be worth it. I'm not sure if metadata is duplicated though, so if that gets corrupted you might be SOL again. (Edit: You could also partition the SSD and mirror the two partitions within ZFS, which would definitely keep two copies of the metadata.)
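A rough sketch of both ideas (pool and device names are taken from your zpool status; the extra partition in option B is purely hypothetical):

Code:
# option A: store two copies of every block written from now on (existing data is unaffected)
zfs set copies=2 zroot

# option B (hypothetical layout): add a second freebsd-zfs partition and mirror the two
gpart add -t freebsd-zfs -a 4k -l zfs-mirror ada1   # size/label are examples
zpool attach zroot ada1p6 ada1p7                    # existing device + the new partition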

Oh, and lest we forget, there's always UFS2 + gjournal. I'd recommend avoiding SU+J as that seems to be a mess according to those that have tried it.
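If you do try gjournal, the general shape is roughly this (adaXpY is a placeholder device, and labeling destroys whatever is on that partition; gjournal(8) has the details):

Code:
# load the journaling GEOM class now and on every boot
gjournal load
echo 'geom_journal_load="YES"' >> /boot/loader.conf

# label the target partition and create a UFS2 filesystem on the journaled provider
gjournal label adaXpY
newfs -J /dev/adaXpY.journal

# async mounts are considered safe on top of gjournal
mount -o async /dev/adaXpY.journal /mnt

Making it the boot filesystem additionally means the usual bootcode (gpart bootcode) and /etc/fstab work on top of this.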
 
Hello Orum,

I don't have a full backup, but fortunately the system is only for work and all work files and dotfiles are on GitLab.
Last night I found out that ext2fs was the culprit for the extremely slow writes to my ext3 backup partition, so I reformatted it to UFS and now I can copy very fast.
But... there's one unknown file/folder that causes a crash when I try to back up, so I'll need to back up selectively from the root of the filesystem to find and avoid the problem...
I thought about copies=2, but honestly I wanted ZFS mainly for the compression. Is UFS2 capable of that? If so, could you give me some advice on how to reformat to UFS2 + gjournal and mark it as bootable?

Thank you
 
Oh, and lest we forget, there's always UFS2 + gjournal. I'd recommend avoiding SU+J as that seems to be a mess according to those that have tried it.
What do you mean by "a mess"? Can you be concrete? Who tried what, and with which result?

My 5 cents: if you have 5 disks lying around, that's when ZFS gets interesting.
 
There's an entire thread on the subject. I'm mostly just trying to get across that ZFS is not the "end all, be all" of file systems, and UFS2 shouldn't be forgotten. I prefer it over ZFS on desktops and especially laptops, given their more limited resources. (UFS2 certainly has server applications too!)

And no, UFS2 does not offer transparent compression. To be honest, I've found transparent compression of limited use as most files are already compressed in a manner specific to their content, e.g. multimedia. With the notable exceptions of /var/log (which one can configure to use non-transparent compression on UFS) and HTML content directories, I basically never see anything above 2.5x and they're usually more like 1.3x. YMMV though, and if you have lots of text documents that you don't want to store in an archive, I can see the benefit.
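For the /var/log case, that's just newsyslog(8): a J (bzip2), X (xz), or Z (gzip) in the flags column of /etc/newsyslog.conf compresses logs as they're rotated. An example entry (size/count values are illustrative; the C flag just creates the file if it's missing):

Code:
# logfilename          [owner:group]  mode  count  size  when  flags
/var/log/messages                      644   5     200   *     JC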
 
I did a benchmark some time ago, and the benefit of transparent compression in my case is that it speeds up loading all the small files in "node_modules" while transpiling TypeScript to JS.
But yeah, the couple of seconds gained doesn't really pay off at the end of the day.
I read that thread last night and saw the problems with UFS, but I didn't know about gjournal; maybe I'll have more success with UFS2 + gjournal than with ZFS.
 
I find SU+J performant. But performance means keeping stuff in memory, and keeping stuff in memory means losing it on a power outage.
At the opposite extreme, when you always fsync to disk, you get a very reliable but very slow filesystem.
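For reference, those knobs look roughly like this (adaXpY is a placeholder device):

Code:
# toggle soft-updates journaling on an unmounted UFS filesystem
tunefs -j enable  /dev/adaXpY   # SU+J
tunefs -j disable /dev/adaXpY   # back to plain soft updates
# the slow-but-safe extreme: force synchronous writes at mount time
mount -o sync /dev/adaXpY /mnt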
 
Well, just to give an update: I reformatted with ZFS and, after I restored my backup, ZFS had already corrupted the pool again.
I made one more attempt with UFS2, but no luck either; I think FreeBSD has a bug with my SSD, so I'll try again on my secondary hybrid HDD when I have time.
For now I've given up and switched back to Gentoo (I need to focus on work more than on fixing the system).
Thank you guys for the help.
 