ZFS: Verifying data after mirror resilvering went wrong

Hi,
I had some bad luck. While resilvering a two-way mirror, the other HDD threw 35 errors.
I have a backup in AWS Deep Archive, but retrieving the data is costly. There are currently 46 Current_Pending_Sectors on the bad drive, which is not much when one physical sector is 4 KB, right?

Instead of pulling 3 TB of data from AWS, which would cost $250+ (AWS to the internet is the costly part), I was thinking about computing checksums of the known-good files and comparing them against the possibly corrupted files.

Is there a tool for this already? Otherwise I'll create it with some simple bash scripts, but it would be great if something already existed.
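Something like this minimal bash sketch is what I have in mind; the manifest path, data directory, and output file are just placeholder names:
Code:
#!/usr/bin/env bash
# Compare local files against a manifest of known-good SHA-256 checksums.
# The manifest is assumed to have been generated on the known-good side with:
#   sha256sum <files...> > known-good.sha256   (paths relative to the data directory)
set -euo pipefail

MANIFEST="/root/known-good.sha256"   # placeholder: manifest of known-good checksums
LOCAL_DIR="/data"                    # placeholder: the possibly corrupted copies

cd "$LOCAL_DIR"
# sha256sum -c re-hashes every file listed in the manifest; --quiet prints only
# the entries that FAIL, which is exactly the list of files to restore from AWS.
sha256sum -c --quiet "$MANIFEST" > /root/failed-files.txt || true

echo "Files that need to be restored from backup:"
cat /root/failed-files.txt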
 
I have been through Current_Pending_Sector problems. I don't have an answer to your question, but this previous thread has some helpful observations.

Thanks for linking the thread.

I am not familiar with how filesystems work and I have never worked with ZFS before, so I assumed that once there are errors, none of the data can be trusted. But I read that ZFS stores a hash of the files, so I am wondering: since only two files are affected by permanent errors, can I rely on the integrity of the rest of the files?
I wanted to check all the files against AWS, but maybe I don't have to?
 
One more thing.
I read that current pending sectors can be caused by a bad cable connection, but isn't that handled entirely on the HDD? I thought SMART values were maintained only by the drive's internals, so a bad cable connection could be ruled out as the cause?
 
A bad cable or a bad cable connection to the device can make the device think it's getting errors. Think of a lamp flickering: is it the bulb, is it the power cord, is it the switch?
Internal temperatures can fluctuate, which can cause cables to shift a little.

A tip on mirrors: if you have the resources, you can attach a third device to make a 3-way mirror. I do that when I'm replacing devices because, in theory, once the third disk is done resilvering you have at least one known-good copy, and you can then walk through replacing the others. A sketch of the command is below.
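Something along these lines; the pool and device names are only examples, use your own from zpool status:
Code:
# Attach a third disk to the existing mirror vdev. "data" is the pool,
# the first device is any disk already in the mirror, the second is the
# new third disk -- substitute your own names.
zpool attach data ata-EXISTING_MIRROR_MEMBER ata-NEW_THIRD_DISK

# Watch the resilver; once it finishes, every member holds a full copy.
zpool status data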
 
Can you help me understand the bad-cable part?
Are you saying that the HDD relies on the SATA driver/OS to tell it whether it is reading correctly / whether it can read a sector?
 
This is going to be a simplistic answer; real hardware is a lot more complicated, and I worded my first sentence poorly.
It's more a case of "a bad cable can cause incorrect data to be presented to the HDD".

Data to the HDD is digital, yes? The value of 1 is represented by one voltage level, let's say > +3.0 VDC, and a 0 by another voltage level, say < +1.0 VDC.
What value does the HDD record if the voltage is 2.0? Is that a 1 or a 0?
What if that bit is part of a command to the HDD, say part of the block number that tells it where to write the data?
The HDD can get confused: maybe it locks up, maybe it writes the data but to the wrong place. You don't know anything went wrong until you try to read the data back, and even then you may or may not get bad data.

If a cable is bad (a wire intermittent at the connector, or bad termination giving you distorted or corrupted data), the data received by the HDD may not be what was sent.
Honestly, if you are seeing device errors you are likely powering down and opening things up anyway, so the first thing almost all of us do is reseat the cables, swap cables, and see what happens.
How many of us have insisted a network problem was not the cable, only to have all the problems go away when it was swapped for a brand-new, high-quality one?

ZFS keeps checksums on many of its internal data structures and on the data itself; that makes it possible for the OS to detect errors earlier, or "differently", than other file systems.
 
Thanks for taking time and explaining it.

I think I am going to cry.
I bought a new 8 TB drive to replace the "faulty" one. Obviously, I had to restart to put it in, but my second mirror device (which was actually two 4 TB drives assembled into a RAID0 by mdadm) disappeared, and the old 4 TB RAID1 appeared instead.

Turns out I had created the mdadm array on whole drives (/dev/sde, /dev/sdd) instead of partitions, and now the superblock is gone after the restart.
I see two possibilities:
1) Try to recreate the mdadm superblock with --assume-clean (I was switching SATA ports, so I am not sure what order I used now, lol)
2) Try to use the degraded drive

Could anyone please help me with the second option? How do I recover the degraded drive? I tried replugging the cable and hoping for the best.

Code:
root@galba:~# zpool status
  pool: data
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: resilvered 2.97T in 09:51:00 with 35 errors on Tue Oct 22 18:59:35 2024
config:

        NAME                                      STATE     READ WRITE CKSUM
        data                                      DEGRADED     0     0     0
          mirror-0                                DEGRADED     0     0     0
            ata-WDC_WD80EFPX-68C4ZN0_WD-RD1ANYGD  DEGRADED     0     0     0  too many errors
            877198402037324315                    UNAVAIL      0     0     0  was /dev/disk/by-id/md-uuid-93c000f1:64b4b6c9:5dd48d17:338fd75c-part1

errors: 2 data errors, use '-v' for a list

What do you recommend?
 
I cleared the errors with `zpool clear` and attached the "new new" drive. Wish me luck.
Code:
root@galba:~# zpool status
  pool: data
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Oct 23 21:17:05 2024
        228G scanned at 1.28G/s, 12.9G issued at 74.4M/s, 2.97T total
        12.9G resilvered, 0.43% done, 11:34:28 to go
config:

        NAME                                      STATE     READ WRITE CKSUM
        data                                      DEGRADED     0     0     0
          mirror-0                                DEGRADED     0     0     0
            ata-WDC_WD80EFPX-68C4ZN0_WD-RD1ANYGD  ONLINE       0     0     0
            877198402037324315                    UNAVAIL      0     0     0  was /dev/disk/by-id/md-uuid-93c000f1:64b4b6c9:5dd48d17:338fd75c-part1
            ata-WDC_WD80EFPX-68C4ZN0_WD-RD1ALJSD  ONLINE       0     0     0  (resilvering)

errors: 2 data errors, use '-v' for a list
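For reference, the two commands behind that were roughly the following (pool and device names are from my setup, adapt them to yours):
Code:
# Reset the error counters ZFS recorded against the pool's devices.
zpool clear data

# Attach the replacement disk alongside the flaky mirror member,
# which kicks off the resilver shown above.
zpool attach data ata-WDC_WD80EFPX-68C4ZN0_WD-RD1ANYGD ata-WDC_WD80EFPX-68C4ZN0_WD-RD1ALJSD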
 
Is there a tool for this already?
You can run a zpool scrub and then check for files with errors with zpool status -v afterwards.
If you are lucky, all the errors will be in file data; otherwise they could be in important ZFS internal metadata.
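Something like this, assuming the pool is called data as in your output:
Code:
# Read every block in the pool and verify it against its checksum,
# repairing from the other mirror member where possible.
zpool scrub data

# Once the scrub finishes, list the pool state and the specific files
# (or metadata objects) that still have unrecoverable errors.
zpool status -v data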
 
I am running the scrub, but I am getting cksum errors even on the "new new" drive, and it went to "repairing".

The data were "successfully" resilvered before I ran the scrub.

What is happening? Is the "new new" disk failing too?

Code:
Every 3.0s: zpool status                                                                                                                                 galba: Thu Oct 24 15:00:33 2024
  pool: data
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub in progress since Thu Oct 24 09:06:45 2024
        2.14T scanned at 106M/s, 1.77T issued at 87.2M/s, 2.97T total
        32.2M repaired, 59.53% done, 04:00:33 to go
config:
        NAME                                      STATE     READ WRITE CKSUM
        data                                      DEGRADED     0     0     0
          mirror-0                                DEGRADED   246     0     0
            ata-WDC_WD80EFPX-68C4ZN0_WD-RD1ANYGD  DEGRADED   465     0   350  too many errors
            877198402037324315                    UNAVAIL      0     0     0  was /dev/disk/by-id/md-uuid-93c000f1:64b4b6c9:5dd48d17:338fd75c-part1
            ata-WDC_WD80EFPX-68C4ZN0_WD-RD1ALJSD  ONLINE       0     0   499  (repairing)
errors: 2 data errors, use '-v' for a list
 
I think I did it.
Code:
Every 3.0s: zpool status                                             galba: Thu Oct 24 21:35:36 2024

  pool: data
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub in progress since Thu Oct 24 21:34:12 2024
        223G scanned at 2.65G/s, 12.1G issued at 147M/s, 2.97T total
        0B repaired, 0.40% done, 05:51:18 to go
config:

        NAME                                             STATE     READ WRITE CKSUM
        data                                             ONLINE       0     0     0
          mirror-0                                       ONLINE       0     0     0
            md-uuid-93c000f1:64b4b6c9:5dd48d17:338fd75c  ONLINE       0     0    35
            ata-WDC_WD80EFPX-68C4ZN0_WD-RD1ALJSD         ONLINE       0     0    40

errors: 1 data errors, use '-v' for a list

For anyone who has the same problem as me, I'll include some useful commands.

My MDADM array superblock is gone after a restart. What to do?

This is a set of commands assuming that you created the RAID on whole devices, like /dev/sdx, not on partitions.
I had a RAID0, so I had to make sure I used the same device order I originally created the array with, but nothing bad happens if you don't get it right on the first try.

This is how to recreate a RAID0 array after the superblock has been lost.
Code:
 mdadm --create /dev/md0 --level 0 --raid-devices=2 /dev/sdf /dev/sde --assume-clean
--assume-clean is very important here because it tells mdadm not to initialize or resync the array, so it does not overwrite any existing data.
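Before pointing ZFS at it again, it doesn't hurt to sanity-check the recreated array, for example:
Code:
# Quick overview of all md arrays currently assembled.
cat /proc/mdstat

# Detailed view of the recreated array (level, member devices, UUID).
mdadm --detail /dev/md0

# Inspect the freshly written superblocks on the member disks.
mdadm --examine /dev/sdf /dev/sde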

After this, my old ZFS partitions came back!
Code:
root@galba:~# lsblk
NAME      MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINTS
sda         8:0    0   32G  0 disk
├─sda1      8:1    0   31G  0 part  /
├─sda2      8:2    0    1K  0 part
└─sda5      8:5    0  975M  0 part  [SWAP]
sdb         8:16   0   64G  0 disk
└─sdb1      8:17   0   64G  0 part
sdc         8:32   0  7.3T  0 disk
├─sdc1      8:33   0  7.3T  0 part
└─sdc9      8:41   0    8M  0 part
sdd         8:48   0  7.3T  0 disk
├─sdd1      8:49   0  7.3T  0 part
└─sdd9      8:57   0    8M  0 part
sde         8:64   0  3.6T  0 disk
└─md0       9:0    0  7.3T  0 raid0
  ├─md0p1 259:0    0  7.3T  0 part
  └─md0p9 259:1    0    8M  0 part
sdf         8:80   0  3.6T  0 disk
└─md0       9:0    0  7.3T  0 raid0
  ├─md0p1 259:0    0  7.3T  0 part
  └─md0p9 259:1    0    8M  0 part

I had to run mdadm --create twice, because I didn't get the order of the disks right on the first try.

The disk was still UNAVAIL in the pool, so I had to bring back the old UUID.
Running zpool import didn't do anything.

This brings back the old UUID
Code:
 mdadm --assemble --update=uuid --uuid=93c000f1:64b4b6c9:5dd48d17:338fd75c /dev/md0
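If the array is not listed in your mdadm.conf, you may need to stop it first and name the member devices explicitly, something along these lines (device names are from my setup):
Code:
# Stop the running array so it can be re-assembled with the updated UUID.
mdadm --stop /dev/md0

# Re-assemble from the member disks, rewriting the array UUID in their superblocks.
mdadm --assemble --update=uuid --uuid=93c000f1:64b4b6c9:5dd48d17:338fd75c /dev/md0 /dev/sdf /dev/sde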

After that, the disk was REMOVED instead of UNAVAIL!

Now it's important to update mdadm.conf if you haven't (I didn't have to, because the UUID matched) and to update the initramfs!
Code:
update-initramfs -u
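If you do need to update mdadm.conf, appending the scanned array definition is the usual way (the config path may be /etc/mdadm/mdadm.conf or /etc/mdadm.conf depending on the distro):
Code:
# Append the currently assembled array's definition to the config file,
# then rerun update-initramfs -u (as above) so it is picked up at boot.
mdadm --detail --scan >> /etc/mdadm/mdadm.conf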

After that I rebooted and voilà! I successfully brought back the MD array.
So I detached the DEGRADED drive and am currently running a zpool scrub. It's twice as fast now; maybe that's thanks to the RAID0, but I think the old drive was slowing it down. The exact detach command is at the end of this post.

Code:
root@galba:~# zpool status
  pool: data
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub in progress since Thu Oct 24 21:34:12 2024
        452G scanned at 544M/s, 161G issued at 194M/s, 2.97T total
        0B repaired, 5.31% done, 04:13:00 to go
config:

        NAME                                             STATE     READ WRITE CKSUM
        data                                             ONLINE       0     0     0
          mirror-0                                       ONLINE       0     0     0
            md-uuid-93c000f1:64b4b6c9:5dd48d17:338fd75c  ONLINE       0     0    35
            ata-WDC_WD80EFPX-68C4ZN0_WD-RD1ALJSD         ONLINE       0     0    40

errors: 1 data errors, use '-v' for a list
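For completeness, the detach step mentioned above was roughly:
Code:
# Remove the flaky 8 TB drive from the mirror; the md0 member and the
# healthy 8 TB drive remain as the two-way mirror shown above.
zpool detach data ata-WDC_WD80EFPX-68C4ZN0_WD-RD1ANYGD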
 