ZFS Resilver omits detail of which drives are being restored

rowan194 · Mar 10, 2020

I've had a new drive drop out a couple of times, but it tests fine in another machine. This time around, I swapped physical ports with another working drive, and after reboot this is shown:

Code:

  pool: db
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Mar 10 16:32:55 2020
        2.07T scanned at 955M/s, 1.92T issued at 289M/s, 4.84T total
        0 resilvered, 39.58% done, 0 days 02:56:42 to go
config:

        NAME                        STATE     READ WRITE CKSUM
        db                          ONLINE       0     0     0
          mirror-0                  ONLINE       0     0     0
            diskid/DISK-8HHHEEXHp1  ONLINE       0     0     0
            diskid/DISK-8HJ3A23Hp1  ONLINE       0     0     0
            diskid/DISK-8CJVUS0Ep1  ONLINE       0     0     9
        logs
          gpt/slog_db               ONLINE       0     0     0
        cache
          gpt/l2arc_db              ONLINE       0     0     0

errors: No known data errors
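
For anyone reading the scan lines above, here is a small sanity check of the progress figures. It is only a sketch: it assumes the percentage is derived from the "issued" amount rather than "scanned", and that the time remaining is the un-issued data divided by the issue rate. The variable names are made up for illustration, and the inputs are the rounded values zpool printed.

```python
# Figures copied from the scan lines of the status output above.
TIB_IN_MIB = 1024 * 1024  # zpool prints binary units: 1T = 1024 * 1024 M

issued_t = 1.92      # "1.92T issued"
total_t = 4.84       # "4.84T total"
issue_rate_m = 289   # "at 289M/s"

# Assumption: progress is issued / total, not scanned / total.
percent_done = 100 * issued_t / total_t

# Assumption: time to go is the remaining data divided by the issue rate.
remaining_s = (total_t - issued_t) * TIB_IN_MIB / issue_rate_m

hours, rest = divmod(int(remaining_s), 3600)
minutes, seconds = divmod(rest, 60)
print(f"{percent_done:.2f}% done, about {hours:02d}:{minutes:02d}:{seconds:02d} to go")
```

Both numbers come out close to the 39.58% and 02:56:42 that zpool reported. A small difference is expected, since the inputs here are already rounded for display.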

I'm surprised that the drive which is being resilvered is not identified. Is this normal behaviour? Could have sworn I've seen this detail included before.

The CKSUM column gives a hint (that is the drive which has previously dropped out), but given that I'm unsure of the exact fault (drive? controller? enclosure? cabling? power?), and the ports have been swapped, it's not really safe to make assumptions based on this information, beyond a very general "at least 1 disk out of the 3 has no known data errors".

To me it seems fairly important to identify which drive is not fully "in sync" with the rest of the mirror, particularly because a full resilver of this 12TB array can take a few days.

Thoughts?

sebhtml · Jun 13, 2020

Did it resilver successfully?

ralphbsz · Jun 13, 2020

rowan194 said:
I'm surprised that the drive which is being resilvered is not identified. Is this normal behaviour? Could have sworn I've seen this detail included before.

I don't remember seeing it on my system, and my log files (which have snippets of output pasted into them) don't have that information.

I'm not even sure that one can in all cases uniquely identify which disk is being resilvered onto. In your case, that's easy: you have a mirror, so resilvering will bring every disk up to full content. In the case of the RAID-Z... layout, I think it is possible that resilvering writes to multiple disks, even if only one gets replaced. To be sure, I would need to go back to the theory of operations of ZFS and read up on what the data layout really is; it's complicated.

The CKSUM column gives a hint (that is the drive which has previously dropped out),

No, you have to be careful. This is a very weak hint: it only tells you which drive has had checksum errors. That's not necessarily the drive that is gone, or the one being resilvered onto. In your case, the correlation may be perfect, but you can't necessarily generalize from that.
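
If you want to pull those counters out of the output rather than eyeball them, a minimal sketch could look like the following. The function name `devices_with_errors` and the regular expression are made up for illustration, and the sample text is the config section from the first post. It only reports which devices have non-zero READ/WRITE/CKSUM counters, which is a hint and nothing more: it does not tell you which device is being resilvered.

```python
import re

# Config section copied from the `zpool status` output in the first post.
SAMPLE = """\
        NAME                        STATE     READ WRITE CKSUM
        db                          ONLINE       0     0     0
          mirror-0                  ONLINE       0     0     0
            diskid/DISK-8HHHEEXHp1  ONLINE       0     0     0
            diskid/DISK-8HJ3A23Hp1  ONLINE       0     0     0
            diskid/DISK-8CJVUS0Ep1  ONLINE       0     0     9
        logs
          gpt/slog_db               ONLINE       0     0     0
        cache
          gpt/l2arc_db              ONLINE       0     0     0
"""

# One device row: name, state, then three plain integer counters.
# Limitation: abbreviated counters (for example "1.2K") are not matched.
ROW = re.compile(r"^\s*(\S+)\s+([A-Z]+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def devices_with_errors(config_text):
    """Return (name, state, read, write, cksum) for rows with any non-zero counter."""
    found = []
    for line in config_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header row, "logs"/"cache" labels, blank lines
        name, state, read, write, cksum = match.groups()
        counts = (int(read), int(write), int(cksum))
        if any(counts):
            found.append((name, state) + counts)
    return found


for row in devices_with_errors(SAMPLE):
    print(row)
```

On the sample this flags only diskid/DISK-8CJVUS0Ep1, the drive with 9 checksum errors.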

To me it seems fairly important to identify which drive is not fully "in sync" with the rest of the mirror, particularly because a full resilver of this 12TB array can take a few days.

Having helped implement a RAID system, I understand that users feel the need to know what is going on inside of their system. They want very detailed status information, and I've been part of a team implementing such status displays. Ultimately, this is a tradeoff. Giving users information is helpful, because it allows them to plan their actions. On the other hand, the only thing that really matters is: is the resilver done or not. Before it is done, you have to hold still and not do any other disk replacement. After it is done, you have your normal redundancy back.

There is also a grave risk of displaying too much status information: It may cause users to play with their system, and try to inject errors, to see whether the system behaves as expected. And end users injecting errors into a storage system is incredibly insane, stupid, and dangerous. These kinds of quality assurance tests really need to be left to qualified developers, who have highly specialized test setups. I like to explain this with a joke: Cats do have nine lives. But you shouldn't take your household cat and kill it a few times just to see that it really works, because you might make a mistake and kill it too thoroughly. And the cat might still need a few extra lives, in case it runs into the neighbor's dog.

PMc · Jun 13, 2020

ralphbsz said:
I'm not even sure that one can in all cases uniquely identify which disk is being resilvered onto. In your case, that's easy: you have a mirror, so resilvering will bring every disk up to full content. In the case of the RAID-Z... layout, I think it is possible that resilvering writes to multiple disks, even if only one gets replaced.

I can confirm that it does this. I have observed multiple times that these writes were not at all what I might have expected.

So we should probably say that a resilver does resilver the pool to a consistent state, by writing to the disks what appears to be necessary.

The first important thing, if any errors show up in ZFS, is to check the system logs for messages (over a longer time, e.g. since the last scrub). If you have a message in the system log, then forget about ZFS and debug from there.
The system log reports errors on the CAM layer, which is below ZFS. If the CAM layer produces errors, then ZFS cannot function properly and will likely also show errors. And what must be fixed is the CAM layer, regardless of whether ZFS is involved at all.

Then, ZFS cksum errors can appear at a later time, and there may not be a malfunction at all at that time.
Imagine the following:
I disconnect a disk while in operation. Some data was meant to be written to that disk, but it never arrived and so is not on the disk.
Then after putting everything together again and rebooting, the pool may look alright. But at some later time, maybe at the next scrub, the blocks that never arrived on the disk are read. And NOW a cksum error will be recorded (and fixed)!
So if a cksum error appears today, this does not necessarily mean that there was a malfunction today.
Consequently, to make any safe assumptions, you need to complete the resilver, and also do a scrub (because the resilver only touches what has changed since the last safe TXG), then clear the errors, and then see how it behaves further.
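
The order of operations above (finish the resilver, scrub, clear, then watch) can be sketched as a small helper that looks at the "scan:" line and suggests what to do next. This is only an illustration: `next_step` is a made-up name, and the wording it matches for a finished resilver ("resilvered ...") or a finished scrub ("scrub repaired ...") is an assumption about how the scan line reads, so check it against what your own zpool prints. The suggested commands (`zpool status`, `zpool scrub`, `zpool clear`) are the standard ones.

```python
def next_step(status_text, pool):
    """Suggest the next action from the 'scan:' line of `zpool status` output."""
    scan = ""
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("scan:"):
            scan = stripped[len("scan:"):].strip()
            break

    if "resilver in progress" in scan:
        return "wait: resilver still running"
    if "scrub in progress" in scan:
        return "wait: scrub still running"
    # Assumed wording for a completed resilver: scrub next, because the
    # resilver only covers what changed since the last safe TXG.
    if scan.startswith("resilvered"):
        return f"zpool scrub {pool}"
    # Assumed wording for a completed scrub: clear the counters, then
    # watch whether new errors show up.
    if scan.startswith("scrub repaired"):
        return f"zpool clear {pool}"
    # Nothing recognised: just look at the status again.
    return f"zpool status {pool}"


# Scan line taken from the first post.
print(next_step("  scan: resilver in progress since Tue Mar 10 16:32:55 2020", "db"))
```

It deliberately never suggests clearing errors before a scrub has finished, since counters cleared too early would hide exactly the late cksum errors described above.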
