Solved: Is it possible to replace a disk in a mirror by booting from an external (rescue) system?

My machine can't boot anymore. After showing some errors in zpool status, ZFS briefly put the mirror into a degraded state, and after a while the system slowly ground to a halt, or rather stalled. I could then reboot, and after some errors printed on the console the pool finally resilvered. I decided to do a replacement as described in the Handbook:

20.3.5. Replacing a Functioning Device

Since the device is a brand-new disk I used # zpool replace -f zroot da0p3 da3. It was then not possible to do the # gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da3...
But during the resilvering the system stalled. On reboot the machine kept spitting messages like (ia0:mpt1:0:0:0): READ(10).CDB:20 00 1F... or (ia0:mpt1:0:0:0): SCSI sense: MEDIUM ERROR asc 11,0 (unrecovered read error), just as it did on the earlier successful reboots.
As one disk is OK, do I have any hope of retrieving my data?
 
What controller and disks / disk sizes are you using?
The controllers supported by the mpt driver are quite ancient - I wouldn't really trust those with any important data anymore. I've had several old FC HBAs that started to act up with timeouts, I/O errors and even complete system freezes under heavy load.
If you have an old SAS/SATA controller supported by the mpt driver, there is also a good chance the controller and/or its firmware doesn't support disks over a certain size [1]. The behavior of those is best described as "undefined" - some just report the supported size, some stop writing, some lock up, and some even wrap around and start writing at the beginning of the disk, causing complete havoc.
We had the latter behavior in an old server I had to debug a while ago, and I reported my findings in a bug report:

[1]
 
Since the device is a brand new disk I used # zpool replace -f zroot da0p3 da3.
That command looks bizarre to me, surely you didn't replace one slice with an entire drive? Because that's what you're saying right here: replace da0p3 with the entirety of da3.

It might also be useful to always check the manual page before doing stuff; zpool(8) states that:

Replaces old_device with new_device. This is equivalent to attaching
new_device, waiting for it to resilver, and then detaching
old_device.
...
new_device is required if the pool is not redundant. If new_device is
not specified, it defaults to old_device.
So... are you sure you needed a forceful replacement?
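For a mirror, what the manual describes boils down to a hand-rolled sequence like this (a sketch; da3p3 is hypothetical and assumes the new disk had been partitioned first):
Code:
zpool attach zroot da0p3 da3p3   # new partition joins the mirror and resilvers
zpool status zroot               # wait until the resilver has completed
zpool detach zroot da0p3         # only then drop the old member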

It was then not possible to do the # gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da3...
Probably because da3p1 ceased to exist the very moment you added the whole drive to a pool.

I'd start by booting using a rescue system to check if you can still access the pool at all.
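For example, from a FreeBSD installer/live environment (a sketch; add -f only if the pool complains it was last used by another system):
Code:
zpool import                               # list pools visible from the rescue system
zpool import -o readonly=on -R /mnt zroot  # import read-only with /mnt as altroot
zpool status -v zroot                      # check device states and any permanent errors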
 
What controller and disks / disk sizes are you using?
The controllers supported by the mpt driver are quite ancient - I wouldn't really trust those with any important data anymore. I've had several old FC HBAs that started to act up with timeouts, I/O errors and even complete system freezes under heavy load.
If you have an old SAS/SATA controller supported by the mpt driver, there is also a good chance the controller and/or its firmware doesn't support disks over a certain size [1]. The behavior of those is best described as "undefined" - some just report the supported size, some stop writing, some lock up, and some even wrap around and start writing at the beginning of the disk, causing complete havoc.
We had the latter behavior in an old server I had to debug a while ago, and I reported my findings in a bug report:

[1]
The disk size is 1 TB (2.5"), and the machine is a refurbished Dell PowerEdge R610. Initially, before creating the pool, I disabled the RAID as recommended. The disk originally in place was a Dell-branded SG 250 GB.
 
Depending on the exact chipset, 1 TB might still be too big (7000/8000 series, 9550) and might have caused the controller to wrap around.
Provided you haven't nuked the data on the disk by adding the whole disk to the pool after writing the GPT headers, or by writing the headers/bootcode after starting the resilver, AND the error messages aren't coming from the second old disk (which would mean that one is now dying), I wouldn't take any risks with that old controller. The bus address 'mpt1:0:0:0' points to the controller, not the disk, so the remaining old disk should still be fine.

I'd highly recommend using a more recent controller to further examine the disks/pool and recover it (or at least some of the data on it).
SAS2008-based HBAs can be found for under $30 nowadays, and the China clones are perfectly fine for home use. I haven't had any issues with those for many years, and they are a perfect replacement for (OEM) HBAs or RAID controllers in older servers.
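If you want to double-check which device those 'mpt1:0:0:0' messages belong to before swapping hardware, something like this helps (smartctl comes from the sysutils/smartmontools package, it is not in base):
Code:
camcontrol devlist -v    # maps each daX to its scbus/controller path
smartctl -a /dev/da0     # SMART health and error log of the surviving disk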
 
I just removed the defective drive, attached the new one, and detached the broken one.
Here is some more info.
I finally regained control of my pool by physically removing the offending drive. After the reboot the system got back to normal, and zpool showed the following status:

Code:
pool: zroot
state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Jan 23 08:07:11 2021
        278G scanned at 180M/s, 80,3G issued at 51,9M/s, 327G total
        80,3G resilvered, 24,60% done, 0 days 01:20:54 to go
config:

        NAME                        STATE     READ WRITE CKSUM
        zroot                       DEGRADED     0     0     0
          mirror-0                  DEGRADED     0     0     0
            da0p3                   ONLINE       0     0     0
            replacing-1             DEGRADED     0     0     0
              16126643399536167815  FAULTED      0     0     0  was /dev/da0p3
              da2                   ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /iocage/jails/rps_aura/root/var/db/pkg
That means the replacement with da2 (formerly da3), # zpool replace -f zroot da0p3 da2, had been taken into account and the resilvering was proceeding. But onto the whole disk, which is bad (I had skipped partitioning it).
 
That command looks bizarre to me, surely you didn't replace one slice with an entire drive? Because that's what you're saying right here: replace da0p3 with the entirety of da3.

It might also be useful to always check the manual page before doing stuff; zpool(8) states that:


So... are you sure you needed a forceful replacement?
You are right, I missed the partitioning stage. Afterwards I looked at the partitioning of da0, gpart list da0, to mimic it on a fresh disk, da1. Once the da1p[123] partitions were created, I could attach it to the pool, zpool attach zroot da0p3 da1p3, and then copy the bootcode onto da1p1.
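For reference, the sequence was roughly the following (a sketch; the partition sizes below are placeholders, the real values come from the gpart list da0 output):
Code:
gpart create -s gpt da1
gpart add -t freebsd-boot -a 4k -s 512k da1   # p1: bootcode
gpart add -t freebsd-swap -a 1m -s 2g da1     # p2: swap (size is a placeholder)
gpart add -t freebsd-zfs -a 1m da1            # p3: rest of the disk for ZFS
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da1
zpool attach zroot da0p3 da1p3                # join the existing mirror and resilver

zpool status then showed the resilver running onto both da2 and da1p3: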
Code:
pool: zroot
state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
    continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Jan 23 17:26:39 2021
    327G scanned at 5,66M/s, 304G issued at 5,27M/s, 327G total
    609G resilvered, 93,21% done, no estimated completion time
config:

    NAME                        STATE     READ WRITE CKSUM
    zroot                       DEGRADED     0     0     0
      mirror-0                  DEGRADED     0     0     0
        da0p3                   ONLINE       0     0     0
        replacing-1             DEGRADED     0     0     0
          16126643399536167815  FAULTED      0     0     0  was /dev/da0p3
          da2                   ONLINE       0     0     0
        da1p3                   ONLINE       0     0     0

Overnight the resilvering finally completed, and I got:
Code:
pool: zroot
state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
    still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
    the pool may no longer be accessible by software that does not support
    the features. See zpool-features(7) for details.
  scan: resilvered 653G in 0 days 17:01:05 with 0 errors on Sun Jan 24 10:27:44 2021
config:

    NAME        STATE     READ WRITE CKSUM
    zroot       ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        da0p3   ONLINE       0     0     0
        da2     ONLINE       0     0     0
        da1p3   ONLINE       0     0     0

errors: No known data errors
The trace of the defective drive disappeared, just as expected.
Now I just have to detach da2, redo the partitioning, copy the bootcode into the first partition, and reattach it to the mirror, or else reserve it for another use.
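A sketch of that cleanup, in case it helps someone else (assuming da2 is reused as the second proper mirror member; the partition table is cloned from da0):
Code:
zpool detach zroot da2                     # drop the whole-disk member from the mirror
zpool labelclear -f /dev/da2               # wipe the stale ZFS label from the raw disk
gpart backup da0 | gpart restore -F da2    # clone da0's GPT layout onto da2
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da2
zpool attach zroot da0p3 da2p3             # re-attach, as a partition this time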
 
Depending on the exact chipset, 1 TB might still be too big (7000/8000 series, 9550) and might have caused the controller to wrap around.
Provided you haven't nuked the data on the disk by adding the whole disk to the pool after writing the GPT headers, or by writing the headers/bootcode after starting the resilver, AND the error messages aren't coming from the second old disk (which would mean that one is now dying), I wouldn't take any risks with that old controller. The bus address 'mpt1:0:0:0' points to the controller, not the disk, so the remaining old disk should still be fine.

I'd highly recommend using a more recent controller to further examine the disks/pool and recover it (or at least some of the data on it).
SAS2008-based HBAs can be found for under $30 nowadays, and the China clones are perfectly fine for home use. I haven't had any issues with those for many years, and they are a perfect replacement for (OEM) HBAs or RAID controllers in older servers.
pciconf -lv shows:
Code:
mpt0@pci0:4:0:0:    class=0x010000 card=0x30a01000 chip=0x00581000 rev=0x08 hdr=0x00
    vendor     = 'Broadcom / LSI'
    device     = 'SAS1068E PCI-Express Fusion-MPT SAS'
    ...

After googling around, I couldn't find any clearly stated size limitation for the SAS1068E. I don't plan to expand my pool much further, but can it reasonably go up to 2 TB?
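One quick sanity check is to compare what the controller reports for an attached drive, although a correct size report alone doesn't prove the firmware handles the whole range safely:
Code:
diskinfo -v da1            # mediasize as seen through mpt(4)
camcontrol readcap da1 -h  # READ CAPACITY as returned by the device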
 
What I've learned:
If you have a ZFS mirror:
- Do not let disk (or controller) error messages stall your machine.
- Remove the offending device immediately.
- Replace it with a spare disk.
- Don't forget to partition the replacement device accordingly (see the sketch below).
- Give a little trust to this wonderful piece of software engineering, then type the commands.
- Enjoy the relief by sipping a soft drink or some Mediterranean dill liquor while it resilvers.
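Put together, the replacement boils down to something like this (a sketch with hypothetical device names: da0p3 is the surviving member, da1p3 the failed one, and the new blank disk shows up as da1 after the swap):
Code:
zpool offline zroot da1p3                  # take the failing member offline, if it is still listed
# ... swap the bad disk for the new one ...
gpart backup da0 | gpart restore -F da1    # clone the healthy disk's partition table
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da1
zpool replace zroot da1p3                  # same device path, so no new_device argument needed
zpool status zroot                         # watch the resilver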
 
- Don't forget to partition the replacement device accordingly.
Or just use a separate disk / mirrored pool for the OS and put the data pool on whole disks. This simplifies disk replacement by several orders of magnitude, and you can usually rely on zfsd to automagically do the right thing.
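A minimal sketch of that layout (the pool name 'data' and the device names are just examples; zfsd(8) is in the base system and only needs enabling):
Code:
sysrc zfsd_enable="YES"            # let zfsd react to faults and activate spares
service zfsd start
zpool create data mirror da4 da5   # data pool on whole disks
zpool add data spare da6           # hot spare zfsd can swap in automatically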

The SAS1068E is also based on the old SAS-1 architecture and is unable to address drives larger than 2 TB; a quick search for "LSI SAS1068E 2TB" turns up lots of results on that problem.
And even if the 2 TB cap doesn't bother you: it is still a (very old!) RAID controller, so it is NOT recommended for use with ZFS.
 