Nutshell: After I replaced a dead disk with a new spare, the devices were renumbered, causing confusion, first in the human and then in FreeBSD. My pool is now mounted in a degraded state, wondering where /dev/da10.eli is, and unwilling to accept /dev/da9.eli as the same device.
Edit: Solution: I used zfs unmount POOLNAME, zpool export POOLNAME, then zpool import POOLNAME. This causes ZFS to re-inspect all devices in /dev/ and deduce which ones are part of the pool.
Tried, Failed: zpool replace (it knows that da9 is supposed to be part of the pool already), zpool detach (it knows it's not in the pool at the moment), zpool online (generally confused)
Next? I think I should zpool import, per a prior post by @phoenix?
Details: I recently had to replace a failed disk in a RAID-Z3 (thread). I don't have hot-swap hardware, so I resilvered in a cold spare and left the dead disk in place. I powered off the system before going on vacation. When I returned and booted, it hung (apparently indefinitely) while repeatedly throwing "Error 5" messages about the dead disk. So I powered off, removed the drive, added a new cold spare, and rebooted. My geli attach-and-mount script then choked when it came time to attach the new device, /dev/da10 ("geli: Cannot read metadata from /dev/da10: Invalid argument."). Much confusion followed (more details below) before I realized that removing a drive and adding its replacement had caused renumbering of the devices. What was now called "da10" used to be "da11", and was already in the pool. The new replacement device was now "da9". I was able to successfully attach it with geli. But the pool is doggedly looking for da10.eli, when what I have is da9.eli.
As mentioned above, I think I need to do an import and let ZFS work out the new nomenclature itself; I'd appreciate confirmation of that! I'd expect I'd need to zfs unmount first?
In that same thread, wblock@ warns that data disks should not be labeled. Why is that? It would seem like a mechanism to avoid this problem. Is there a way to consistently reference devices in a pool such that the system's renumbering of the underlying devices doesn't cause this problem? Or is there a way to control how the system assigns device numbers, so it does not "recycle" and/or rearrange previously used device names?
Somewhat related: Is there any way to get FreeBSD to boot past a dead disk? The old dead device required manual intervention to "do" anything (the zpool is just data, no system files), so I was surprised it locked up the boot.
For completeness here are the dead-end alleys I stumbled through:
Code:
citadel 332 [ROOT] zpool replace abyss /dev/da10.eli /dev/da9.eli
invalid vdev specification
use '-f' to override the following errors:
/dev/da9.eli is part of active pool 'abyss'
citadel 337 [ROOT] zpool detach abyss /dev/da9.eli
cannot detach /dev/da9.eli: no such device in pool
citadel 338 [ROOT] zpool detach abyss /dev/da10.eli
cannot detach /dev/da10.eli: only applicable to mirror and replacing vdevs
citadel 341 [ROOT] zpool online abyss /dev/da9.eli
cannot online /dev/da9.eli: no such device in pool
citadel 342 [ROOT] zpool online abyss /dev/da10.eli
warning: device '/dev/da10.eli' onlined, but remains in faulted state
use 'zpool replace' to replace devices that are no longer present
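In contrast to the dead ends above, the unmount, export, import sequence from the Edit can be sketched as a small shell helper. This is only a rough illustration, not a tested recovery procedure: the function names run and reimport_pool and the DRY_RUN variable are hypothetical, and it assumes the geli providers (the .eli devices) have already been attached. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: re-import a pool so ZFS rescans the devices in /dev and works out
# which (possibly renumbered) ones belong to the pool.
# Assumption: all geli providers (*.eli) are already attached.
# DRY_RUN, run and reimport_pool are illustrative names, not part of ZFS.

# Print the command when DRY_RUN is 1 (the default); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Unmount, export, then import the named pool; stop at the first failure.
reimport_pool() {
    pool="$1"
    if [ -z "$pool" ]; then
        echo "usage: reimport_pool POOLNAME" >&2
        return 1
    fi
    run zfs unmount "$pool" &&
    run zpool export "$pool" &&
    run zpool import "$pool"
}
```

With the pool from the transcript, reimport_pool abyss prints the three commands; DRY_RUN=0 reimport_pool abyss would actually execute them, so check the printed commands first.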