ZFS: Cannot boot from or import zpool

Hello

One of my servers hung, got reset, and has not been able to boot since then. To check the zpool, I booted from mfsbsd and cannot understand a thing.

1. zpool import shows a healthy RAIDZ of 4 disks, all disks online
2. zpool import -R /mnt ioffe, after some thinking, yields "cannot import 'ioffe': one or more devices are currently unavailable"
3. zdb -e ioffe core dumps when reading the dataset ioffe/root/var
4. in the "Configuration for import" section, zdb shows disk 0 with "removed: 1", although the disk is not removed
5. zpool import -F -R /mnt ioffe yields "cannot import 'ioffe': I/O error. Destroy and re-create the pool from a backup source"
6. smartctl does not show any errors, and dd if=/dev/ada0 of=/dev/null bs=100k count=100k (run for every disk) does not show any problem either (the commands are summarized just below)
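
For reference, the checks above roughly correspond to this command sequence; the exact smartctl invocation may have differed, -a here is just shorthand for a full health report:

Code:
zpool import                       # 1. scan for importable pools
zpool import -R /mnt ioffe         # 2. normal import with altroot /mnt
zdb -e ioffe                       # 3. examine the exported pool
zpool import -F -R /mnt ioffe      # 5. import in recovery mode
smartctl -a /dev/ada0              # 6. SMART health, repeated for each disk
dd if=/dev/ada0 of=/dev/null bs=100k count=100k   # raw read test, each disk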

What happened? Can I do anything besides destroying pool? How do I prevent such a problem?...
 
- Do you mean that 1 and 2 are produced by the mfsbsd image, i.e. in the end the pool cannot really be imported?
- Could you post the "zpool status" from mfsbsd if you can import the pool there, and the "zpool import" output from the instance that cannot import it?
- If I understood you correctly and mfsbsd cannot import the pool either: in case you are using a 12.x rescue image, could you try a 13.x image, since a LOT of rescue options were added when ZFS was rebased onto the OpenZFS (ZoL) upstream? (A quick version check is sketched below.)
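
Not part of the questions themselves, just a convenience sketch: checking which release (and therefore which ZFS implementation) the rescue environment is running:

Code:
uname -r             # kernel release of the rescue image, e.g. 12.1-RELEASE
freebsd-version -ku  # kernel and userland versions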
 
Yes, all diagnostics are from mfsbsd 12.1; the pool was created by FreeBSD 12.1, so compatibility cannot be the issue.
I'll post the zpool import output as soon as I configure the network for ssh access (really soon); but there is no error there, just a healthy pool ready for import.
 
Also check the controller itself. And perhaps the power supply. The controller might have a problem accessing more than one drive at a time, causing I/O errors. I had this happen with a cheap Promise SATA card: as long as I accessed one disk at a time it worked, but accessing all four drives at once was too much for it. The controller chip on the card got really, really hot too, so that card was just fried. Same idea with power: it might be fine when you access one drive at a time but struggle to keep all drives active at the same time.
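
A minimal sketch of the kind of test that separates the two cases: read all four disks at the same time and see whether errors show up only under concurrent load (device names ada0-ada3 are assumed; run under /bin/sh):

Code:
# read all four disks in parallel to load controller and PSU at once
for d in ada0 ada1 ada2 ada3; do
    dd if=/dev/$d of=/dev/null bs=1m count=10000 &
done
wait
dmesg | tail    # any CAM/ATA errors that appeared during the run?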
 
It worked for several months without a flaw. Three more boxes with the same case, power supply and HDDs work fine. The SATA ports are all onboard.
 
The server is new. Anyway, the read test showed no errors. And I still don't understand why zpool import shows a healthy pool while zpool import ioffe fails.
 
Okay, I've set up the network interface and finally got a normal-looking console.

Setting vfs.zfs.debug=1 yielded a lot of noise but nothing useful.
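
(For completeness: that knob is an ordinary sysctl, and the extra ZFS debug messages show up on the console / in the system log.)

Code:
sysctl vfs.zfs.debug=1      # enable verbose ZFS debug output
dmesg | tail -n 50          # recent kernel messages
tail -f /var/log/messages   # or watch the log while retrying the import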

zpool import -R /mnt ioffe

cannot import 'ioffe': one or more devices is currently unavailable

while writing messages to the log (repeated several times).

zpool import -o readonly -F -R /mnt ioffe

produces the same sequence in the log, but claims that

Code:
cannot import 'ioffe': I/O error
        Destroy and re-create the pool from
        a backup source.

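A variant that is sometimes tried at this point (not shown above) is a forced, read-only import that skips mounting the datasets; all of these are standard zpool import flags, though there is no guarantee it behaves differently here:

Code:
zpool import -o readonly=on -N -f -F -R /mnt ioffe
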
zpool import claims that

Code:
root@mfsbsd:~ # zpool import
   pool: ioffe
     id: 14461477687519964930
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        ioffe           ONLINE
          raidz1-0      ONLINE
            gpt/ioffe0  ONLINE
            gpt/ioffe1  ONLINE
            gpt/ioffe2  ONLINE
            gpt/ioffe3  ONLINE

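Since the vdevs are referenced by GPT labels, a quick sanity check (output not quoted here) is to confirm that all four labels still resolve to partitions; gpart and glabel are the standard tools:

Code:
gpart show -l ada0 ada1 ada2 ada3   # partition tables with their GPT labels
glabel status | grep ioffe          # gpt/ioffe0..3 -> underlying providers
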
smartctl shows that the disks are new and without any errors. What's most strange is that gptzfsboot lists files on the filesystem!

Everything looks correct, but nothing works.
 
Code:
root@mfsbsd:~ # zdb -c -e ioffe

Traversing all blocks to verify metadata checksums and verify nothing leaked ...

loading concrete vdev 0, metaslab 5 of 116 ...Assertion failed: space_map_load(msp->ms_sm, msp->ms_allocatable, maptype) == 0 (0x5 == 0x0), file /usr/src/cddl/contrib/opensolaris/cmd/zdb/zdb.c, line 3349.
Abort (core dumped)

Looks like it's stone dead... but gptzfsboot still reads something?
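
The assertion is space_map_load() returning 5 (EIO) for one metaslab's space map. As a sketch of how to dig further (not something shown above), zdb can be told to ignore assertions instead of aborting, and to dump metaslab / space map details:

Code:
zdb -e -AAA -c ioffe   # -AAA: ignore assertions and enable panic recovery
zdb -e -m ioffe        # -m: dump metaslab and space map information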
 
zdb -k -e ioffe finally shows errors!

Code:
ZFS_DBGMSG(zdb):
spa_import: importing ioffe
spa_load(ioffe, config trusted): LOADING
disk vdev '/dev/gpt/ioffe1': best uberblock found for spa ioffe. txg 894659
spa_load(ioffe, config untrusted): using uberblock with txg=894659
spa_load(ioffe, config trusted): LOADED
spa=ioffe async request task=32
spa_import: importing ioffe_CHECKPOINTED_UNIVERSE
spa_load(ioffe_CHECKPOINTED_UNIVERSE, config trusted): LOADING
disk vdev '/dev/gpt/ioffe1': best uberblock found for spa ioffe_CHECKPOINTED_UNIVERSE. txg 894659
spa_load(ioffe_CHECKPOINTED_UNIVERSE, config untrusted): using uberblock with txg=894659
spa_load(ioffe_CHECKPOINTED_UNIVERSE, config trusted): FAILED: unable to retrieve checkpointed uberblock from the MOS config [error=2]
spa_load(ioffe_CHECKPOINTED_UNIVERSE, config trusted): UNLOADING
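
So zdb -k tried to load the pool's checkpointed state and failed with error 2 (ENOENT): no checkpointed uberblock can be retrieved from the MOS config. For the record, the standard way to roll a pool back to a checkpoint would be the import option below, but this log suggests there is no usable checkpoint to rewind to:

Code:
zpool import --rewind-to-checkpoint -R /mnt ioffe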
 
zpool import -V imported the pool, but it has checksum failures: 2 on the pool and 12 on raidz1-0. zpool clear failed, and zfs does not see any datasets.
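
The usual follow-up here (output not quoted) is to ask which objects carry the permanent errors and whether any datasets are visible at all; both are standard commands:

Code:
zpool status -v ioffe   # -v lists objects/files with permanent errors
zfs list -r ioffe       # datasets in the pool (here: none show up)
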
Missing in action, presumed dead.
 