ZFS: Cannot boot from or import zpool

tarkhil

Member


Messages: 20

Hello

One of my servers hung, was reset, and has refused to boot ever since. To check the zpool I booted from mfsbsd, and I cannot make sense of what I'm seeing.

1. zpool import shows a healthy RAIDZ of 4 disks, all disks online
2. zpool import -R /mnt ioffe, after some thinking, yields "cannot import 'ioffe': one or more devices is currently unavailable"
3. zdb -e ioffe core dumps when reading the dataset ioffe/root/var
4. in the "Configuration for import" section, zdb shows disk 0 with "removed: 1", although the disk is not removed
5. zpool import -F -R /mnt ioffe yields "cannot import 'ioffe': I/O error. Destroy and re-create the pool from a backup source"
6. smartctl does not show any errors, and dd if=/dev/ada0 of=/dev/null bs=100k count=100k (run against all disks, roughly as sketched below) completes without problems
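
For the record, the read test in item 6 was roughly this (device names ada0 through ada3 as on this box):

Code:
# Read ~10 GB sequentially from the start of each pool member
for d in ada0 ada1 ada2 ada3; do
    echo "testing /dev/$d"
    dd if=/dev/$d of=/dev/null bs=100k count=100k || echo "read error on $d"
done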

What happened? Can I do anything besides destroying the pool? And how do I prevent this from happening again?
 

drookie

New Member


Messages: 4

- Do you mean 1 and 2 are both produced by the mfsbsd image, i.e. in the end the pool really cannot be imported?
- Could you post the zpool status output from mfsbsd if you can import the pool, and the zpool import output from the instance that cannot import it?
- If I understood you correctly and mfsbsd cannot import the pool: in case you are using the 12.x rescue image, could you try the 13.x image? A LOT of rescue options were added when FreeBSD's ZFS was rebased onto the OpenZFS (ZoL) upstream.
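I.e. something like:

Code:
# On a system where the pool actually imports:
zpool status -v ioffe
# On the system that cannot import it:
zpool import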
 
OP
tarkhil

tarkhil

Member


Messages: 20

- Do you mean 1 and 2 are both produced by the mfsbsd image, i.e. in the end the pool really cannot be imported?
- Could you post the zpool status output from mfsbsd if you can import the pool, and the zpool import output from the instance that cannot import it?
- If I understood you correctly and mfsbsd cannot import the pool: in case you are using the 12.x rescue image, could you try the 13.x image? A LOT of rescue options were added when FreeBSD's ZFS was rebased onto the OpenZFS (ZoL) upstream.
Yes, all diagnostics are from mfsbsd 12.1; the pool was created by FreeBSD 12.1, so compatibility cannot be the issue.
I'll post the zpool import output as soon as I've configured the network for ssh access (really soon); but there's no error, just a healthy-looking pool ready for import.
 

SirDice

Administrator
Staff member
Administrator
Moderator

Reaction score: 9,512
Messages: 34,309

Also check the controller itself. And perhaps the power supply. The controller might have a problem accessing more than one drive, causing I/O errors. I had this happen with a cheap Promise SATA card: as long as I accessed one disk at a time it worked, but trying to access all four drives was too much for it. The controller chip on the card got really, really hot too, so that card was just fried. The same idea applies to power: consumption might be fine when you access one drive at a time, but the supply may struggle to keep all drives active at the same time.
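
On FreeBSD you can quickly check how the drives and the controller show up, e.g.:

Code:
# List the disks and the bus/controller each one hangs off
camcontrol devlist -v
# Identify the SATA/AHCI controller on the PCI bus
pciconf -lv | grep -E -B 3 -i 'sata|ahci'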
 
OP
tarkhil

tarkhil

Member


Messages: 20

Also check the controller itself. And perhaps the power supply. The controller might have a problem accessing more than one drive, causing I/O errors. I had this happen with a cheap Promise SATA card: as long as I accessed one disk at a time it worked, but trying to access all four drives was too much for it. The controller chip on the card got really, really hot too, so that card was just fried. The same idea applies to power: consumption might be fine when you access one drive at a time, but the supply may struggle to keep all drives active at the same time.
It worked for several months without a flaw. Three more boxes with the same case, power supply, and HDDs work fine. The SATA ports are all onboard.
 

SirDice

Administrator
Staff member
Administrator
Moderator

Reaction score: 9,512
Messages: 34,309

It worked for several months without a flaw.
Things can break over time. That card I talked about worked fine for years. Then it suddenly broke.
 
OP
tarkhil

tarkhil

Member


Messages: 20

The server is new. Anyway, the test reads showed no errors. And I still don't understand why plain zpool import shows a healthy pool while zpool import ioffe fails.
 
OP
tarkhil

tarkhil

Member


Messages: 20

Okay, I've set up the network interface and finally have a normal-looking console.

Setting vfs.zfs.debug=1 yielded a lot of noise but nothing useful.
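
For reference, that's a plain sysctl; the noise it produces should end up in the kernel message buffer:

Code:
sysctl vfs.zfs.debug=1
# run the import attempt, then inspect the messages
dmesg | tail -n 50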

zpool import -R /mnt ioffe yields

Code:
cannot import 'ioffe': one or more devices is currently unavailable

while writing a sequence of debug messages to the log, repeated several times.

zpool import -o readonly=on -F -R /mnt ioffe

produces the same sequence in the log, but claims:

Code:
cannot import 'ioffe': I/O error
        Destroy and re-create the pool from
        a backup source.


zpool import, meanwhile, claims:

Code:
root@mfsbsd:~ # zpool import
   pool: ioffe
     id: 14461477687519964930
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        ioffe           ONLINE
          raidz1-0      ONLINE
            gpt/ioffe0  ONLINE
            gpt/ioffe1  ONLINE
            gpt/ioffe2  ONLINE
            gpt/ioffe3  ONLINE


smartctl shows that the disks are new and without any errors. Strangest of all, gptzfsboot can list files on the filesystem!

Everything looks correct, but nothing works.
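
One non-destructive thing I can still try, per zpool(8): the dry-run form of the recovery import, which only reports whether discarding the last few transactions would make the pool importable, without changing anything:

Code:
# -n together with -F: report recoverability, modify nothing
zpool import -F -n -R /mnt ioffe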
 
OP
tarkhil

tarkhil

Member


Messages: 20

Code:
root@mfsbsd:~ # zdb -c -e ioffe

Traversing all blocks to verify metadata checksums and verify nothing leaked ...

loading concrete vdev 0, metaslab 5 of 116 ...Assertion failed: space_map_load(msp->ms_sm, msp->ms_allocatable, maptype) == 0 (0x5 == 0x0), file /usr/src/cddl/contrib/opensolaris/cmd/zdb/zdb.c, line 3349.
Abort (core dumped)


Looks like it's stone dead (the assertion is space_map_load() returning 0x5, i.e. EIO)... but gptzfsboot reads something?
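
Maybe zdb can show which metaslab is broken; zdb(8) lists -m for dumping metaslabs and their space maps:

Code:
# Dump metaslab and space map information for the exported pool
zdb -m -e ioffe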
 
OP
tarkhil

tarkhil

Member


Messages: 20

zdb -k -e ioffe finally shows errors!

Code:
ZFS_DBGMSG(zdb):
spa_import: importing ioffe
spa_load(ioffe, config trusted): LOADING
disk vdev '/dev/gpt/ioffe1': best uberblock found for spa ioffe. txg 894659
spa_load(ioffe, config untrusted): using uberblock with txg=894659
spa_load(ioffe, config trusted): LOADED
spa=ioffe async request task=32
spa_import: importing ioffe_CHECKPOINTED_UNIVERSE
spa_load(ioffe_CHECKPOINTED_UNIVERSE, config trusted): LOADING
disk vdev '/dev/gpt/ioffe1': best uberblock found for spa ioffe_CHECKPOINTED_UNIVERSE. txg 894659
spa_load(ioffe_CHECKPOINTED_UNIVERSE, config untrusted): using uberblock with txg=894659
spa_load(ioffe_CHECKPOINTED_UNIVERSE, config trusted): FAILED: unable to retrieve checkpointed uberblock from the MOS config [error=2]
spa_load(ioffe_CHECKPOINTED_UNIVERSE, config trusted): UNLOADING

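So there is a pool checkpoint (hence the ioffe_CHECKPOINTED_UNIVERSE import attempt), and error=2 is ENOENT: the checkpointed uberblock can't be found in the MOS config. If the checkpoint itself were intact, zpool(8) says an import can rewind to it; assuming the rescue image's zpool supports the option, it may be worth a try:

Code:
# Try importing the on-disk state from when the checkpoint was taken
zpool import --rewind-to-checkpoint -R /mnt ioffe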
 
OP
tarkhil

tarkhil

Member


Messages: 20

zpool import -V imported the pool, but it shows checksum failures: 2 on the pool and 12 on raidz1-0. zpool clear failed, and zfs does not see any datasets.
Missing in action, presumed dead.
 