[Solved] ZFS pool missing after crash

Hi,
We had a raidz1 pool on our FreeBSD 11 server. It suddenly crashed, and after that its status was "unavailable". I had the idea to clean the inside of the server (dust and so on). After that operation, the pool went missing entirely! What's interesting is that FreeBSD sees all 3 disks, but "zpool status" no longer shows the pool. Is there any way to recover or fix it?

thanks
 

SirDice

Administrator
Staff member
Moderator
Can you post the output of the zpool status command?

As you've opened the case and rummaged around in it make sure all connectors are still in place and things still get power. It's quite easy to knock one of the SATA connectors loose for example.
 
Code:
[root@backup1 ~]# zpool status
  pool: zroot
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            ada3p4  ONLINE       0     0     0
            ada4p4  ONLINE       0     0     0

errors: No known data errors
[root@backup1 ~]#
So the pool "backuppc" isn't there - it's missing.

Regarding your second point: the other 3 disks, from the "backuppc" pool, are present in the system:
Code:
[root@backup1 ~]# dmesg | grep WDC
ada0: <WDC WD2003FZEX-00SRLA0 01.01A01> ACS-3 ATA SATA 3.x device
ada1: <WDC WD2003FZEX-00Z4SA0 01.01A01> ACS-2 ATA SATA 3.x device
ada2: <WDC WD2003FZEX-00Z4SA0 01.01A01> ACS-2 ATA SATA 3.x device
ada0: <WDC WD2003FZEX-00SRLA0 01.01A01> ACS-3 ATA SATA 3.x device
ada1: <WDC WD2003FZEX-00Z4SA0 01.01A01> ACS-2 ATA SATA 3.x device
ada2: <WDC WD2003FZEX-00Z4SA0 01.01A01> ACS-2 ATA SATA 3.x device
[root@backup1 ~]#
...but ZFS doesn't see them
 

SirDice

Even if all the disks disappeared (for whatever reason), I would still expect to see the pool show up with zpool status. It would be in a DEGRADED state (or even a FAILED state) but it should still show up. Does zpool import show the "missing" pool?
 
If "zpool import" fails to give interesting information, then look at the partition tables (using gpart) of the three disks you suspect of being part of the missing pool. If they look like they really are the correct disks (hopefully they have ZFS partitions), then you can try using "zdb -l /dev/XXX" to examine the ZFS volume data structures on them, but those details are outside my expertise.
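The diagnostic sequence suggested above can be sketched as follows (device names are illustrative - substitute your own adaX; run as root):

```shell
# Look for importable pools that aren't currently active
zpool import

# Inspect the partition table of each suspect disk
gpart show ada0

# Dump the ZFS vdev labels (ZFS keeps 4 copies: two at the
# start and two at the end of the device/partition)
zdb -l /dev/ada0
# If the disk is partitioned, point zdb at the ZFS partition instead:
zdb -l /dev/ada0p3
```

If all four labels fail to unpack on the whole disk *and* on every partition, ZFS genuinely cannot see its metadata there.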
 
What can I say...
1. The 3 disks from the missing pool are visible in dmesg.
2. ZFS doesn't see them; "zpool status" shows only 1 pool, not 2.
3. "zpool import" returns nothing.
4. zdb -l on one of the disks from the missing pool gives:
Code:
[root@backup1 ~]# zdb -l /dev/ada1
--------------------------------------------
LABEL 0
--------------------------------------------
failed to unpack label 0
--------------------------------------------
LABEL 1
--------------------------------------------
failed to unpack label 1
--------------------------------------------
LABEL 2
--------------------------------------------
failed to unpack label 2
--------------------------------------------
LABEL 3
--------------------------------------------
failed to unpack label 3
[root@backup1 ~]#

I'm very disappointed in ZFS. How is it possible that it lost the pool??
 

SirDice

ZFS is quite resilient to failures and errors but it's not bulletproof.
 
Something is wrong here. The fact that neither "zpool import" nor "zdb -l" finds anything on the disk means that either something overwrote the disk, or we are looking at the wrong disk.

Question: Were the disks partitioned and ZFS used one of the partitions like /dev/adaXpY, or was ZFS given the whole disk /dev/adaX? What is in the partition tables? You can check with gpart.

What is on the disks? Try this: "hexdump -C /dev/adaX | more" and look at it for a while. Do you see ZFS headers, GPT partition tables, actual data, or something else?

If something really overwrote the whole disk, there is nothing ZFS can do. But that would be very strange.
 
Something is wrong here. The fact that neither "zpool import" nor "zdb -l" finds anything on the disk means that either something overwrote the disk, or we are looking at the wrong disk.

Question: Were the disks partitioned and ZFS used one of the partitions like /dev/adaXpY, or was ZFS given the whole disk /dev/adaX?
I don't know; I took over this server from the old admin.

What is in the partition tables? You can check with gpart.
[root@backup1 ~]# gpart show ada0
gpart: No such geom: ada0.
[root@backup1 ~]# gpart recover ada0
gpart: arg0 'ada0': Invalid argument

What is on the disks? Try this: "hexdump -C /dev/adaX | more" and look at it for a while. Do you see ZFS headers, GPT partition tables, actual data, or something else?
There aren't any ZFS or GPT strings.

So in summary, it looks like something overwrote/deleted the partition tables on all 3 disks. So I see this isn't ZFS's fault. The only thing I did was physically clean the inside of the server with compressed air, and I unplugged the disks and plugged them back in - that's all. Before that, the status of the missing pool was "unavailable", but now, after the cleaning, the pool is gone and even the partition tables are gone - weird.

Main question now: is there any way to recover those partition tables?
 

SirDice

As far as I know there are no recovery tools that support ZFS.
 
Long shot, but they weren't encrypted, were they? Losing a pool of 3 disks and seeing no trace of it at all is very strange.
 
This doesn't seem like a FreeBSD or ZFS related problem to me, more like some kind of bizarre hardware issue.

What does gpart list tell you? Does it list the HDDs at all? Perhaps try sysctl kern.disks as well.

If the disks don't show up in there then I'd double check the hardware connectors again.
 

SirDice

wrkilu, you looked at the contents of the disk with hexdump(1); does the data look random? Or does it look "empty" (all zeros)? And did you check every disk or just one?
 
This doesn't seem like a FreeBSD or ZFS related problem to me, more like some kind of bizarre hardware issue.

What does gpart list tell you? Does it list the HD's at all? Perhaps try sysctl kern.disks as well.

If the disks don't show up in there then I'd double check the hardware connectors again.

[root@backup1 ~]# sysctl kern.disks
kern.disks: ada4 ada3 ada2 ada1 ada0

# gpart list
returns info only about ada3 and ada4 (the good pool)
 
wrkilu , you looked at the contents of the disk with hexdump(1), does the data look random? Or does it look "empty" (all zeros)? And did you check every disk or just one?
Code:
[root@backup1 ~]# hexdump -C /dev/ada2 | head -20
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00002000  77 e6 04 16 77 c5 60 63  56 e9 e9 25 89 0f 19 cb  |w...w.`cV..%....|
00002010  e3 26 13 7c 3b c0 43 d3  bf dd d7 11 15 61 d5 76  |.&.|;.C......a.v|
00002020  bf e3 53 e9 7d 2b bf fc  dd 4f d1 f6 9e 96 ad 7b  |..S.}+...O.....{|
00002030  79 4c 10 cd ec d9 46 c2  74 ea 92 e5 82 18 ad 8c  |yL....F.t.......|
00002040  97 5e bc 96 bb fe a7 fa  80 07 3d 5c c1 3e 61 36  |.^........=\.>a6|
00002050  29 38 3c d7 83 5d a0 ec  00 11 00 7a 52 60 5c 14  |)8<..].....zR`\.|
00002060  ca f4 42 91 99 6a dd f1  9f 94 2a 00 8b 87 b3 0d  |..B..j....*.....|
00002070  89 20 8a 39 47 76 3b dd  59 94 35 b1 8e 2d b0 47  |. .9Gv;.Y.5..-.G|
00002080  c8 14 1f 02 ad 85 61 a7  4b de 05 06 e8 32 f3 b3  |......a.K....2..|
00002090  4f a6 17 4d 41 bc 87 3d  85 0a d3 01 6e 38 21 17  |O..MA..=....n8!.|
000020a0  40 b8 f8 91 89 e3 ab ed  cd b0 29 0f 63 a4 dc 2e  |@.........).c...|
000020b0  9e d8 a3 df d9 e0 09 34  f6 8e 2c 77 04 74 bd bc  |.......4..,w.t..|
000020c0  b9 37 b7 44 3c 06 85 cf  e6 da 39 fb a4 b3 f5 9f  |.7.D<.....9.....|
000020d0  a8 05 90 85 1e 69 a2 bb  bc cd 6b 02 5c e1 3c 15  |.....i....k.\.<.|
000020e0  e7 2e 31 4d 9b c0 9c 21  36 9b 49 a8 1f 0c bb 36  |..1M...!6.I....6|
000020f0  5f bd 82 12 88 f3 01 82  ad 74 18 0c 51 74 f0 4d  |_........t..Qt.M|
00002100  17 b7 e1 e5 58 e3 46 9e  94 48 33 11 b4 81 5b d7  |....X.F..H3...[.|
00002110  9d 0f 64 5b 2a 09 52 96  65 4d 3b 7a 2f 97 3a cd  |..d[*.R.eM;z/.:.|
The same goes for the other broken disks.
 

SirDice

That looks fairly random at first glance. usdmatt hinted at this: is there any chance these disks were encrypted?
 
[root@backup1 ~]# sysctl kern.disks
kern.disks: ada4 ada3 ada2 ada1 ada0

# gpart list
returns info only about ada3 and ada4 (good pool)
That would mean that, other than ada3 and ada4, all the others have been wiped clean - at the very least in such a way that their boot sector / partition table is no longer recognized by the system as valid. Which would mean that any data on there is most likely gone, unfortunately.

(edit): There is of course another possibility: that the drives never got partitioned but instead have been added to a pool in a raw state.
 
A status of "unavailable" suggests the disks might be encrypted. Had you ever rebooted the server before the crash?
 
Jesus!!!!!!!! They are encrypted! I contacted the old admin - he didn't write any documentation about it, which is why I had no way of knowing. They are encrypted with the geli(8) tool. After attaching them and importing with "zpool import backuppc", everything is working!
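For anyone hitting the same wall: the recovery amounts to something like the following sketch. The device names are illustrative, and whether a passphrase or a key file is needed depends on how the old admin configured geli - both are assumptions here.

```shell
# Attach each encrypted disk; geli then exposes a decrypted
# provider at /dev/adaX.eli
geli attach ada0   # prompts for the passphrase
geli attach ada1   # add "-k /path/to/keyfile" if a key file was used
geli attach ada2

# With the .eli providers present, ZFS can find its labels again
zpool import backuppc
zpool status backuppc
```

This also explains the symptoms perfectly: the raw /dev/adaX devices carry only ciphertext, so gpart, zdb -l, and hexdump all see random-looking data with no GPT or ZFS signatures.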

Fuck, sorry for your wasted time.
Regards
 

SirDice

Now that everything is working again, your first priority is to document it! Don't make the same mistakes the old admin made ;)
 
Right, I just did it. And I added the geli attach to autostart - tested and it works. What a story... heh.
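On FreeBSD, attaching geli providers at boot is typically done through rc.conf(5) rather than a custom script. A minimal sketch - the device names and key file paths are assumptions, and passphrase-only providers will prompt on the console during boot:

```shell
# /etc/rc.conf
geli_devices="ada0 ada1 ada2"
# Per-device flags, only needed if key files are in use:
geli_ada0_flags="-k /root/keys/ada0.key"
geli_ada1_flags="-k /root/keys/ada1.key"
geli_ada2_flags="-k /root/keys/ada2.key"

zfs_enable="YES"   # import/mount ZFS pools and datasets at boot
```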

And ZFS's honor is restored :)
 