ZFS: Fix broken ZFS pool (zroot | mirrored), primary GPT tables corrupt/invalid

Hey Guys!

I'm currently trying to fix/recover a broken zroot pool.

My Setup:
  • Drives: 2x Samsung 850EVO 250GB SSDs (da0 and da1)
  • Encryption: No
  • RootOnZFS: YES
How I got into the problem: I accidentally removed the wrong drive from the server and wiped it on another PC. When I realized it was the wrong drive, I tried to resilver it.
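
(For context, the replace/resilver procedure I was aiming for looked roughly like the sketch below. It assumes the default RootOnZFS GPT layout from the installer, with da1 being the wiped disk, so treat it as a rough outline rather than exactly what I typed.)

Code:
# copy the partition layout from the healthy disk to the wiped one
gpart backup da0 | gpart restore -F da1

# re-install the boot code (freebsd-boot is index 1 in the default layout)
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da1

# let ZFS rebuild the mirror member and watch the resilver
zpool replace zroot da1p3
zpool status zroot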

zpool status zroot showed the 1st SSD (da0) as ONLINE and without errors, and the 2nd SSD (da1) as DEGRADED.

gpart show da1 reported the GPT header as corrupt (or something like that), while da0 did not have a corrupt GPT header at that time.
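
(Side note: as far as I know, gpart recover can rebuild a damaged GPT from the backup copy at the end of the disk. Something like this, with da1 being the affected disk:)

Code:
gpart show da1       # shows the partitioning, flagged [CORRUPT] if the GPT is damaged
gpart recover da1    # rebuilds the damaged GPT from the surviving copy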

I rebooted the server and then got the ZFS: i/o error - all block copies unavailable error message... (both with da0 and da1 attached, and with only da0)

I also tried to import the pool on macOS with OpenZFS: sudo zpool import -f zroot -o readonly=on

Code:
cannot import 'zroot': no such device in pool
Destroy and re-create the pool from a backup source.

and sudo zpool import -F

Code:
   pool: zroot
     id: 7697653246985412200
  state: UNAVAIL
 status: The pool was last accessed by another system.
 action: The pool cannot be imported due to damaged devices or data.
   see: http://zfsonlinux.org/msg/ZFS-8000-EY
 config:

    zroot                                         UNAVAIL  insufficient replicas
      media-8C776A20-7224-11E8-98B2-E8393529C350  ONLINE
      da1p3                                       UNAVAIL  cannot open


I booted the server from an 11.2-RELEASE USB stick (Live) and then tried to import the pool with zpool import -f zroot.
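
(For reference, the same import can also be done read-only and with an altroot, so nothing gets written to the pool or mounted over the live system. Roughly, and the exact flags may need adjusting:)

Code:
zpool import -f -R /mnt -o readonly=on zroot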

Now comes the part I don't fully understand.

[Attached screenshot: IMG_0204.jpg]


It shows that the primary GPT table is corrupt or invalid, but when I check with gpart show da0 it doesn't show me that there is a corrupt GPT table...

Thanks in advance :)
 
Judging from the looks of it you set up your pool as a stripe. In other words: both disks made up the entire pool, so there was nothing to resilver because you never set up a mirror.

I can't comment on the GPT errors because you never shared the output of gpart, but even so I don't think you're going to recover from this one other than by grabbing your backups.
 
The GPT error is very common when using pre-formatted disks. Search the forums for that topic to learn more.

If the installation is new, or you have a backup you can easily use, or you have extra disks, the easiest and safest way to solve it would be to destroy the whole thing, including the GPT partitions, and then re-create the pool.

I never found out how to do that (destroy the GPT partitions) using gpart(8), so I use sysutils/gdisk. In its expert functions (or something like that) there is an option to destroy the GPT partitions.
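
From memory the sequence in gdisk is something like the following (it's interactive, so double-check the device name before confirming; da1 here is just an example):

Code:
gdisk /dev/da1
  x    # extra functionality (expert) menu
  z    # zap (destroy) the GPT data structures and exit
  y    # confirm wiping the GPT, and answer the MBR question as you see fit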
 
The drives were mirrored! I tried to add back the drive I removed, but then the pool didn't show up as a mirrored one.
That's not what your screenshot above shows us.

This is a mirror:
Code:
peter@zefiris:/home/peter $ zpool status zroot
  pool: zroot
state: ONLINE
  scan: scrub repaired 0 in 2h0m with 0 errors on Tue Jan 16 06:06:25 2018
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            ada0p2  ONLINE       0     0     0
            ada1p2  ONLINE       0     0     0

errors: No known data errors
Notice how it explicitly mentions mirror-0? That does not "just" disappear when one of the disks becomes faulty or even goes offline. Not even if you'd use # zpool detach to remove one of the disks. You'd see the same layout, just with a comment that the pool has become DEGRADED.

The only way I could see this theoretically happen is if you'd run zpool remove, but even then you'd be left with a healthy pool (so I assume) running on one disk. Which is something I suppose you could try: remove the faulty disk, though I also somewhat doubt that this is going to work because, as far as I know, you can't remove data vdevs just like that:

Code:
     zpool remove [-np] pool device ...

         Removes the specified device from the pool.  This command currently
         only supports removing hot spares, cache, log devices and mirrored
         top-level vdevs (mirror of leaf devices); but not raidz.

(edit): How did you try to re-add the disk anyway? Or even remove it, for that matter? Maybe retracing those steps could help, but basing myself solely on the output you shared I'm not very optimistic here.
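
For comparison, the normal way to take a member out of a mirror and put it back is detach/attach rather than remove. Roughly like this, using the device names from your output (a sketch, not something I'd run blindly on a degraded pool):

Code:
zpool detach zroot da1p3          # drop the wiped member from the mirror
zpool attach zroot da0p3 da1p3    # re-attach it to the survivor, recreating mirror-0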
 
Code:
root@server:~ # zpool status zroot
  pool: zroot
state: ONLINE
  scan: scrub repaired 0 in 0h7m with 0 errors on Mon Jul 16 15:43:07 2018
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            da0p3   ONLINE       0     0     0
            da1p3   ONLINE       0     0     0

errors: No known data errors

Yeah, I know what a mirror looks like. ^^ This is how it looked before it happened....

I removed da1p3 without thinking about it, with zpool remove zroot da1p3 :oops:
 