[Solved] ZFS pool erroneously claims device is unavailable

I have a RAID-Z pool with four 1 TB drives. Each drive is encrypted with geli and the zpool is created on top of the .eli devices (ada{0,1,2,3}.eli). A few months ago one of the drives (ada3) crashed; I replaced it with a new drive (initialized it with geli and swapped it into the pool) and resilvered the array. zpool status reported the pool as healthy and a scrub completed with no errors.

Now another disk in the pool has crashed (ada1) and I have replaced it with a new one. However, after unlocking the drives in the pool, zpool claims that ada3.eli is unavailable (it finds ada0.eli and ada2.eli just fine). I am left wondering what to do now. The problem reported in the following discussion could be related:

http://forums.freenas.org/index.php?threads/zfs-pool-resilvered-upon-reboot-forgets-new-drive.14396/

I cannot remember whether I rebooted the machine after replacing the first faulty drive. Some data that could be relevant: the ZFS pool was originally created on FreeBSD 7 using the devices ad{0,4,8,12}.eli (but the device naming scheme changed during some OS upgrade). I am currently running FreeBSD 9.1. Running zdb identifies the third disk (the one that crashed a few months ago) as:

Code:
    children[3]:
        type: 'disk'
        id: 3
        guid: 13212171415151566930
        path: '/dev/ad12.eli'
        whole_disk: 0
        DTL: 21

whereas zdb -l /dev/ada{0,2,3}.eli (the disks that should still be working) each identifies the third disk as:
Code:
    children[3]:
        type: 'disk'
        id: 3
        guid: 8683322241746783619
        path: '/dev/ada3.eli'
        phys_path: '/dev/ada3.eli'
        whole_disk: 1
        DTL: 7

And, indeed, zpool status reports:
Code:
    13212171415151566930  UNAVAIL      0     0     0  was /dev/ad12.eli

Curiously, plain zdb also reports "version: 15" whereas zdb -l <device> reports "version: 28" for every device (I am quite sure that I upgraded the pool to version 28).

How can I fix this discrepancy? I have not yet tried running zpool replace <pool> 13212171415151566930 /dev/ada3.eli since I am afraid of data loss now that another disk has actually crashed. Would it be safe to issue this command? Or is there some other course of action I could take? Could I somehow re-import the zpool from the disks, given that their labels seem to contain the correct information about the pool?
 
Re: ZFS pool erroneously claims device is unavailable

Hmm, what does the full zpool status look like at the moment?
 
Re: ZFS pool erroneously claims device is unavailable

Code:
 pool: storage
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-3C
  scan: none requested
config:

        NAME                      STATE     READ WRITE CKSUM
        storage                   UNAVAIL      0     0     0
          raidz1-0                UNAVAIL      0     0     0
            ada0.eli              ONLINE       0     0     0
            6903798275375421766   UNAVAIL      0     0     0  was /dev/ad6.eli
            ada2.eli              ONLINE       0     0     0
            13212171415151566930  UNAVAIL      0     0     0  was /dev/ad12.eli

The latter is the disk that was previously replaced (its new guid is 8683322241746783619); the former is the disk that just crashed, which I want to replace once I can get the pool up and running again.

But note that zdb -l /dev/ada{0,2,3}.eli reports the pool members correctly, unlike plain zdb and zpool status.
 
Re: ZFS pool erroneously claims device is unavailable

I solved it by detaching the disks from geli so that every device in the pool became UNAVAIL, then running zpool export storage, reattaching all the disks with geli, and importing the pool again. At that point it identified all the disks correctly.
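For anyone hitting the same stale-path problem, the sequence above amounts to roughly the following. This is only a sketch based on my setup: the pool is named storage, the members are the geli providers on ada0-ada3, and the exact geli attach options (keyfile, passphrase) depend on how each disk was initialized, so adjust them accordingly. This touches real devices and must be run as root:

```shell
# Detach the GELI providers so ZFS loses sight of all pool members
# (the pool drops to UNAVAIL, which is expected at this step).
geli detach ada0.eli
geli detach ada2.eli
geli detach ada3.eli

# Export the pool while no members are visible; this makes ZFS
# forget the cached (stale) device paths such as /dev/ad12.eli.
zpool export -f storage

# Reattach every disk so the .eli providers reappear.
# Add -k <keyfile> or answer the passphrase prompt as appropriate
# for your geli configuration.
geli attach ada0
geli attach ada1
geli attach ada2
geli attach ada3

# Import the pool again; ZFS re-reads the on-disk labels and
# identifies the members by their current device names.
zpool import storage
```

The point of the export/import cycle is that it forces ZFS to re-taste the labels on the providers instead of trusting the stale paths it had cached.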
 