Solved cannot import, "one or more devices is currently unavailable"... but all devices present and ONLINE

eovog861

New Member


I am an idiot.

I have no backup. Nothing critically important, but 6 TB of personal data I would sorely like to recover. I solemnly swear to create real backups first, before anything else, if I can get this working again.

I'm a newbie to FreeBSD. This new FreeBSD system is replacing an old Debian 6 server, which was running ZFSonLinux. I installed FreeBSD 11.0-RELEASE, set up the new storage pool (encrypted 10-disk RAID-Z2), transferred my data over (rsync, not zfs send/recv), ran for a few days happily, got rid of the old hardware containing the old pool.

The first time I rebooted the system, I did the geli attach commands but could not run zpool import -a. The new pool, "tank", was already present. However, I had to run zpool clear tank to make it available. I believe it was complaining about there not being a separate log device. I figured it was a side effect of using encrypted devices and didn't think much of it, and I continued using the pool without any problems.

I happened to install the lsof package today, which printed:
Code:
lsof: WARNING: compiled for FreeBSD release 11.0-RELEASE-p7; this is 11.0-RELEASE-p1.

That was when I realized I needed to run updates. Duh. So I looked it up in the handbook again and ran freebsd-update fetch and freebsd-update install. Knowing I needed to reboot for the kernel update, I figured I might as well try to shut down more gracefully than last time. I ran zpool export tank and the geli detach commands, then issued a reboot. After logging in again, I ran the geli attach commands, but... I can't reimport the pool.

Code:
# zpool import
   pool: tank
     id: 14383579345605766299
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

   tank         ONLINE
     raidz2-0   ONLINE
       da0.eli  ONLINE
       da1.eli  ONLINE
       da2.eli  ONLINE
       da3.eli  ONLINE
       da4.eli  ONLINE
       da5.eli  ONLINE
       da6.eli  ONLINE
       da7.eli  ONLINE
       da8.eli  ONLINE
       da9.eli  ONLINE
# zpool import tank
cannot import 'tank': one or more devices is currently unavailable
# zpool import -a
cannot import 'tank': one or more devices is currently unavailable
# zpool import -F -n tank
# zpool import -f tank
cannot import 'tank': one or more devices is currently unavailable
# zdb -e -C tank

MOS Configuration:
        version: 5000
        name: 'tank'
        state: 1
        txg: 44820
        pool_guid: 14383579345605766299
        hostid: 807724783
        hostname: 'filesrv'
        com.delphix:has_per_vdev_zaps
        vdev_children: 1
        vdev_tree:
            type: 'root'
            id: 0
            guid: 14383579345605766299
            create_txg: 4
            children[0]:
                type: 'raidz'
                id: 0
                guid: 4695976455601603174
                nparity: 2
                metaslab_array: 46
                metaslab_shift: 38
                ashift: 12
                asize: 35562502225920
                is_log: 0
                create_txg: 4
                com.delphix:vdev_zap_top: 35
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 11402930581434154143
                    path: '/dev/da0.eli'
                    whole_disk: 1
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 36
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 18121392312499277188
                    path: '/dev/da1.eli'
                    whole_disk: 1
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 37
                children[2]:
                    type: 'disk'
                    id: 2
                    guid: 6682933295533031840
                    path: '/dev/da2.eli'
                    whole_disk: 1
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 38
                children[3]:
                    type: 'disk'
                    id: 3
                    guid: 7049728495400098022
                    path: '/dev/da3.eli'
                    whole_disk: 1
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 39
                children[4]:
                    type: 'disk'
                    id: 4
                    guid: 4350589431371682403
                    path: '/dev/da4.eli'
                    whole_disk: 1
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 40
                children[5]:
                    type: 'disk'
                    id: 5
                    guid: 8394194089471178648
                    path: '/dev/da5.eli'
                    whole_disk: 1
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 41
                children[6]:
                    type: 'disk'
                    id: 6
                    guid: 13473629487508779270
                    path: '/dev/da6.eli'
                    whole_disk: 1
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 42
                children[7]:
                    type: 'disk'
                    id: 7
                    guid: 8293128023524559929
                    path: '/dev/da7.eli'
                    whole_disk: 1
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 43
                children[8]:
                    type: 'disk'
                    id: 8
                    guid: 4660497218889374037
                    path: '/dev/da8.eli'
                    whole_disk: 1
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 44
                children[9]:
                    type: 'disk'
                    id: 9
                    guid: 916718491707390357
                    path: '/dev/da9.eli'
                    whole_disk: 1
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 45
        features_for_read:
            com.delphix:hole_birth
            com.delphix:embedded_data
# zdb -l /dev/da0.eli
--------------------------------------------
LABEL 0
--------------------------------------------
    version: 5000
    name: 'tank'
    state: 1
    txg: 44820
    pool_guid: 14383579345605766299
    hostid: 807724783
    hostname: 'filesrv'
    top_guid: 4695976455601603174
    guid: 11402930581434154143
    vdev_children: 1
    vdev_tree:
        type: 'raidz'
        id: 0
        guid: 4695976455601603174
        nparity: 2
        metaslab_array: 46
        metaslab_shift: 38
        ashift: 12
        asize: 35562502225920
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 11402930581434154143
            path: '/dev/da0.eli'
            whole_disk: 1
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 18121392312499277188
            path: '/dev/da1.eli'
            whole_disk: 1
            create_txg: 4
        children[2]:
            type: 'disk'
            id: 2
            guid: 6682933295533031840
            path: '/dev/da2.eli'
            whole_disk: 1
            create_txg: 4
        children[3]:
            type: 'disk'
            id: 3
            guid: 7049728495400098022
            path: '/dev/da3.eli'
            whole_disk: 1
            create_txg: 4
        children[4]:
            type: 'disk'
            id: 4
            guid: 4350589431371682403
            path: '/dev/da4.eli'
            whole_disk: 1
            create_txg: 4
        children[5]:
            type: 'disk'
            id: 5
            guid: 8394194089471178648
            path: '/dev/da5.eli'
            whole_disk: 1
            create_txg: 4
        children[6]:
            type: 'disk'
            id: 6
            guid: 13473629487508779270
            path: '/dev/da6.eli'
            whole_disk: 1
            create_txg: 4
        children[7]:
            type: 'disk'
            id: 7
            guid: 8293128023524559929
            path: '/dev/da7.eli'
            whole_disk: 1
            create_txg: 4
        children[8]:
            type: 'disk'
            id: 8
            guid: 4660497218889374037
            path: '/dev/da8.eli'
            whole_disk: 1
            create_txg: 4
        children[9]:
            type: 'disk'
            id: 9
            guid: 916718491707390357
            path: '/dev/da9.eli'
            whole_disk: 1
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
--------------------------------------------
LABEL 1
--------------------------------------------
    version: 5000
    name: 'tank'
    state: 1
    txg: 44820
    pool_guid: 14383579345605766299
    hostid: 807724783
    hostname: 'filesrv'
    top_guid: 4695976455601603174
    guid: 11402930581434154143
    vdev_children: 1
    vdev_tree:
        type: 'raidz'
        id: 0
        guid: 4695976455601603174
        nparity: 2
        metaslab_array: 46
        metaslab_shift: 38
        ashift: 12
        asize: 35562502225920
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 11402930581434154143
            path: '/dev/da0.eli'
            whole_disk: 1
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 18121392312499277188
            path: '/dev/da1.eli'
            whole_disk: 1
            create_txg: 4
        children[2]:
            type: 'disk'
            id: 2
            guid: 6682933295533031840
            path: '/dev/da2.eli'
            whole_disk: 1
            create_txg: 4
        children[3]:
            type: 'disk'
            id: 3
            guid: 7049728495400098022
            path: '/dev/da3.eli'
            whole_disk: 1
            create_txg: 4
        children[4]:
            type: 'disk'
            id: 4
            guid: 4350589431371682403
            path: '/dev/da4.eli'
            whole_disk: 1
            create_txg: 4
        children[5]:
            type: 'disk'
            id: 5
            guid: 8394194089471178648
            path: '/dev/da5.eli'
            whole_disk: 1
            create_txg: 4
        children[6]:
            type: 'disk'
            id: 6
            guid: 13473629487508779270
            path: '/dev/da6.eli'
            whole_disk: 1
            create_txg: 4
        children[7]:
            type: 'disk'
            id: 7
            guid: 8293128023524559929
            path: '/dev/da7.eli'
            whole_disk: 1
            create_txg: 4
        children[8]:
            type: 'disk'
            id: 8
            guid: 4660497218889374037
            path: '/dev/da8.eli'
            whole_disk: 1
            create_txg: 4
        children[9]:
            type: 'disk'
            id: 9
            guid: 916718491707390357
            path: '/dev/da9.eli'
            whole_disk: 1
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
--------------------------------------------
LABEL 2
--------------------------------------------
failed to read label 2
--------------------------------------------
LABEL 3
--------------------------------------------
failed to read label 3

The outputs of zdb -l for /dev/da[1-9] are identical (checked with diff), except for the per-disk guid lines in LABEL 0 and LABEL 1 (lines 12 and 109 respectively). I manually verified that the output for each disk corresponds to the correct child disk path/guid. I can post the full outputs if you like, but it felt like that would be a bit overwhelming in a first post.

IT help desk instincts kicked in, so I tried a simple reboot. Nope, same thing. I even tried running freebsd-update rollback and rebooting again without success.

Any ideas? Any more information I can provide?
 

free-and-bsd

Daemon


Perhaps it would be good if you described in detail how you created the new pool and transferred your existing files into it. Just in case.
 

usdmatt

Daemon


The first time I rebooted the system, I did the geli attach commands but could not run zpool import -a. The new pool, "tank", was already present. However, I had to run zpool clear tank to make it available. I believe it was complaining about there not being a separate log device.

This all seems a bit weird. I've never used GELI with ZFS so I'm not sure how ZFS handles importing a pool that is only available after the eli devices have been attached... I can't see why it should be mentioning a log device unless you had a log device in the pool at some point?

My first thought would be to try to import the pool from a live CD, to see whether a fresh FreeBSD system will import it. If that succeeds, it may be stale data in /boot/zfs/zpool.cache that is confusing things.
 

eovog861

New Member


Apologies for the delay... work, life, etc. There have been a few developments, though.

---
tl;dr: I used authenticated AES (HMAC/SHA256) on my GEOM ELI disks, but did not fully initialize the disks with dd if=/dev/urandom of=/dev/da0.eli bs=4k before creating the zpool. It looks like zpool import is attempting to read some empty (and thus uninitialized) blocks near the end of each disk, which somehow kills the import operation prematurely. I am not expert enough to figure out how to work around this; I will be doing research later this week, but any advice or further questions would be much appreciated.
---

Perhaps it would be good if you described in detail how you created the new pool and transferred your existing files into it. Just in case.

As a caveat, these are all typed from memory, so forgive any syntax issues. The zfs attributes weren't anything too crazy, so I don't think I'm missing any. I went through the relevant man pages again, just in case... nothing jumped out at me, at least. Also, I use the Bourne shell (/bin/sh) for root; I haven't learned the default C shell yet.

First was the encryption:

# stty -echo; read -r pass; stty echo; for i in 0 1 2 3 4 5 6 7 8 9; do echo "${pass}" | geli init -a HMAC/SHA256 -e AES-XTS -J - -l 256 -s 4096 /dev/da${i}; echo "${pass}" | geli attach -j - /dev/da${i}; done

Next, I used the whole ELI devices to create the pool and file systems:

# zpool create tank raidz2 da0.eli da1.eli da2.eli da3.eli da4.eli da5.eli da6.eli da7.eli da8.eli da9.eli
# zfs set atime=off tank
# zfs set checksum=sha512 tank
# zfs set compression=lz4 tank
# zfs set mountpoint=none tank
# zfs create tank/import
# zfs set mountpoint=/srv/import tank/import


Next, I set up my SSH key in ${HOME}/.ssh/authorized_keys for my non-root user and changed ownership of /srv/import to that user. Switching over to my old server (running Debian), I used rsync to copy my data over the network to the new server, into subdirectories on the new tank/import file system.

This all seems a bit weird. I've never used GELI with ZFS so I'm not sure how ZFS handles importing a pool that is only available after the eli devices have been attached... I can't see why it should be mentioning a log device unless you had a log device in the pool at some point?

My first thought would be to try to import the pool from a live CD, to see whether a fresh FreeBSD system will import it. If that succeeds, it may be stale data in /boot/zfs/zpool.cache that is confusing things.

Definitely never had a log device. The pool was just a simple raidz2 with 10 data disks from the beginning. zpool status did provide an error code, but my browser history had already been cleared by the time of my first post, so I'm not sure what it was.

Thanks for the tip on the /boot/zfs/zpool.cache file. I had seen mention of it in zdb(8), but couldn't find it in /etc. The file only had the boot and root ZFS pools in it... which is another detail that may have been relevant. I just did the automatic ZFS option in the OS installer, encrypted root, single disk. Pointless, I know, but was planning on poking around later to see how it does things, then reinstall the OS on a mirror eventually.

After reading your post, I installed FreeBSD 11.0-RELEASE again on a spare SSD. Ran through the installer, rebooted to the new OS, ran updates, shut down, connected the data disks, booted up, geli attach, zpool import tank, same error.

---

This weekend I made some progress.

After my first post, I cleared my head for a couple of days and decided it was worth my time and money to make a real attempt at recovery. I ended up purchasing another 10 x 4 TB disks. Not cheap, but my thinking was that I could dd the decrypted contents and attempt recovery on a Linux system, where I'm more familiar, without nuking the originals.

In my frantic web searches a couple of weeks ago, I found a thread from a few years ago: Thread 27130. It's probably not relevant, but it helps explain my thought process. All of my 4 TB disks, both the data disks I'm trying to recover and the new disks I just bought for cloning, are the same model. However, the authenticated AES reduced the capacity of each of my data disks by about 10%. The raw devices are 4 TB, but after decryption, they've got about 3.6 TB usable. The plan was to dd from the decrypted ELI device to a partition on the clone disk.
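For concreteness, here's the rough capacity arithmetic, assuming the usual nominal 4 TB raw size (4,000,787,030,016 bytes; an assumption on my part) and taking the usable figure from the per-disk share of the asize in the zdb output above:

```python
# Rough capacity math for one disk. The raw size is the nominal 4 TB
# figure (assumed); the usable size is the per-disk share of the
# raidz2 asize reported by zdb.
raw_bytes = 4_000_787_030_016            # assumed nominal 4 TB disk
usable_bytes = 35_562_502_225_920 // 10  # per-disk share of asize

reduction = 1 - usable_bytes / raw_bytes
print(f"usable: {usable_bytes / 1e12:.2f} TB, reduction: {reduction:.1%}")
```

So "about 10%" is really closer to 11%, which lines up with the authentication overhead eating a slice of every sector.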

For physical convenience, I opted to do this in another system, which has fewer ports. The OS disk is the second install of FreeBSD that I mentioned above. The pairs in the first batch were going to be ada0/ada4, ada1/ada5, ada2/ada6, and ada3/ada7.

# gpart create -s gpt ada4
# gpart add -s 69458012160 -t freebsd-zfs ada4
# dd if=/dev/ada0.eli of=/dev/ada4p1 bs=4k &
# # repeat for the other pairs
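As a sanity check on the partition sizing, the per-disk share of the asize from the earlier zdb output can be converted to a 512-byte sector count for gpart (assuming, on my part, that the raidz2's asize splits evenly across its ten children):

```python
# Derive a per-disk clone partition size from the pool's asize
# (value taken from the zdb output earlier in the thread).
raidz_asize = 35_562_502_225_920   # bytes, whole raidz2 vdev
children = 10
sector = 512                       # gpart -s counts 512-byte sectors

per_disk_bytes = raidz_asize // children
per_disk_sectors = per_disk_bytes // sector
print(per_disk_bytes, per_disk_sectors)
```

Anything at least that large should hold a full dd image of one decrypted ELI device.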


A few hours later, I came back to find that all four pairs had stopped early. Very early. All of them failed just a few minutes in, at around 6 GB. In addition to the usual dd stats, each also printed an error about the input file argument:

dd: /dev/ada0.eli: invalid argument

Huh. That's weird. It was consistent, too: I repeated the dd commands a few times, and each disk failed in the same place every time. Two of them stopped at the exact same block, but all four were in the same neighborhood. Some head scratching and a few false leads later, I looked at the dmesg kernel log and realized what this was.

Time skip: all the data disks are now back in their original server so that I can see where this is happening on each one. Some disks fail earlier than others, but all fail before 7 GB. Here's one of the earliest. Running at the console, you can see what is happening:

# dd if=/dev/da7.eli of=/dev/null bs=1m count=300
GEOM_ELI: /dev/da7.eli: Failed to authenticate 4096 bytes of data at offset 286322688
dd: /dev/da7.eli: invalid argument
273+0 records in
273+0 records out
286261248 bytes transferred in 12.586279 secs (22743913 bytes/sec)


Well, that's interesting. The GEOM_ELI error was harmless... or so I had thought. I had seen this earlier while testing an encrypted zpool on some older disks, before the real ones arrived. On a disk freshly encrypted with authenticated AES, you'll get this error whenever you try to read a block that hasn't yet been written. The fix is simply to do something like the following before using the encrypted disk for the first time:

# dd if=/dev/urandom of=/dev/da0.eli bs=4k

...I skipped that step. It should have happened between geli attach and zpool create, but I was lazy (sigh) and figured I could just fill the file system later as a background task. Did I mention that I'm an idiot?
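For reference, the skipped initialization pass would have looked something like the loop below (shown here as a dry run with echo, so the destructive dd lines are printed rather than executed):

```shell
# The initialization pass that was skipped: writing random data across each
# freshly attached .eli provider gives every sector a valid HMAC, so later
# reads never hit unauthenticated blocks. Dry run: echo prints the commands.
for i in 0 1 2 3 4 5 6 7 8 9; do
    echo dd if=/dev/urandom of=/dev/da${i}.eli bs=4k
done
```

Drop the echo to actually run it; note it takes many hours per 4 TB disk.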

Running zpool import tank again from the console, I get enough of these GEOM_ELI errors to fill the screen. All of them appear to be near the very end of each disk. My hypothesis is that, similar to dd being killed early, zpool import is trying to read something near the end of the disks that was never written, and the read errors are terminating the import early. I assume the end of the disk is a common place for some sort of metadata or disk labels.
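If it helps anyone checking my reasoning: as far as I understand it, ZFS keeps four 256 KiB vdev labels per device, two at the front and two at the very end, which would explain why an import reads the tail of every disk. A quick sketch of that layout (the provider size below is a hypothetical value, not one of my disks):

```python
# ZFS on-disk layout keeps four 256 KiB vdev labels per device:
# L0 and L1 at the front, L2 and L3 in the last 512 KiB.
LABEL = 256 * 1024                 # bytes per vdev label

def label_offsets(devsize):
    # Byte offsets of labels L0, L1, L2, L3 for a device of devsize bytes.
    return [0, LABEL, devsize - 2 * LABEL, devsize - LABEL]

demo = 3_556_255_662_080           # hypothetical ~3.56 TB .eli provider size
print(label_offsets(demo))
```

So any import has to touch the last half-MiB of each provider, right where my uninitialized blocks live.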

There doesn't seem to be anything obviously wrong with the disks themselves. I haven't set up SMART monitoring or anything; I'll see about that later this week. But after getting the mysterious missing log device error mentioned above, I ran zpool status pretty frequently during and after copying my data over, and no errors were ever reported. I can also do this without issue:

# dd if=/dev/da7 of=/dev/null bs=1m count=300
300+0 records in
300+0 records out
314572800 bytes transferred in 2.385884 secs (131847493 bytes/sec)


And yes, the authenticated AES is slow. Wow. Won't be using that going forward.

I (want to) believe the errors are happening during the dd because ZFS simply never happened to write to those blocks, meaning they should still be empty (zeroed out). Is there a way to disable the AES authentication now that it is already in place? Alternatively, is there a way to change the handling of GEOM read failures so that dd and zpool import are not killed by them?

EDIT: Fixed "GELI_ELI" to "GEOM_ELI" when talking about the import.
EDIT2: Moved a paragraph downwards to make it easier to follow.
 

usdmatt

Daemon


Hmm, I can see why dd would fail. As soon as it reaches a block that hasn't been written, the GELI authentication can't validate the block and returns EINVAL, which causes dd to give up (a bit of a pain). A quick glance through the man page doesn't show any obvious way of ignoring authentication failures. It would be a bit worrying if this caused ZFS to fall over, though, as it shouldn't really be trying to read data it hasn't written.
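One dd-side workaround that might be worth a try (untested against GELI's EINVAL specifically) is conv=noerror, which tells dd to keep going after a read error; adding sync pads the failed blocks with zeros so the output stays aligned. A harmless demonstration on a plain file:

```shell
# conv=noerror makes dd continue past input read errors instead of aborting;
# conv=sync zero-pads any short or failed reads so block alignment is kept.
# Demonstrated on a plain temp file; the real attempt would read the .eli device.
dd if=/dev/zero of=/tmp/noerror_src bs=4k count=4 2>/dev/null
dd if=/tmp/noerror_src of=/tmp/noerror_dst bs=4k conv=noerror,sync 2>/dev/null
wc -c < /tmp/noerror_dst
```

With bs=4k matching the provider's sector size, each failure should only cost you one unwritten sector.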

Stupid question, but have you tried importing using the id number? It shouldn't behave any differently, as zpool import only shows the one pool, but I just find it really strange that it's complaining about unavailable devices when the import output shows all devices online and the zdb output only defines one vdev.
 

eovog861

New Member

Reaction score: 5
Messages: 5

Yes, I had already tried importing by the GUID, but I did it again for sanity. Same issue. This is what it looks like with the dmesg output inserted (I did a before/after comparison and pasted in the new GEOM_ELI lines):

# zpool import 14383579345605766299
GEOM_ELI: da8.eli: Failed to authenticate 4096 bytes of data at offset 1017070710784.
GEOM_ELI: da6.eli: Failed to authenticate 4096 bytes of data at offset 1017070710784.
GEOM_ELI: da2.eli: Failed to authenticate 4096 bytes of data at offset 1036472819712.
GEOM_ELI: da1.eli: Failed to authenticate 4096 bytes of data at offset 1036472819712.
GEOM_ELI: da0.eli: Failed to authenticate 4096 bytes of data at offset 1036472819712.
GEOM_ELI: da2.eli: Failed to authenticate 4096 bytes of data at offset 1036472819712.
GEOM_ELI: da1.eli: Failed to authenticate 4096 bytes of data at offset 1036472819712.
GEOM_ELI: da7.eli: Failed to authenticate 4096 bytes of data at offset 1017070710784.
GEOM_ELI: da0.eli: Failed to authenticate 4096 bytes of data at offset 1036472819712.
GEOM_ELI: da8.eli: Failed to authenticate 4096 bytes of data at offset 1017070710784.
GEOM_ELI: da7.eli: Failed to authenticate 4096 bytes of data at offset 1017070710784.
GEOM_ELI: da6.eli: Failed to authenticate 4096 bytes of data at offset 1017070710784.
GEOM_ELI: da5.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da5.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da5.eli: Failed to authenticate 4096 bytes of data at offset 3556255125504.
GEOM_ELI: da5.eli: Failed to authenticate 4096 bytes of data at offset 3556255125504.
GEOM_ELI: da5.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da5.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da5.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da6.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da6.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da6.eli: Failed to authenticate 4096 bytes of data at offset 3556255125504.
GEOM_ELI: da6.eli: Failed to authenticate 4096 bytes of data at offset 3556255125504.
GEOM_ELI: da6.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da7.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da7.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da7.eli: Failed to authenticate 4096 bytes of data at offset 3556255125504.
GEOM_ELI: da7.eli: Failed to authenticate 4096 bytes of data at offset 3556255125504.
GEOM_ELI: da7.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da8.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da8.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da8.eli: Failed to authenticate 4096 bytes of data at offset 3556255125504.
GEOM_ELI: da8.eli: Failed to authenticate 4096 bytes of data at offset 3556255125504.
GEOM_ELI: da8.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da8.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da9.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da9.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da9.eli: Failed to authenticate 4096 bytes of data at offset 3556255125504.
GEOM_ELI: da9.eli: Failed to authenticate 4096 bytes of data at offset 3556255125504.
GEOM_ELI: da9.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da9.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da0.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da0.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da0.eli: Failed to authenticate 4096 bytes of data at offset 3556255125504.
GEOM_ELI: da0.eli: Failed to authenticate 4096 bytes of data at offset 3556255125504.
GEOM_ELI: da0.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da1.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da1.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da1.eli: Failed to authenticate 4096 bytes of data at offset 3556255125504.
GEOM_ELI: da1.eli: Failed to authenticate 4096 bytes of data at offset 3556255125504.
GEOM_ELI: da1.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da1.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da2.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da2.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da2.eli: Failed to authenticate 4096 bytes of data at offset 3556255125504.
GEOM_ELI: da2.eli: Failed to authenticate 4096 bytes of data at offset 3556255125504.
GEOM_ELI: da2.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da2.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da2.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da3.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da3.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da3.eli: Failed to authenticate 4096 bytes of data at offset 3556255125504.
GEOM_ELI: da3.eli: Failed to authenticate 4096 bytes of data at offset 3556255125504.
GEOM_ELI: da3.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da3.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da3.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da4.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da4.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da4.eli: Failed to authenticate 4096 bytes of data at offset 3556255125504.
GEOM_ELI: da4.eli: Failed to authenticate 4096 bytes of data at offset 3556255125504.
GEOM_ELI: da4.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da4.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da4.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da5.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da5.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da5.eli: Failed to authenticate 4096 bytes of data at offset 3556255125504.
GEOM_ELI: da5.eli: Failed to authenticate 4096 bytes of data at offset 3556255125504.
GEOM_ELI: da5.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da5.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da5.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da6.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da6.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da6.eli: Failed to authenticate 4096 bytes of data at offset 3556255125504.
GEOM_ELI: da6.eli: Failed to authenticate 4096 bytes of data at offset 3556255125504.
GEOM_ELI: da6.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da6.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da6.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da7.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
GEOM_ELI: da7.eli: Failed to authenticate 4096 bytes of data at offset 3556255125504.
GEOM_ELI: da7.eli: Failed to authenticate 4096 bytes of data at offset 3556255129600.
[...the same "Failed to authenticate" messages repeat several times for every member disk (da0.eli-da4.eli and da6.eli-da9.eli), always at offsets 3556255125504 and 3556255129600...]
cannot import 'tank': one or more devices is currently unavailable


I think my next steps are:
1) Finish cloning the raw disk devices, just in case.
2) Try to satisfy GEOM_ELI by zero-filling the blocks it complains about during the import. It seems to be only a few per disk.
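For step 2, the byte offsets in the GEOM_ELI errors need converting to 4 KiB block numbers for dd's skip/seek arguments. A quick sketch (the clone destination path is hypothetical, adjust to taste):

```shell
# Convert the byte offsets reported by GEOM_ELI into 4096-byte block numbers
BS=4096
for off in 3556255125504 3556255129600; do
    echo "offset $off -> block $((off / BS))"
done
# Step 1: clone a raw member disk before experimenting. Reading /dev/da0
# (not da0.eli) bypasses the GELI layer, so authentication failures can't
# interfere with the copy:
#   dd if=/dev/da0 of=/backup/da0.img bs=1m conv=noerror,sync
```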
 
eovog861 (OP)

As an update, I had inconsistent results simply zeroing the blocks the commands were complaining about. I noticed zdb -l /dev/da0.eli would throw the GEOM_ELI authentication errors just before "failed to read label 2" and "failed to read label 3". Zeroing those blocks did suppress the GEOM_ELI errors, but when I tried the same for the blocks zpool import tank was reading, the write reported success while a subsequent read of the same block still failed to authenticate:

# dd if=/dev/da0.eli bs=4k skip=$((3556255125504 / 4096)) count=1 of=/dev/null
GEOM_ELI: da0.eli: Failed to authenticate 4096 bytes of data at offset 3556255125504.
dd: /dev/da0.eli: Invalid argument
0+0 records in
0+0 records out
0 bytes transferred in 0.042902 secs (0 bytes/sec)
# dd if=/dev/zero bs=4k skip=$((3556255125504 / 4096)) count=1 of=/dev/da0.eli
GEOM_ELI: da0.eli: Failed to authenticate 4096 bytes of data at offset 3556255125504.
GEOM_ELI: da0.eli: Failed to authenticate 4096 bytes of data at offset 3556255125504.
GEOM_ELI: da0.eli: Failed to authenticate 4096 bytes of data at offset 3556255125504.
GEOM_ELI: da0.eli: Failed to authenticate 4096 bytes of data at offset 3556255125504.
GEOM_ELI: da0.eli: Failed to authenticate 4096 bytes of data at offset 3556255125504.
GEOM_ELI: da0.eli: Failed to authenticate 4096 bytes of data at offset 3556255125504.
GEOM_ELI: da0.eli: Failed to authenticate 4096 bytes of data at offset 3556255125504.
1+0 records in
1+0 records out
4096 bytes transferred in 0.098251 secs (41689 bytes/sec)
# echo $?
0
# dd if=/dev/da0.eli bs=4k skip=$((3556255125504 / 4096)) count=1 of=/dev/null
GEOM_ELI: da0.eli: Failed to authenticate 4096 bytes of data at offset 3556255125504.
dd: /dev/da0.eli: Invalid argument
0+0 records in
0+0 records out
0 bytes transferred in 0.043755 secs (0 bytes/sec)


The writes actually take several minutes each with a CPU core pegged at 100%, btw, not the 0.1 seconds it claims.

I'm not sure why zeroing works for some blocks but not others. I did try sysctl kern.geom.debugflags=16 from geom(4), but that didn't help (I didn't really expect it to, but figured I'd mention it).

I think I'm going to get my hands dirty this weekend and build a custom kernel with some sort of ugly hack to disable the errors. For future reference, this seems to be the file I'll need to edit: https://svnweb.freebsd.org/base/stable/11/sys/geom/eli/g_eli_integrity.c?view=co
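The rebuild cycle itself is the standard one from the Handbook. A sketch, assuming sources in /usr/src and a custom kernel config named HACKED (the config name is made up):

```shell
# Copy GENERIC to a custom config, patch g_eli_integrity.c, then rebuild:
cd /usr/src
make buildkernel KERNCONF=HACKED
make installkernel KERNCONF=HACKED
shutdown -r now
```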
 
eovog861 (OP)

SUCCESS!

I patched the file I mentioned above, recompiled and installed the kernel, rebooted, and it works. I was able to import the zpool, all file systems mounted, and I can read every file I've tried. I also noticed zdb -l /dev/da0.eli now gives output for labels 2 and 3 instead of being empty, and that was even before running the import command, which seemed potentially interesting.

Thanks everyone!

I'm going to start backing things up this evening. When that's done, I'll make *another* backup, and only then will I rebuild this primary RAID to not use authenticated AES anymore. (Not because of my user error issue, but because the performance trade-off just isn't worth it, IMO.) I'll also keep an eye on zpool status, to be sure there are no actual data integrity issues.
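For anyone curious, dropping data authentication when rebuilding comes down to omitting the -a flag at geli init time. A sketch (device name and parameters are illustrative, not my exact settings):

```shell
# With per-sector authentication (HMAC), as this pool was originally built:
geli init -e AES-XTS -l 128 -a HMAC/SHA256 -s 4096 /dev/da0
# Encryption only, no authentication overhead (the planned rebuild):
geli init -e AES-XTS -l 128 -s 4096 /dev/da0
```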

Oh, and this is what I ended up doing, for future reference.

Code:
--- /usr/src/sys/geom/eli/g_eli_integrity.c.old
+++ /usr/src/sys/geom/eli/g_eli_integrity.c
@@ -185,35 +185,8 @@
      data_secsize = sc->sc_data_per_sector;
      if ((i % lsec) == 0)
        data_secsize = decr_secsize % data_secsize;
-     if (bcmp(srcdata, auth, sc->sc_alen) != 0) {
-       /*
-        * Curruption detected, remember the offset if
-        * this is the first corrupted sector and
-        * increase size.
-        */
-       if (bp->bio_error == 0)
-         bp->bio_error = -1;
-       if (coroff == -1) {
-         coroff = bp->bio_offset +
-             (dstdata - (u_char *)bp->bio_data);
-       }
-       corsize += data_secsize;
-     } else {
-       /*
-        * No curruption, good.
-        * Report previous corruption if there was one.
-        */
-       if (coroff != -1) {
-         G_ELI_DEBUG(0, "%s: Failed to authenticate %jd "
-             "bytes of data at offset %jd.",
-             sc->sc_name, (intmax_t)corsize,
-             (intmax_t)coroff);
-         coroff = -1;
-         corsize = 0;
-       }
-       bcopy(srcdata + sc->sc_alen, dstdata,
-           data_secsize);
-     }
+      bcopy(srcdata + sc->sc_alen, dstdata,
+          data_secsize);
      srcdata += encr_secsize;
      dstdata += data_secsize;
      auth += sc->sc_alen;
 

gkontos

Daemon

Reaction score: 487
Messages: 2,160

Good job, you might want to share this in the mailing lists as well.
 