ZFS + GELI backup array offline - not sure how to proceed

Good evening,

I have just discovered that yesterday my array of eight 3 TB spinning disks, all connected via USB, disconnected during a very brief power outage. It appears they are plugged into the surge-only (non-battery) outlets of my UPS, which would explain the disconnection; I wouldn't expect the brief disconnection itself to be a problem.

Code:
Jan  5 19:22:16 router kernel: ugen0.2: <vendor 0x050d product 0x0237> at usbus0 (disconnected)
Jan  5 19:22:16 router kernel: uhub3: at uhub2, port 3, addr 12 (disconnected)
Jan  5 19:22:16 router kernel: uhub3: detached
Jan  5 19:22:16 router kernel: ugen0.5: <JMicron USB to ATAATAPI Bridge> at usbus0 (disconnected)
Jan  5 19:22:16 router kernel: umass4: at uhub2, port 7, addr 10 (disconnected)
Jan  5 19:22:16 router kernel: da7 at umass-sim4 bus 4 scbus11 target 0 lun 0
Jan  5 19:22:16 router kernel: da7: <Hitachi HDS5C3030ALA630 0520> s/n [...] detached
Jan  5 19:22:16 router kernel: da8 at umass-sim4 bus 4 scbus11 target 0 lun 1
Jan  5 19:22:16 router kernel: da8: <Hitachi HDS5C3030ALA630 0520> s/n [...] detached
Jan  5 19:22:16 router kernel: da9 at umass-sim4 bus 4 scbus11 target 0 lun 2
Jan  5 19:22:16 router kernel: da9: <Hitachi HDS5C3030ALA630 0520> s/n [...] detached
Jan  5 19:22:16 router kernel: da10 at umass-sim4 bus 4 scbus11 target 0 lun 3
Jan  5 19:22:16 router kernel: da10: <Hitachi HDS5C3030ALA630 0520> s/n [...] detached
Jan  5 19:22:16 router kernel: (da7:umass-sim4:4:0:0): Periph destroyed
Jan  5 19:22:16 router kernel: (da8:umass-sim4:4:0:1): Periph destroyed
Jan  5 19:22:16 router kernel: (da10:umass-sim4:4:0:3): Periph destroyed
Jan  5 19:22:16 router kernel: (da9:umass-sim4:4:0:2): Periph destroyed
Jan  5 19:22:16 router kernel: umass4: detached
Jan  5 19:22:16 router kernel: ugen0.3: <JMicron USB to ATAATAPI Bridge> at usbus0 (disconnected)
Jan  5 19:22:16 router kernel: umass0: at uhub2, port 8, addr 9 (disconnected)
Jan  5 19:22:16 router kernel: da0 at umass-sim0 bus 0 scbus7 target 0 lun 0
Jan  5 19:22:16 router kernel: da0: <Hitachi HDS5C3030ALA630 0520> s/n [...] detached
Jan  5 19:22:16 router kernel: da4 at umass-sim0 bus 0 scbus7 target 0 lun 1
Jan  5 19:22:16 router kernel: da4: <Hitachi HDS5C3030ALA630 0520> s/n [...] detached
Jan  5 19:22:16 router kernel: da5 at umass-sim0 bus 0 scbus7 target 0 lun 2
Jan  5 19:22:16 router kernel: da5: <Hitachi HDS5C3030ALA630 0520> s/n [...] detached
Jan  5 19:22:16 router kernel: da6 at umass-sim0 bus 0 scbus7 target 0 lun 3
Jan  5 19:22:16 router kernel: da6: <Hitachi HDS5C3030ALA630 0520> s/n [...] detached
Jan  5 19:22:16 router kernel: (da0:umass-sim0:0:0:0): Periph destroyed
Jan  5 19:22:16 router kernel: (da4:umass-sim0:0:0:1): Periph destroyed
Jan  5 19:22:16 router kernel: (da5:umass-sim0:0:0:2): Periph destroyed
Jan  5 19:22:16 router kernel: (da6:umass-sim0:0:0:3): Periph destroyed
Jan  5 19:22:16 router kernel: umass0: detached
Jan  5 19:22:16 router ZFS: vdev state changed, pool_guid=7321291078325611977 vdev_guid=10632458215344621837
Jan  5 19:22:16 router ZFS: vdev is removed, pool_guid=7321291078325611977 vdev_guid=10632458215344621837
Jan  5 19:22:16 router ZFS: vdev state changed, pool_guid=7321291078325611977 vdev_guid=1763846306621303761
Jan  5 19:22:16 router ZFS: vdev is removed, pool_guid=7321291078325611977 vdev_guid=1763846306621303761
Jan  5 19:22:16 router ZFS: vdev state changed, pool_guid=7321291078325611977 vdev_guid=375602667091881610
Jan  5 19:22:16 router ZFS: vdev is removed, pool_guid=7321291078325611977 vdev_guid=375602667091881610
Jan  5 19:22:16 router ZFS: vdev state changed, pool_guid=7321291078325611977 vdev_guid=456579924252598725
Jan  5 19:22:16 router ZFS: vdev is removed, pool_guid=7321291078325611977 vdev_guid=456579924252598725
Jan  5 19:22:16 router ZFS: vdev state changed, pool_guid=7321291078325611977 vdev_guid=14978247955883167443
Jan  5 19:22:16 router ZFS: vdev is removed, pool_guid=7321291078325611977 vdev_guid=14978247955883167443
Jan  5 19:22:16 router ZFS: vdev state changed, pool_guid=7321291078325611977 vdev_guid=12620081082923192808
Jan  5 19:22:16 router ZFS: vdev is removed, pool_guid=7321291078325611977 vdev_guid=12620081082923192808
Jan  5 19:22:16 router ZFS: vdev state changed, pool_guid=7321291078325611977 vdev_guid=12405732299090880409
Jan  5 19:22:16 router ZFS: vdev is removed, pool_guid=7321291078325611977 vdev_guid=12405732299090880409
Jan  5 19:22:16 router ZFS: vdev state changed, pool_guid=7321291078325611977 vdev_guid=12648441390806736538
Jan  5 19:22:16 router ZFS: vdev is removed, pool_guid=7321291078325611977 vdev_guid=12648441390806736538
Jan  5 19:22:17 router kernel: re1: link state changed to DOWN
Jan  5 19:22:17 router kernel: ugen0.2: <vendor 0x050d product 0x0237> at usbus0
Jan  5 19:22:17 router kernel: uhub3 on uhub2
Jan  5 19:22:17 router kernel: uhub3: <vendor 0x050d product 0x0237, class 9/0, rev 2.00/0.00, addr 13> on usbus0
Jan  5 19:22:17 router kernel: uhub3: MTT enabled
Jan  5 19:22:18 router kernel: uhub3: 7 ports with 7 removable, self powered

[...]

Jan  5 19:22:36 router kernel: ugen0.3: <JMicron USB to ATAATAPI Bridge> at usbus0
Jan  5 19:22:36 router kernel: umass0 on uhub2
Jan  5 19:22:36 router kernel: umass0: <JMicron USB to ATAATAPI Bridge, class 0/0, rev 3.00/2.00, addr 14> on usbus0
Jan  5 19:22:36 router kernel: umass0:  SCSI over Bulk-Only; quirks = 0x0000
Jan  5 19:22:36 router kernel: umass0:7:0: Attached to scbus7
Jan  5 19:22:36 router kernel: da0 at umass-sim0 bus 0 scbus7 target 0 lun 0
Jan  5 19:22:36 router kernel: da0: <Hitachi HDS5C3030ALA630 0520> Fixed Direct Access SPC-4 SCSI device
Jan  5 19:22:36 router kernel: da0: Serial Number [...]
Jan  5 19:22:36 router kernel: da0: 400.000MB/s transfers
Jan  5 19:22:36 router kernel: da0: 2861588MB (5860533168 512 byte sectors)
Jan  5 19:22:36 router kernel: da0: quirks=0x2<NO_6_BYTE>
Jan  5 19:22:36 router kernel: da4 at umass-sim0 bus 0 scbus7 target 0 lun 1
Jan  5 19:22:36 router kernel: da4: <Hitachi HDS5C3030ALA630 0520> Fixed Direct Access SPC-4 SCSI device
Jan  5 19:22:36 router kernel: da4: Serial Number [...]
Jan  5 19:22:36 router kernel: da4: 400.000MB/s transfers
Jan  5 19:22:36 router kernel: da4: 2861588MB (5860533168 512 byte sectors)
Jan  5 19:22:36 router kernel: da4: quirks=0x2<NO_6_BYTE>
Jan  5 19:22:36 router kernel: da5 at umass-sim0 bus 0 scbus7 target 0 lun 2
Jan  5 19:22:36 router kernel: da5: <Hitachi HDS5C3030ALA630 0520> Fixed Direct Access SPC-4 SCSI device
Jan  5 19:22:36 router kernel: da5: Serial Number [...]
Jan  5 19:22:36 router kernel: da5: 400.000MB/s transfers
Jan  5 19:22:36 router kernel: da5: 2861588MB (5860533168 512 byte sectors)
Jan  5 19:22:36 router kernel: da5: quirks=0x2<NO_6_BYTE>
Jan  5 19:22:36 router kernel: da6 at umass-sim0 bus 0 scbus7 target 0 lun 3
Jan  5 19:22:36 router kernel: da6: <Hitachi HDS5C3030ALA630 0520> Fixed Direct Access SPC-4 SCSI device
Jan  5 19:22:36 router kernel: da6: Serial Number [...]
Jan  5 19:22:36 router kernel: da6: 400.000MB/s transfers
Jan  5 19:22:36 router kernel: da6: 2861588MB (5860533168 512 byte sectors)
Jan  5 19:22:36 router kernel: da6: quirks=0x2<NO_6_BYTE>
Jan  5 19:22:38 router kernel: ugen0.5: <JMicron USB to ATAATAPI Bridge> at usbus0
Jan  5 19:22:38 router kernel: umass4 on uhub2
Jan  5 19:22:38 router kernel: umass4: <JMicron USB to ATAATAPI Bridge, class 0/0, rev 3.00/2.00, addr 15> on usbus0
Jan  5 19:22:38 router kernel: umass4:  SCSI over Bulk-Only; quirks = 0x0000
Jan  5 19:22:38 router kernel: umass4:11:4: Attached to scbus11
Jan  5 19:22:38 router kernel: da7 at umass-sim4 bus 4 scbus11 target 0 lun 0
Jan  5 19:22:38 router kernel: da7: <Hitachi HDS5C3030ALA630 0520> Fixed Direct Access SPC-4 SCSI device
Jan  5 19:22:38 router kernel: da7: Serial Number [...]
Jan  5 19:22:38 router kernel: da7: 400.000MB/s transfers
Jan  5 19:22:38 router kernel: da7: 2861588MB (5860533168 512 byte sectors)
Jan  5 19:22:38 router kernel: da7: quirks=0x2<NO_6_BYTE>
Jan  5 19:22:38 router kernel: da8 at umass-sim4 bus 4 scbus11 target 0 lun 1
Jan  5 19:22:38 router kernel: da8: <Hitachi HDS5C3030ALA630 0520> Fixed Direct Access SPC-4 SCSI device
Jan  5 19:22:38 router kernel: da8: Serial Number [...]
Jan  5 19:22:38 router kernel: da8: 400.000MB/s transfers
Jan  5 19:22:38 router kernel: da8: 2861588MB (5860533168 512 byte sectors)
Jan  5 19:22:38 router kernel: da8: quirks=0x2<NO_6_BYTE>
Jan  5 19:22:38 router kernel: da9 at umass-sim4 bus 4 scbus11 target 0 lun 2
Jan  5 19:22:38 router kernel: da9: <Hitachi HDS5C3030ALA630 0520> Fixed Direct Access SPC-4 SCSI device
Jan  5 19:22:38 router kernel: da9: Serial Number [...]
Jan  5 19:22:38 router kernel: da9: 400.000MB/s transfers
Jan  5 19:22:38 router kernel: da9: 2861588MB (5860533168 512 byte sectors)
Jan  5 19:22:38 router kernel: da9: quirks=0x2<NO_6_BYTE>
Jan  5 19:22:38 router kernel: da10 at umass-sim4 bus 4 scbus11 target 0 lun 3
Jan  5 19:22:38 router kernel: da10: <Hitachi HDS5C3030ALA630 0520> Fixed Direct Access SPC-4 SCSI device
Jan  5 19:22:38 router kernel: da10: Serial Number [...]
Jan  5 19:22:38 router kernel: da10: 400.000MB/s transfers
Jan  5 19:22:38 router kernel: da10: 2861588MB (5860533168 512 byte sectors)
Jan  5 19:22:38 router kernel: da10: quirks=0x2<NO_6_BYTE>

I run GELI on the disks, which form a raidz3 pool, so normally I attach each device, `gpt/backup0` through `gpt/backup7`, before the pool comes online. Currently, ZFS is completely unable to access them because of read errors coming from GELI, which is apparently trying to perform authentication (integrity verification). I could have sworn I initialized the disks without authentication, after running into a common issue of authentication errors right after initialization; I do not need authentication.
Code:
Jan  6 18:10:44 router kernel: GEOM_ELI: Device gpt/backup0.eli created.
Jan  6 18:10:44 router kernel: GEOM_ELI: Encryption: AES-XTS 256
Jan  6 18:10:44 router kernel: GEOM_ELI:  Integrity: HMAC/SHA256
Jan  6 18:10:44 router kernel: GEOM_ELI:     Crypto: software
Jan  6 18:10:44 router kernel: GEOM_ELI: gpt/backup0.eli: Failed to authenticate 2048 bytes of data at offset 1500296435712.
Jan  6 18:10:44 router kernel: GEOM_ELI: gpt/backup0.eli: Failed to authenticate 512 bytes of data at offset 1500296466944.
Jan  6 18:10:44 router kernel: GEOM_ELI: gpt/backup0.eli: Failed to authenticate 512 bytes of data at offset 1500296467456.
Jan  6 18:10:44 router last message repeated 2 times
Jan  6 18:10:44 router kernel: GEOM_ELI: gpt/backup0.eli: Failed to authenticate 512 bytes of data at offset 1500296466944.
Jan  6 18:10:44 router kernel: GEOM_ELI: gpt/backup0.eli: Failed to authenticate 512 bytes of data at offset 512.
Jan  6 18:10:44 router kernel: GEOM_ELI: gpt/backup0.eli: Failed to authenticate 512 bytes of data at offset 0.
Jan  6 18:10:44 router kernel: GEOM_ELI: gpt/backup0.eli: Failed to authenticate 512 bytes of data at offset 1500296467456.
Jan  6 18:10:44 router kernel: GEOM_ELI: gpt/backup0.eli: Failed to authenticate 8192 bytes of data at offset 65536.
Jan  6 18:10:44 router kernel: GEOM_ELI: gpt/backup0.eli: Failed to authenticate 8192 bytes of data at offset 8192.
Jan  6 18:10:44 router kernel: GEOM_ELI: gpt/backup0.eli: Failed to authenticate 8192 bytes of data at offset 0.
Jan  6 18:10:44 router kernel: GEOM_ELI: gpt/backup0.eli: Failed to authenticate 8192 bytes of data at offset 262144.
Jan  6 18:10:44 router kernel: GEOM_ELI: gpt/backup0.eli: Failed to authenticate 8192 bytes of data at offset 65536.
Jan  6 18:10:44 router kernel: GEOM_ELI: gpt/backup0.eli: Failed to authenticate 8192 bytes of data at offset 8192.
Jan  6 18:10:44 router kernel: GEOM_ELI: gpt/backup0.eli: Failed to authenticate 8192 bytes of data at offset 0.
Jan  6 18:10:44 router kernel: GEOM_ELI: gpt/backup0.eli: Failed to authenticate 8192 bytes of data at offset 262144.
Jan  6 18:10:44 router kernel: GEOM_ELI: gpt/backup0.eli: Failed to authenticate 512 bytes of data at offset 32768.
Jan  6 18:10:44 router kernel: GEOM_ELI: gpt/backup0.eli: Failed to authenticate 512 bytes of data at offset 0.
Jan  6 18:10:44 router kernel: GEOM_ELI: gpt/backup0.eli: Failed to authenticate 512 bytes of data at offset 1024.
Jan  6 18:10:44 router kernel: GEOM_ELI: gpt/backup0.eli: Failed to authenticate 512 bytes of data at offset 8192.
Jan  6 18:10:44 router kernel: GEOM_ELI: gpt/backup0.eli: Failed to authenticate 512 bytes of data at offset 65536.
Jan  6 18:10:44 router kernel: GEOM_ELI: gpt/backup0.eli: Failed to authenticate 512 bytes of data at offset 0.
Jan  6 18:10:44 router kernel: GEOM_ELI: gpt/backup0.eli: Failed to authenticate 512 bytes of data at offset 1500296467456.
Jan  6 18:10:44 router kernel: GEOM_ELI: gpt/backup0.eli: Failed to authenticate 512 bytes of data at offset 1500296467456.
Jan  6 18:11:36 router kernel: GEOM_ELI: gpt/backup0.eli: Failed to authenticate 131072 bytes of data at offset 0.
Jan  6 18:11:36 router kernel: GEOM_ELI: gpt/backup0.eli: Failed to authenticate 131072 bytes of data at offset 262144.
Jan  6 18:11:36 router kernel: GEOM_ELI: gpt/backup0.eli: Failed to authenticate 131072 bytes of data at offset 1500295725056.
Jan  6 18:11:36 router kernel: GEOM_ELI: gpt/backup0.eli: Failed to authenticate 131072 bytes of data at offset 1500295987200.
Jan  6 18:11:52 router kernel: GEOM_ELI: gpt/backup0.eli: Failed to authenticate 131072 bytes of data at offset 0.
Jan  6 18:11:52 router kernel: GEOM_ELI: gpt/backup0.eli: Failed to authenticate 131072 bytes of data at offset 262144.
Jan  6 18:11:52 router kernel: GEOM_ELI: gpt/backup0.eli: Failed to authenticate 131072 bytes of data at offset 1500295725056.
Jan  6 18:11:52 router kernel: GEOM_ELI: gpt/backup0.eli: Failed to authenticate 131072 bytes of data at offset 1500295987200.
Jan  6 18:12:11 router kernel: GEOM_ELI: gpt/backup0.eli: Failed to authenticate 131072 bytes of data at offset 0.
Jan  6 18:12:12 router kernel: GEOM_ELI: gpt/backup0.eli: Failed to authenticate 131072 bytes of data at offset 262144.
Jan  6 18:12:12 router kernel: GEOM_ELI: gpt/backup0.eli: Failed to authenticate 131072 bytes of data at offset 1500295725056.
Jan  6 18:12:12 router kernel: GEOM_ELI: gpt/backup0.eli: Failed to authenticate 131072 bytes of data at offset 1500295987200.
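
For context, my normal bring-up is roughly the following (a sketch from memory, not my actual script; the `DRY_RUN` guard only prints each command so it can be reviewed before anything runs):

```shell
#!/bin/sh
# Sketch of the usual attach sequence for the backup array.
# With DRY_RUN set, commands are printed instead of executed.
DRY_RUN=1
cmds=""
for i in 0 1 2 3 4 5 6 7; do
    cmd="geli attach -p -k /root/new-backup/backup${i}.key /dev/gpt/backup${i}"
    cmds="${cmds}${cmd}
"
    if [ -n "$DRY_RUN" ]; then
        echo "$cmd"
    else
        $cmd    # real attach; the pool then normally comes online by itself
    fi
done
```

With `DRY_RUN` unset, the `geli attach` calls run for real, and the pool would normally import once all eight providers appear.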

Strangely, gpart also behaves oddly with these disks, in a way I am not sure how to interpret. See below:
Code:
# gpart show
[...]
=>        40  5860533088  da0  GPT  (2.7T)
         40  5860533080    1  freebsd-zfs  (2.7T)
5860533120           8       - free -  (4.0K)

=>        40  5860533088  da4  GPT  (2.7T)
         40  5860533080    1  freebsd-zfs  (2.7T)
5860533120           8       - free -  (4.0K)

=>        40  5860533088  da5  GPT  (2.7T)
         40  5860533080    1  freebsd-zfs  (2.7T)
5860533120           8       - free -  (4.0K)

=>        40  5860533088  da6  GPT  (2.7T)
         40  5860533080    1  freebsd-zfs  (2.7T)
5860533120           8       - free -  (4.0K)

=>        40  5860533088  da7  GPT  (2.7T)
         40  5860533080    1  freebsd-zfs  (2.7T)
5860533120           8       - free -  (4.0K)

=>      2048  5860531087  da8  GPT  (2.7T)
       2048  5860531080    1  freebsd-zfs  (2.7T)
5860533128           7       - free -  (3.5K)

=>      2048  5860531087  da9  GPT  (2.7T)
       2048  5860531080    1  freebsd-zfs  (2.7T)
5860533128           7       - free -  (3.5K)

=>        40  5860533088  da10  GPT  (2.7T)
         40  5860533080     1  freebsd-zfs  (2.7T)
5860533120           8        - free -  (4.0K)


# gpart show gpt/backup0
gpart: No such geom: gpt/backup0

# gpart list da0
Geom name: da0
modified: false
state: OK
fwheads: 255
fwsectors: 63
last: 5860533127
first: 40
entries: 152
scheme: GPT
Providers:
1. Name: da0p1
  Mediasize: 3000592936960 (2.7T)
  Sectorsize: 512
  Stripesize: 4096
  Stripeoffset: 0
  Mode: r1w1e2
  rawuuid: 251f7abd-b661-11e7-8a59-d43d7eb797a0
  rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
  label: backup0
  length: 3000592936960
  offset: 20480
  type: freebsd-zfs
  index: 1
  end: 5860533119
  start: 40
Consumers:
1. Name: da0
  Mediasize: 3000592982016 (2.7T)
  Sectorsize: 512
  Stripesize: 4096
  Stripeoffset: 0
  Mode: r1w1e3


# ls /dev/gpt |grep backup
backup0
backup1
backup2
backup3
backup4
backup5
backup6
backup7

On the one hand, the label seems to be present as usual on `da0p1`; on the other, I can't inspect it with `gpart show`.

It's also worth noting that I cannot attach `backup7`:
Code:
# geli attach -pk /root/new-backup/backup7.key /dev/gpt/backup7
geli: Cannot read metadata from /dev/gpt/backup7: Invalid argument.


I configured this pool several months ago, and I haven't rebooted the machine since. At first I was scared that I might have used old encryption keys to attach the disks, but then I realized that is basically impossible, since GELI would tell me as much. I had worried because I was quite sure I had disabled authentication when I initialized GELI on the disks, though I seem to be wrong about that:
Code:
# geli dump /dev/gpt/backup0
Metadata on /dev/gpt/backup0:
    magic: GEOM::ELI
  version: 7
    flags: 0x10
    ealgo: AES-XTS
   keylen: 256
    aalgo: HMAC/SHA256
provsize: 3000592936960
sectorsize: 512
     keys: 0x01
iterations: -1
     Salt: [...]
MD5 hash: [...]

# geli dump /var/backup/gpt_backup4.eli
Metadata on gpt_backup4.eli:
    magic: GEOM::ELI
  version: 7
    flags: 0x10
    ealgo: AES-XTS
   keylen: 256
    aalgo: HMAC/SHA256
provsize: 3000592936960
sectorsize: 512
     keys: 0x01
iterations: -1
     Salt: [...]
MD5 hash: [...]

Strangely enough, only some of the metadata backups are named after the GPT labels. Several others are named `da0p1.eli` and so on, rather than `gpt_backup0.eli`, for example. I'm not sure whether this means anything; I never thought twice about it.


It's also worth noting that these are refurbished disks, all bought at the same time from the same retailer. That is concerning, but there is no evidence at all of 'physical layer' errors, by which I mean the read errors you would see in `dmesg` or in `/var/log/messages`. And I cannot imagine the brief power outage did anything, given that the disks are behind a surge protector.
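
If I wanted to be more systematic about ruling out the disks themselves, I could probably poll SMART through the USB bridges. A sketch is below; `-d sat` is only my guess for these JMicron bridges, and the loop just echoes the commands so I can run them by hand:

```shell
#!/bin/sh
# Sketch: print a smartctl health check for each da device in the array.
# The -d sat transport is an assumption for JMicron USB-SATA bridges.
checks=""
for d in da0 da4 da5 da6 da7 da8 da9 da10; do
    c="smartctl -H -d sat /dev/${d}"
    checks="${checks}${c}
"
    echo "$c"
done
```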


I am wondering now whether there is some way to bypass authentication here, in order to access the data. I also wonder whether I could re-initialize GELI on these disks with the same encryption algorithm, key length, and key, and expect GELI to expose the volume as before, much as one can technically destroy a GPT table and recreate it with the same settings, leaving the data intact. Or is authentication a hurdle that cannot be overcome this way?
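
A related thought: rather than a full re-init, geli(8) has a `restore` subcommand that writes saved metadata back onto a provider, and I do have the copies under /var/backup. A sketch of what I might try is below; the file names are guessed from what I see in /var/backup (they vary, as noted above), and everything is echoed rather than executed so nothing touches the disks yet:

```shell
#!/bin/sh
# Sketch: restore GELI metadata from the /var/backup copies instead of
# re-running `geli init`. Echo only; review before running anything.
restores=""
for i in 0 1 2 3 4 5 6 7; do
    r="geli restore -v /var/backup/gpt_backup${i}.eli /dev/gpt/backup${i}"
    restores="${restores}${r}
"
    echo "$r"
done
```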



I'm not sure what state this system is in at the moment. I would appreciate it if anyone can say anything definitive about what is going on, and better still, about how I can get the array back online.

Thanks



Edit:

Guys,

Sorry: after giving up, I rebooted and now realize that I had decided not to use encryption on the backup array at all. I would have tried rebooting sooner, but I had a ton of tmux windows open. ZFS brought the disks online at boot. The GELI metadata on the disks was simply left over from when I had tried GELI out earlier, which explains why they all had authentication enabled. Sorry for the trouble; at least I hope you had a good think while reading the post.
Thanks
 