Solved: Unable to run geli attach with the disk connected internally

I moved an external drive internally as an internal drive appears to be failing. Normally, I run:

geli attach -p -k <file> da0

I expect I should be able to run:

geli attach -p -k <file> ada1

But, in this case, I'm getting:
geli: Cannot read metadata from ada1: Invalid argument.
geli: There was an error with at least one provider.

If I move it back to the external enclosure and repeat the process for da0, it works.

What am I missing here?

I'm working with the raw device; ZFS is set up on the encrypted volume.

I didn't set up a partition because I wanted the entire drive to be encrypted, and within that I could manage everything through the ZFS tools. It appears that if I try to run geli attach on an internal drive connected externally, it doesn't work, and the converse, an external drive connected internally, doesn't work either. It only works in the original configuration.

Is there some setting or default that changes behavior such as blocksize for internal versus external drives?
 
I suspect that the media sizes of the disks are not being calculated correctly.

geli(8) writes the metadata of an initialized provider in the last sector of the disk. If the media sizes differ, geli(8) won't be able to read that metadata, even if automatic expansion is turned on (the default).

Compare the media sizes of the disk when attached internally and externally: geom disk list ada1 (da0)
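For example, a quick check of both connections (device names as used in this thread):
Code:
 $ geom disk list ada1 | grep Mediasize
 $ geom disk list da0 | grep Mediasize
On the connection where the attach still works, geli dump will also print the provider size recorded in the metadata, which should match that connection's Mediasize.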
 
I've had external SATA-to-USB converters that translated access to some sectors, such as the MBR.
You probably have a very smart external HDD enclosure that translates some of the data written to the physical HDD, so the raw data differs depending on the interface (USB versus SATA).
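One rough way to check for that kind of translation (a sketch only; the device name, the 4096-byte sector size, and the skip offset are illustrative; the skip value would be mediasize/sectorsize minus 256 for the disk in question) is to hash the same raw regions on both interfaces and compare:
Code:
 $ dd if=/dev/da1 bs=4096 count=256 | sha256
 $ dd if=/dev/da1 bs=4096 skip=976754389 count=256 | sha256
If the hashes for the same region differ between the USB and SATA connections, the enclosure is translating data.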
 
I think you're right. I took an external HD, removed the base which provides the power and SATA connection, and used that with internal drives. That enclosure came with a 3TB drive and my internal disks are 4 TB. I suspect that it was programmed for that :).
 
You're right, using 2 different adapters, I get 2 different results. I was originally looking at another USB drive.

This one works and is the original way I set up the drive.
Geom name: da1
Providers:
1. Name: da1
Mediasize: 4000787025920 (3.6T)
Sectorsize: 4096
Mode: r0w0e0
descr: Seagate FA GoFlex Desk
lunname: Seagate FA GoFlex Desk 2
lunid: Seagate FA GoFlex Desk 2
ident: 2HC015KJ
rotationrate: unknown
fwsectors: 63
fwheads: 255

This one does not work, new enclosure:
Geom name: da1
Providers:
1. Name: da1
Mediasize: 4000787030016 (3.6T)
Sectorsize: 4096
Mode: r0w0e0
descr: SABRENT
lunname: SABRENT DB9876543213F
lunid: 3042987654321430
ident: DB98765432143
rotationrate: unknown
fwsectors: 63
fwheads: 255

The media size is different, so I need to figure out a way around that.
 
What could I have done to prevent this issue? Should I have partitioned my disk before just putting geli on the entire thing? I figured a partition table wasn't needed since I wanted to use the whole disk for an encrypted ZFS pool.
 
Having seen computers stop booting with a non-boot drive connected that had random data written all across it, which required me to connect it after boot to zero/partition it, I am wary of trusting a BIOS/UEFI to be smart enough to handle corrupt or incompatible partition table structures, and I always partition disks given a choice. BIOS/UEFI code on many systems is known to have plenty of bugs and to be incompatible with the standards it should implement, and sometimes, even when a problem is clearly reported, the manufacturers don't attempt to fix it, responding with "works with Windows".

If it were only a difference in how many bytes are available at the end, that could be controlled with partitions and their sizes. Some controllers do something proprietary with a disk so that it looks like a plain drive to the outside even though you do not have full/native access to its surface; connecting it directly to another controller can therefore present a 'different' drive/data, depending on what that controller was doing.

If you plan to change/move things, the best thing you can do is test the scenario to be aware of the issue that comes up. If you find controllers doing proprietary things, sharing it publicly helps others become aware and decide their use of it accordingly.
 
Ok, so how do I proceed? What is the best practice? Is a partition table required, would it have helped in this case?

If I don't format the drive, how can I specify the media size so that I can use the device as-is?
 
What is the best practice? Is a partition table required, would it have helped in this case?
It would have helped, and it would have been best practice.

When creating a (GPT) partition for a geli(8) provider that spans the whole disk, to make sure the partition size remains the same on different disk controllers, specify an alignment value:

gpart(8)
Code:
     add       Add a new partition to the partitioning scheme given by geom.
               ...

               The add command accepts these options:

               -a alignment  If specified, then the gpart utility tries to
                             align start offset and partition size to be
                             multiple of alignment value.

For example, the guided Root-on-ZFS installation sets a value of 1m. From /var/log/bsdinstall_log:
Code:
DEBUG: zfs_create_diskpart: gpart add -a 1m -l zfs0 -t freebsd-zfs "ada0"

If I don't format the drive, how can I specify the media size so that I can use the device as-is?
I don't think it's possible to modify the media size. The disk controller determines the size. What can be done is to move the geli(8) metadata to the end of the disk as the new enclosure sees it.

!!! Before all of the following operations I would make a backup of the disk data (if not done already) and verify that it's good. !!!

Then make a backup of the geli(8) metadata.

Next I would try the following, in order (a consolidated command sketch follows after step 3):

1 - geli(8) attach the external disk in the Seagate enclosure and check if the "AUTORESIZE" flag is set:
Code:
 $ geli list da1.eli | egrep 'Geom name|Flags'
If it is not set:

geli(8)
Code:
    configure  Change configuration of the given providers.

                Additional options include:
                ...
                -r
                   Turn on automatic expansion.  For more information, see the
                   description of the init subcommand.
Move over the disk to the SABRENT enclosure, see if it has an effect.

2 - If the "AUTORESIZE" flag is set, but it has no effect on the SABRENT enclosure, next try geli(8) 'resize' when disk in SABRENT enclosure:
Code:
 $ geli resize -s 4000787025920 da1

3 - If this doesn't work out, try geli(8) 'restore' of the metadata while the disk is in the SABRENT enclosure. You will probably need to set the '-f' flag.
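Roughly, the metadata backup and steps 1-3 would look like this (a sketch only; the provider name da1, the backup file path, and the size value are taken from this thread; double-check everything against geli(8) before running it):
Code:
 $ geli backup da1 /root/da1.eli.backup      # metadata backup, taken while attach still works
 $ geli configure -r da1                     # step 1: turn on automatic expansion
 $ geli resize -s 4000787025920 da1          # step 2: old provider size, disk in the SABRENT enclosure
 $ geli restore -f /root/da1.eli.backup da1  # step 3: only if steps 1 and 2 had no effect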
 
I haven't tried this for fear of data loss. I have 3 copies in total, 1 live and 2 backups; I'm not sure why, but I still don't want to take that risk. I plan to drop my raw photos over winter and will try this at that time.
 
The idea by T-Daemon is quite good and low-risk in my opinion as geli would simply refuse to do the resize operation if it can't read the old geli metadata block. I want to present two more ideas.

Let's imagine that Enclosure A sees the disk as having 980 sectors (because the enclosure stores 20 sectors of its own metadata on the disk) and Enclosure B sees the disk as having 1000 sectors.

For simplicity, let's assume that sectors with addresses from 0 to 979 in Enclosure A have the same addresses in Enclosure B. If there is an offset, it can be easily determined by computing a checksum for each sector when the disk is connected to Enclosure A and comparing the list of checksums with the one obtained when the disk is connected to Enclosure B.

When the disk is connected to Enclosure B if we relocate the data in sector 979 to sector 989, then 978 to 988, 977 to 987, and so on, until the data in sector 1 goes to sector 11 and the data in sector 0 goes to sector 10, then we will have 10 sectors at the beginning of the disk and 10 at the end for writing a GPT partition table. This would be a permanent solution for using the disk with Enclosure B only, but it may take a long time to move all the data. A special-purpose program will be needed to do the moving. I can write you one if you want.
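For illustration only, such a mover could be a small shell loop like the one below. It uses the toy numbers from the example above (980 old sectors, an offset of 10) plus an assumed 4096-byte sector size, not the real geometry of this disk, and it is destructive if any number is wrong:
Code:
#!/bin/sh
# Shift every sector up by OFFSET positions, starting from the last old
# sector and working backwards so no data is overwritten before it is copied.
DEV=/dev/da1      # illustrative device name
BS=4096           # sector size
OLD_SECTORS=980   # sector count as seen by Enclosure A (toy value)
OFFSET=10         # how far to shift everything (toy value)
i=$((OLD_SECTORS - 1))
while [ "$i" -ge 0 ]; do
    dd if="$DEV" of="$DEV" bs="$BS" iseek="$i" oseek=$((i + OFFSET)) \
        count=1 conv=notrunc
    i=$((i - 1))
done
A real version would copy larger chunks per dd invocation; one dd process per sector would take far too long on a 4TB disk.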

A much faster but more crazy solution would be to use gconcat(8) to prepend and append two 1MB pieces to the geli disk. With those in place you can write a GPT partition table to the new gconcat device without overwriting any of the geli data. The 1MB pieces can be partitions on another disk or files in a filesystem turned into disks via mdconfig(8).
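A sketch of the gconcat(8) variant (file paths, md unit numbers, and the gconcat name are illustrative; create is used rather than label so that no gconcat metadata gets written onto the disk itself):
Code:
 $ truncate -s 1m /root/head.img /root/tail.img
 $ mdconfig -a -t vnode -f /root/head.img -u 10
 $ mdconfig -a -t vnode -f /root/tail.img -u 11
 $ gconcat create wrap md10 da1 md11
 $ gpart create -s gpt concat/wrap
Because create (unlike label) keeps no on-disk metadata, the md and gconcat devices would have to be re-created after every reboot, for example from a small rc script.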
 
Thanks, that is a generous offer, but hold off for now. I am working through my media and will have a look at that hopefully this weekend.
 
I removed all of the device details; I'm not sure they add much value.

In hindsight, after re-reading the Storage chapter in the FreeBSD Handbook, I suppose there isn't any benefit to not using a partition table. I figured I could save some space, and since I'm using the whole drive anyway, a partition table seemed redundant to me.

Could you explain the alignment part in more detail? If the controller is going to do what it wants, what good is a partition table? A partition table would say what is where, but if the controller is skipping over it, then it wouldn't be readable, right?

Also, since I'm mirroring the internal drive, wouldn't the size of each drive need to match? The internal drive is: 4000787025920 while 1 of the external drives is: 4000787021824 (slightly smaller).

I suppose it doesn't matter that much, as I will never be able to utilize the full capacity of the disk anyway; otherwise, resilvering the external drive would fail? If I move the GELI metadata, wouldn't that mean the drive would work in one enclosure and not the other? I think it was mentioned in an earlier post that it would then only work in one enclosure. Rather than doing those operations, which I don't fully understand, it seems simpler and lower-risk to just reinitialize the drive and resilver it with the source drive in the pool.

If I do that, should I create a partition table first? And if I create the partition table, would that let me use the drive with either enclosure, or even internally? It would be beneficial to be able to use either enclosure, or perhaps another one altogether; an enclosure could fail for whatever reason.

If so, then what I can do is repeat this process for each drive in the pool until they all have a partition table set up.

Random Aside:
I put a different drive in the enclosure, but it is set up the same way. I am able to geli attach it in both setups, but I can only import the pool with one enclosure. Why is that the case if geli works in both situations?
 
If you want to use the same disk reliably with both Enclosure A and Enclosure B, you are entirely at the mercy of the firmware writers. While Enclosure B in your case seems to be "dumb" (non-smart), Enclosure A's firmware might be doing very non-trivial things like remapping sectors, computing checksums, encrypting data, or just doing stupid tricks for vendor lock-in. The moment you move a disk from Enclosure B to Enclosure A anything can happen unless assured by the manufacturer to the contrary.

Having a partition table helps geli find the start and end of its data even if the disk size changes somehow. In general, always use partitioning and always choose GPT. Always align all your partitions to 1MB. Wasting time to save a megabyte isn't worth it when your disk has more than 3 million megabytes.

What I think should work well for you is this: Have a GPT partition table on each disk. Create the partition table when connected to Enclosure A (the one which was eating some sectors). Create all partitions to be the same size. You will be unable to match the sizes of the existing ZFS mirror devices, so erase disk #1 and create a brand new pool. Transfer the data with zfs send/recv. Once you have the new pool with all your data, you can use ZFS's mirror functionality to copy the data to disk #2 and disk #3 which will also be reinitialized.
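A rough outline of that migration (pool, snapshot, and partition names here are made up for illustration):
Code:
 $ zpool create newpool da1p1.eli
 $ zfs snapshot -r oldpool@migrate
 $ zfs send -R oldpool@migrate | zfs recv -F newpool
 $ zpool attach newpool da1p1.eli da2p1.eli   # later, once disk #2 is reinitialized
An incremental send can catch up anything written to the old pool after the first snapshot, before the final switch-over.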

If you create all your GPT partition tables when the disks are connected to Enclosure A, they will be readable when connected to Enclosure B (or connected internally) and all partitions will stay the same size, even when the disks appear larger. The GPT partition table is written once at the start of the disk and once at the end. When connected to Enclosure A, it will really be at the end. When connected to Enclosure B (or internally) the second copy won't be at the end but it will be exactly at the sector specified for it in the copy at the start, so everything should work beautifully.

Just beware of the sysctl kern.geom.part.auto_resize (documented in gpart(8)). You don't want autoresize to happen as that can make a disk connected to Enclosure B stop working when connected to Enclosure A. The script /etc/rc.d/growfs can perform autoresize behind your back so make sure it is disabled.
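For example (kern.geom.part.auto_resize is documented in gpart(8); growfs_enable is the rc.conf knob behind /etc/rc.d/growfs):
Code:
 $ sysctl kern.geom.part.auto_resize=0
 $ echo 'kern.geom.part.auto_resize=0' >> /etc/sysctl.conf   # keep it across reboots
 $ sysrc growfs_enable="NO"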
 
Thanks, I'm documenting what I'm doing in case I run into this issue in the future, or someone else does. I disabled that sysctl just in case.

1. create GPT partition table
gpart create -s gpt da0
gpart add -a 1M -t freebsd da0

2. create GELI on partition
geli init -P -K <encryption key> -s 4096 -l 256 da0s1
geli attach -p -k <encryption key> da0s1

3. back up the metadata
cp /var/backups/da0s1.eli <backup file>

4. create ZFS pool on GELI device
zpool create <ZFS pool name> da0s1.eli

5. create ZFS datasets on ZFS pool as normal

Thanks JordanG, it is working now in both enclosures. I thought partitions were necessary on older systems when you had different filesystems on the same disk, but I suppose they're still useful today. I still use a partition table for my boot device, but FreeBSD sets that up automatically for me. In this case, since I'm putting a ZFS pool on the device, I didn't think a partition was necessary.

In terms of making the disks match, is that a precise requirement? When I created the partition now, I just specified to use the entire disk; nor did I bother when setting up the disks initially.

So, right now, I am resilvering 1 external drive; once that is done, I will resilver the other external drive, followed by the internal drive. If I skip the internal drive because I will "never" use it externally, then I might lose the ability to use either enclosure with it. So, it would be wise to perform this operation on all drives.
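For reference, after repartitioning and geli-initializing each additional drive, adding it back and resilvering looks roughly like this (names are placeholders; whether zpool attach or zpool replace applies depends on how the pool was rebuilt):
Code:
 $ geli attach -p -k <encryption key> da1s1
 $ zpool attach <ZFS pool name> da0s1.eli da1s1.eli
 $ zpool status <ZFS pool name>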
 
In the list of commands you supposedly executed, you mention da0s1 and da0s2. I would expect the single big partition on the disk to be named da0p1 but maybe your use of partition type 'freebsd' confuses things.

In terms of making the disks match, is that a precise requirement? When I created the partition now, I just specified to use the entire disk. Nor, did I bother when setting up the disks initially.
The chapter about ZFS in the FreeBSD Handbook says:
A mirror vdev will hold as much data as its smallest member.
If you try to create a ZFS mirror from disks of differing sizes, "zpool create" will refuse (unless you force it with -f).
If all your disks are of the same size and you create all partition tables when the disks are in enclosure A, and use the same alignment every time, the resulting partitions will all be of the same size without any extra effort.
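One way to guarantee identical partition sizes no matter which enclosure the table is created in (a sketch; the -s value and the label are illustrative and must fit the smallest disk) is to give gpart an explicit size instead of letting the partition fill the disk:
Code:
 $ gpart add -a 1m -s 3725g -t freebsd-zfs -l disk1 da1
 $ gpart show -p da1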

By the way, the FreeBSD Handbook also contains this:
Caution
Using an entire disk as part of a bootable pool is strongly discouraged, as this may render the pool unbootable. Likewise, you should not use an entire disk as part of a mirror or RAID-Z vdev. Reliably determining the size of an unpartitioned disk at boot time is impossible and there’s no place to put in boot code.
 