Solved Replacing a failed drive in an encrypted ZFS RAIDZ array with both boot and root pools

There is a lot of advice on replacing ZFS volumes, but the details are just slightly at variance with my config in a way that gives me pause. I note the admonishments that the mirror and RAIDZ processes are different, and before I do something stupid that blows this system away and forces me to start over, I wanted to check with the collective wisdom as to best practices.

For example, it appears that these instructions are for mirrors and are not fully appropriate to RAIDZ arrays, and the same goes for these from Oracle.

The advice from 19.3.6. Dealing with Failed Devices seems good for the zroot array, but I'm worried it might complicate or break the rebuild of the bootpool array.

The physical drive was replaced already - it is on a RAID controller (hardware RAID) but is presented as JBOD. I've reformatted the disk from the controller BIOS without any errors, initialized it, and then created a single-disk "array" that is presented to the OS (this is how the rest are configured).

I'm not completely clear on whether a command like 19.3.6's

zpool replace mypool 13374215198732904044 aacd5p4.eli (run once per pool, e.g. again with boot5 for the boot pool)

is sufficient, or whether I need to manually format the disk, then use zpool attach and install the boot blocks.
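
Concretely, adapting that to the GUIDs in the zpool status output below, I assume it would be one replace per pool, something like this (assuming the new partition and .eli provider already exist under those names):

Code:
# zpool replace zroot 9632703966287330955 aacd5p4.eli
# zpool replace bootpool 13374215198732904044 gpt/boot5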

Code:
# zpool status
  pool: bootpool
state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
    the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: resilvered 368K in 0h0m with 0 errors on Fri Mar  3 20:50:35 2017
config:

    NAME                      STATE     READ WRITE CKSUM
    bootpool                  DEGRADED     0     0     0
      mirror-0                DEGRADED     0     0     0
        gpt/boot0             ONLINE       0     0     0
        gpt/boot1             ONLINE       0     0     0
        gpt/boot2             ONLINE       0     0     0
        gpt/boot3             ONLINE       0     0     0
        gpt/boot4             ONLINE       0     0     0
        13374215198732904044  UNAVAIL      0     0     0  was /dev/gpt/boot5
        gpt/boot6             ONLINE       0     0     0
        gpt/boot7             ONLINE       0     0     0

errors: No known data errors

  pool: zroot
state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
    the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: resilvered 784K in 0h0m with 0 errors on Mon Mar  5 12:56:11 2018
config:

    NAME                     STATE     READ WRITE CKSUM
    zroot                    DEGRADED     0     0     0
      raidz2-0               DEGRADED     0     0     0
        aacd0p4.eli          ONLINE       0     0     0
        aacd1p4.eli          ONLINE       0     0     0
        aacd2p4.eli          ONLINE       0     0     0
        aacd3p4.eli          ONLINE       0     0     0
        aacd4p4.eli          ONLINE       0     0     0
        9632703966287330955  UNAVAIL      0     0     0  was /dev/aacd5p4.eli
        aacd5p4.eli          ONLINE       0     0     0
        aacd6p4.eli          ONLINE       0     0     0

errors: No known data errors

Code:
# gpart show
=>       34  143155133  aacd0  GPT  (68G)
         34       1024      1  freebsd-boot  (512K)
       1058    4194304      2  freebsd-zfs  (2.0G)
    4195362    4194304      3  freebsd-swap  (2.0G)
    8389666  134765501      4  freebsd-zfs  (64G)

=>       34  143155133  aacd1  GPT  (68G)
         34       1024      1  freebsd-boot  (512K)
       1058    4194304      2  freebsd-zfs  (2.0G)
    4195362    4194304      3  freebsd-swap  (2.0G)
    8389666  134765501      4  freebsd-zfs  (64G)

=>       34  143155133  aacd2  GPT  (68G)
         34       1024      1  freebsd-boot  (512K)
       1058    4194304      2  freebsd-zfs  (2.0G)
    4195362    4194304      3  freebsd-swap  (2.0G)
    8389666  134765501      4  freebsd-zfs  (64G)

=>       34  143155133  aacd3  GPT  (68G)
         34       1024      1  freebsd-boot  (512K)
       1058    4194304      2  freebsd-zfs  (2.0G)
    4195362    4194304      3  freebsd-swap  (2.0G)
    8389666  134765501      4  freebsd-zfs  (64G)

=>       34  143155133  aacd4  GPT  (68G)
         34       1024      1  freebsd-boot  (512K)
       1058    4194304      2  freebsd-zfs  (2.0G)
    4195362    4194304      3  freebsd-swap  (2.0G)
    8389666  134765501      4  freebsd-zfs  (64G)

=>       34  143155133  aacd5  GPT  (68G)
         34       1024      1  freebsd-boot  (512K)
       1058    4194304      2  freebsd-zfs  (2.0G)
    4195362    4194304      3  freebsd-swap  (2.0G)
    8389666  134765501      4  freebsd-zfs  (64G)

=>       34  143155133  aacd6  GPT  (68G)
         34       1024      1  freebsd-boot  (512K)
       1058    4194304      2  freebsd-zfs  (2.0G)
    4195362    4194304      3  freebsd-swap  (2.0G)
    8389666  134765501      4  freebsd-zfs  (64G)

And the replacement drive (aacd5) shows up along with the rest:

Code:
# egrep 'da[0-9]|cd[0-9]' /var/run/dmesg.boot
aacd0 on aac0
aacd0: 69900MB (143155200 sectors)
aacd1 on aac0
aacd1: 69900MB (143155200 sectors)
aacd2 on aac0
aacd2: 69900MB (143155200 sectors)
aacd3 on aac0
aacd3: 69900MB (143155200 sectors)
aacd4 on aac0
aacd4: 69900MB (143155200 sectors)
aacd5 on aac0
aacd5: 69900MB (143155200 sectors)
aacd6 on aac0
aacd6: 69900MB (143155200 sectors)
aacd7 on aac0
aacd7: 69900MB (143155200 sectors)
 
FWIW, using a hardware RAID controller when you are using ZFS is not recommended. ZFS really wants (and needs) to have access to the raw disk devices / partitions themselves, not a facsimile of one created by a hardware RAID controller.
 
Hi tingo,

Thanks for the reply - I really don't want to hijack my own thread, so please consider the following of secondary relevance: is there a good summary somewhere of why this is true - that is, what capabilities are blocked by the abstraction that aren't specifically replicated by the hardware itself?

Back to the original gist: I haven't had to replace a disk in this array in a few years, and I failed to properly document the process last time. Any advice on the overall recommended sequence? zpool replace... vs. zpool detach.../ format.../ zpool attach.../ gpart bootcode...
 
FWIW, using a hardware RAID controller when you are using ZFS is not recommended. ZFS really wants (and needs) to have access to the raw disk devices / partitions themselves, not a facsimile of one created by a hardware RAID controller.
A lot of modern RAID cards allow you to put one or more disks in JBOD mode, which bypasses any RAID BIOS. Note that JBOD and a single disk RAID0 are not the same thing.
 
Thanks for the reply - I really don't want to hijack my own thread, so please consider the following of secondary relevance: is there a good summary somewhere of why this is true - that is, what capabilities are blocked by the abstraction that aren't specifically replicated by the hardware itself?

There are two distinct questions here. First: for the actual redundancy-based RAID functionality (mirroring or parity-based RAID, like RAID-5, -6, -Z, ...), should one use hardware RAID below ZFS, or leave the RAID functionality to ZFS? The answer here is clear: leave it to ZFS. There is a variety of reasons that software RAID works better here. Just two examples: ZFS has checksums; with its own RAID, ZFS stores multiple copies of the checksums, can match checksum problems to individual disks, and can use that to diagnose which disk is the problem. Second, when a disk fails and a replacement needs to be resilvered, ZFS only has to resilver those parts that are actually allocated in the file system; if the file system is 50% full, resilvering will be twice as fast. This has a direct impact on reliability (the MTTR enters into the MTTDL linearly).

Second question: given that one should not use hardware redundancy below ZFS (first paragraph above), should one use individual-disk RAID0 volumes, or put the disk controller into JBOD mode? Here the answer is not as strong. Using RAID0 with individual disks adds complexity, management overhead, and the potential for making mistakes, while not giving any benefits that ZFS running on raw disks wouldn't also have. And using RAID here can have disadvantages when the hardware implementation hides the real characteristics of a device. One example I've run into: a stupid RAID controller that pretends that all its virtual disks have a 512-byte block size, even though the actual disks were 512e (4096-byte physical block size, but capable of emulating 512-byte logical blocks). ZFS is smart enough to detect such disks if it can determine their block size, and to lay out its data structures aligned with the physical blocks (for best performance and reliability); if that kind of stupid RAID controller is in the middle, ZFS might create a non-optimal data layout that requires lots of read-modify-write cycles at the hardware level, or runs the risk of torn writes.
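
(A quick way to check whether a controller is hiding sector sizes is to compare what the OS reports for the device against the ashift ZFS chose for the pool. Something along these lines should work on FreeBSD - diskinfo prints the sector size and stripe size the device advertises, and zdb shows the pool's ashift, where ashift=9 means 512-byte and ashift=12 means 4K alignment:)

Code:
# diskinfo -v /dev/aacd5
# zdb -C zroot | grep ashift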

Back to the original gist: I haven't had to replace a disk in this array in a few years, and I failed to properly document the process last time. Any advice on overall recommended sequence? zpool replace... vs. zpool detach.../ format.../ zpool attach.../ gpart bootcode...
I have no experience with the gpart bootcode part, but it clearly needs to be done.

On the ZFS part: If the old disk is still functioning (at least partially), I would attach the new disk first, then let ZFS do the resilvering, then detach the old disk. Obviously only if this is physically possible; with disk arrays there may be no place to put the disk, nor any way to connect it, in which case sadly the old disk has to come out before the new one goes in. Why do I say that? In the (possible but unlikely) case that another disk failure occurs during resilvering, the old disk might end up having the last copy of some data, and ZFS will know to read it from there if absolutely necessary.

I think zpool replace is nothing but syntactic sugar around a combination of attach and detach; you can just use the two individual commands.
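
As a minimal sketch for a mirror vdev like the bootpool here (gpt/boot8 is a made-up label standing in for the freshly created partition): attach the new partition next to any healthy member, wait for the resilver to finish, then detach the failed member by the GUID shown in zpool status.

Code:
# zpool attach bootpool gpt/boot0 gpt/boot8
# zpool detach bootpool 13374215198732904044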
 
Hi SirDice, thanks also for the reply. Yes, I have some ZFS arrays on RAID cards that do just that, including my primary NAS box here on FreeNAS and a pair of TrueNAS arrays with 60+ 4 TB disks each. But this one, alas, does not. Single-disk RAID0 permits RAIDZ2, which beats the hardware RAID6 for redundancy. My performance tests indicated that it was comparable to hardware RAID6 (this box is a ServeRAID 8k with battery-backed write cache).

I'm slightly more than passingly familiar with hardware configurations, and I have a system I'm quite happy with, but I'm facing what I think should be a not-atypical problem, given that this is a standard boot-from-ZFS config (aside from the hardware RAID controller): the almost-as-convenient-as-hardware-RAID zpool replace, or the slightly more involved zpool detach... Nothing about the hardware configuration is complicating the process at this point.

Side note if anyone finds this searching for hardware advice: ZFS does a nice job of aggregating large numbers of disks in a software implementation of something similar to RAID. The problem is that most motherboards have a limited number of drive channels, so typically one installs a drive interface card to do RAIDZ (including the highly redundant RAIDZ2). These cards are usually designed for higher-performance systems that would use hardware RAID rather than software RAID (as few OSes other than FreeBSD implement ZFS), whereas ZFS has a different model and is generally advised to have direct access to the drives (the actual reasons for this are rarely spelled out, but I'm sure there are good ones). When using a hardware RAID controller to present disks to ZFS, if it doesn't support JBOD mode (the ones that do are easy to find on the FreeNAS support forums), one can configure the drives as single-disk RAID0 arrays using the controller's software, usually in the BIOS, to present each disk individually to ZFS at the OS level.

The above I have done, and it is good. But there's another step to replace the failed disk in the ZFS array. This is a pure ZFS issue and has nothing to do with the underlying hardware. It is, I think, a fairly typical install, aside from using RAIDZ rather than just a mirrored system: a question of replacing the disk with a simple zpool replace - which seems super convenient, but the command takes only one pool argument, and since I have both boot and root on the same physical array, I'm not sure if that works. The zpool detach process clearly supports a fairly standard boot/swap/root layout on one ZFS cluster (¿term for the bunch-o-disks the pools are built on?), but there's a warning that mirrors are different than RAIDZ and... I'm just not sure if there's a simple block-level rebuild command that will recreate the missing disk in a less involved process or, indeed, whether the detach process will destroy the pools if "misapplied" to a RAIDZ cluster, which would be a time-consuming bummer.

Perhaps the process is to gpart format the replaced disk with the same partition layout and then use the zpool replace command twice, once for each pool?
 
Hi ralphbsz, thanks very much for the insightful reply. That's pretty consistent with my experiments. I'm pretty sure (but not positive) the ServeRAID 8k handles blocks well, and it seems to do a fairly solid job; plus, battery-backed write cache is a nice thing. I hadn't considered a controller mangling the hardware block sizes (and now I'm not entirely sure mine isn't), but as these are old 2.5" 72 GB SAS drives, they're not going to be doing anything all that modern anyway.

But one definite detriment to hardware RAID controllers is that they fail out a disk in their own way, and once it's marked bad, you can't always (I can't, anyway) present it to the OS as usable but flawed.

And while I have a lot of disks, I don't have extra channels, alas. 8 is enough, and all 8 are occupied.

For both of the above reasons, resilvering onto the new disk while the old one is still attached won't work in this case. It is configured RAIDZ2, and as the drives are pretty small, I'm not too worried about a disk failure during the rebuild. That is a more meaningful consideration in the work arrays with 6 TB (and soon 10 TB) drives. When I get a chance, I'll have to look into how to do that. Thank you for that advice.

For now, in this situation - disk replaced, but unformatted - gpart to match, then apply the syntactic sugar of zpool replace for root and boot respectively?
 
You put your finger on two shortcomings of software RAID:

My performance tests indicated that it was comparable to hardware RAID6 (this box is a ServeRAID 8k with battery-backed write cache).
Many hardware RAID implementations (not all!) can use battery-backed write caches to mostly get rid of the read-modify-write penalty that's inherent in parity-based RAID (like RAID-5, -6, and all ZFS RAID-Z versions). Few software RAID implementations are capable of using battery-backed RAM caches when present in hardware, and I don't know of any that do on FreeBSD. That may give hardware RAID a performance advantage. Various software RAID implementations (including the one in ZFS) use different techniques to minimize that impact, usually with pretty good results.

But there's another step to replace the failed disk in the ZFS array. This is a pure ZFS issue and has nothing to do with the underlying hardware. It is, I think, a fairly typical install, aside from using RAIDZ rather than just a mirrored system: a question of replacing the disk with a simple zpool replace - which seems super convenient, but the command takes only one pool argument, and since I have both boot and root on the same physical array, I'm not sure if that works.

If you are using hardware RAID, the controller knows the physical identity of disk drives. If one drive fails and needs to be replaced, the controller can orchestrate that replacement for the whole physical drive.

Now with software RAID: if you partition a physical drive into multiple OS partitions, and then use those partitions in multiple software RAID groups (like you have done, with a boot and a data pool), then you have a bit of a problem. When ZFS looks at the hardware, it sees two separate block devices, for example /dev/ada5p1 and /dev/ada5p3. It does not know a priori that those are in reality the same hardware, and that replacing one will necessarily also imply replacing the other. One could say that this is a shortcoming of software RAID. As a near-religious fanatic of software RAID, I would rather interpret it as follows: a user who partitions this way and then gives the partitions to ZFS RAID is using software RAID in an incorrect way. Or to quote an old joke: "Doctor, it hurts when I do this." "Well, then stop doing it." My personal preference would be to create a single OS partition on each drive that's used by ZFS (the only reason for GPT partitions is to make things more manageable, by attaching symbolic names to the partitions and making it clear what the disk is being used for), then give that one partition to ZFS, and let it virtualize it as it wants.
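
(For illustration, with a hypothetical new disk ada9 and label data9, that preference amounts to nothing more than the following, after which gpt/data9 is handed to ZFS and nothing else touches the disk:)

Code:
# gpart create -s gpt ada9
# gpart add -t freebsd-zfs -l data9 ada9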

Given that you have two partitions in two different pools, I think the best answer is to split the ZFS replace process into its constituent parts and arrange them correctly. For example, if physical disk /dev/ada5 is sick (that means its partitions ada5p1 and ada5p3) and wants to be replaced, then first insert a new disk, partition it correctly into /dev/ada9p[13] (I used "ada9" to stand for the "new" disk, since the word "nine" kind of sounds like "new", just for didactic reasons), then attach those two new partitions to the appropriate ZFS pools, let the resilver finish, and then detach the two old partitions. And if you have to pull the old disk out because of space/connection constraints, then do it in the opposite order: detach first and then attach.

And to answer a question you are asking: I think that the commands zpool attach and zpool detach work the same way on mirror and Z-based pools.
 
Ralphbsz, more awesome info, thanks!

I think, though I have yet to test this, that caches and other hardware-RAID-based performance compensations become increasingly irrelevant with SSDs. I'm not there yet in my hardware replacement cycle, alas, but I don't think I'll be doing this sort of legacy config much longer. (Still, I need to get this one working optimally.) Also (though I haven't had a chance to fully test it), my TrueNAS boxes have SSD caches (L2ARC), and that seemed like good enough advice that I sprang for it (not at home - there are some budgetary differences).

Two more questions - the most important is:

1) Note that zroot is geli encrypted (aacd5p4.eli), and as I think through the commands for recreating that - I'm pretty sure geli is handled below ZFS and the keys would have to match, etc. Is it possible to replace a geli-encrypted volume? I can't find anything on the process. (Encryption was set up by the installer.)

and assuming there's an easy way past that one:
2) as "zpool attach zroot /dev/aacd0p4.eli /dev/aacd5p4.eli" doesn't seem right, and I'm not sure how to format the command to add a disk to an array (rather than a mirror).


I'd think it makes at least superficial sense to:

A] Partition the new drive as follows:

GPT scheme/partition
gpart create -s gpt aacd5
Format to match
gpart add -b34 -a 512k -s 512K -t freebsd-boot aacd5
gpart add -a 512k -s 2G -t freebsd-zfs -l boot5 aacd5
gpart add -a 512k -s 2G -t freebsd-swap aacd5
gpart add -a 512k -t freebsd-zfs -l aacd5p4.eli aacd5
write the boot code to aacd5? The array boots, and I've only ever updated the bootcode on aacd0
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 aacd5
Then zpool replace with
zpool replace bootpool boot5
zpool replace zroot aacd5p4.eli

buuut... geli init?

Code:
# gpart list -a
Geom name: aacd7
modified: false
state: OK
fwheads: 255
fwsectors: 63
last: 143155166
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: aacd7p1
   Mediasize: 524288 (512K)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 17408
   Mode: r0w0e0
   rawuuid: 62fdae30-efab-11e4-b96a-00145e5b9d0b
   rawtype: 83bd6b9d-7f41-11dc-be0b-001560b84f0f
   label: gptboot7
   length: 524288
   offset: 17408
   type: freebsd-boot
   index: 1
   end: 1057
   start: 34
2. Name: aacd7p2
   Mediasize: 2147483648 (2.0G)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 541696
   Mode: r1w1e2
   rawuuid: 630c093c-efab-11e4-b96a-00145e5b9d0b
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: boot7
   length: 2147483648
   offset: 541696
   type: freebsd-zfs
   index: 2
   end: 4195361
   start: 1058
3. Name: aacd7p3
   Mediasize: 2147483648 (2.0G)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 2148025344
   Mode: r1w1e0
   rawuuid: 631dd108-efab-11e4-b96a-00145e5b9d0b
   rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b
   label: swap7
   length: 2147483648
   offset: 2148025344
   type: freebsd-swap
   index: 3
   end: 8389665
   start: 4195362
4. Name: aacd7p4
   Mediasize: 68999936512 (64G)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 541696
   Mode: r1w1e1
   rawuuid: 6352f36d-efab-11e4-b96a-00145e5b9d0b
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: zfs7
   length: 68999936512
   offset: 4295508992
   type: freebsd-zfs
   index: 4
   end: 143155166
   start: 8389666
Consumers:
1. Name: aacd7
   Mediasize: 73295462400 (68G)
   Sectorsize: 512
   Mode: r3w3e6
 
oh, probably inserting
# geli attach -k /boot/encryption.key /dev/aacd5p4
should create/recreate /dev/aacd5p4.eli, and then zpool replace

Any expert thoughts on this potentially catastrophic sequence, aside from "backup first!"?
 
I think, though I have yet to test this, that caches and other hardware-RAID-based performance compensations become increasingly irrelevant with SSDs. ... my TrueNAS boxes have SSD caches (L2ARC), and that seemed like good enough advice, ...

It gets very complicated when optimizing RAID performance, in particular performance per $$$. Clearly, a pure-SSD-based RAID array will have different needs for a fast cache. Clearly, adding SSD to a hard-disk-based RAID array will help too. ZFS allows using both an L2ARC and a separate ZIL device. I have not worked enough with ZFS to have intuition for what is best, and the answer will usually be highly workload-dependent.
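
(For reference, both are a one-line addition to an existing pool; "mypool", ada8 and ada9 here are hypothetical, not anything in this system:)

Code:
# zpool add mypool cache ada8
# zpool add mypool log ada9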

1) Note that zroot is geli encrypted (aacd5p4.eli), and as I think through the commands for recreating that - I'm pretty sure geli is handled below ZFS and the keys would have to match, etc. Is it possible to replace a geli-encrypted volume?

Ouch. I've never worked with geli myself. I would think that before you give the new disk to ZFS to attach (or replace onto), you have to do all the geli magic to make the disk fully usable. I think ZFS has no idea that the underlying block device XXX.eli it is given is encrypted or otherwise unusual, and it won't help you with key management (but it also won't get in the way).
 
Well... fingers crossed none of these commands will hose the rest of the array - I'll try the sequence above and update with results.
 
If it weren't encrypted, I'd be done. Anyway, progress so far, for anyone on hardware RAID with a completely crashed disk - or who otherwise can't add a new drive alongside the old one and instead has to replace the failed drive with a new one and then rebuild the array:

1) replace the drive with a working one.
2) the RAID controller will likely have a BIOS-level mechanism for adding the drive to the system; in my case the steps were:
a) format the new drive (maybe unnecessary but can uncover any problems with the new drive)
b) initialize the drive
c) add it to a single volume RAID array to present it to the OS. With the Adaptec/ServeRAID 8k, this will show up with an aacd prefix. After booting, you should be able to find it with egrep 'aac[0-9]' /var/run/dmesg.boot

Use the following commands and note the configuration of the existing system (see above for mine):
zpool status (note the pool status)
gpart show (note the start and size of each partition)
gpart list -a (note the labels)
(and for the encryption config, geli list - note the encryption parameters)

Write the GPT scheme to the disk (I'm replacing aacd5):
gpart create -s gpt aacd5
Then create the partitions - I experimented with a couple of options until the labels and parameters matched. (You can gpart delete -i 4 aacd5 to delete, for example, the 4th partition if you make a mistake. I did.)
gpart add -s 512K -t freebsd-boot -l gptboot5 aacd5
gpart add -s 2G -t freebsd-zfs -l boot5 aacd5
gpart add -s 2G -t freebsd-swap -l swap5 aacd5
gpart add -t freebsd-zfs -l zfs5 aacd5

Then rebuild the zpool (note my zroot is encrypted, see notes below, but bootpool isn't, so the following steps are sufficient)
zpool replace bootpool /dev/gpt/boot5

After this:
zpool status
Code:
  pool: bootpool
state: ONLINE
  scan: resilvered 153M in 0h0m with 0 errors on Mon Mar 19 18:23:20 2018
config:

    NAME           STATE     READ WRITE CKSUM
    bootpool       ONLINE       0     0     0
      mirror-0     ONLINE       0     0     0
        gpt/boot0  ONLINE       0     0     0
        gpt/boot1  ONLINE       0     0     0
        gpt/boot2  ONLINE       0     0     0
        gpt/boot3  ONLINE       0     0     0
        gpt/boot4  ONLINE       0     0     0
        gpt/boot5  ONLINE       0     0     0
        gpt/boot6  ONLINE       0     0     0
        gpt/boot7  ONLINE       0     0     0

errors: No known data errors

  pool: zroot
state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
    the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: resilvered 784K in 0h0m with 0 errors on Mon Mar  5 12:56:11 2018
config:

    NAME                     STATE     READ WRITE CKSUM
    zroot                    DEGRADED     0     0     0
      raidz2-0               DEGRADED     0     0     0
        aacd0p4.eli          ONLINE       0     0     0
        aacd1p4.eli          ONLINE       0     0     0
        aacd2p4.eli          ONLINE       0     0     0
        aacd3p4.eli          ONLINE       0     0     0
        aacd4p4.eli          ONLINE       0     0     0
        9632703966287330955  UNAVAIL      0     0     0  was /dev/aacd5p4.eli
        aacd6p4.eli          ONLINE       0     0     0
        aacd7p4.eli          ONLINE       0     0     0

:p

well, maybe more :confused: because... there's still the tricky crypto bit. geli list shows that my other partitions are encrypted as:
Code:
Geom name: aacd4p4.eli
State: ACTIVE
EncryptionAlgorithm: AES-XTS
KeyLength: 256
Crypto: software
Version: 7
UsedKey: 0
Flags: BOOT
KeysAllocated: 17
KeysTotal: 17
Providers:
1. Name: aacd4p4.eli
   Mediasize: 68999933952 (64G)
   Sectorsize: 4096
   Mode: r1w1e1
Consumers:
1. Name: aacd4p4
   Mediasize: 68999936512 (64G)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 541696
   Mode: r1w1e1

The following recreates the encrypted partition using the same encryption parameters the installer used:
geli init -b -e AES-XTS -l 256 -K /boot/encryption.key -s 4096 /dev/aacd5p4
geli attach -k /boot/encryption.key /dev/aacd5p4
zpool replace zroot aacd5p4.eli
The freebsd-boot partition on the new disk is still empty, so I also reinstalled the bootcode; I'm not sure it's strictly required, as aacd0 (which the system actually boots from) didn't fail, but better safe than sorry.
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 aacd5
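
To double-check that the new provider came up with the same parameters as the rest (algorithm, key length, 4096-byte sectors), geli list can be pointed at it and compared against the aacd4p4.eli output above:
geli list aacd5p4.eli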

Resilvering is going, 3h27m to go.
 
Reboot, single passphrase entry (as before), and:

Code:
# zpool status
  pool: bootpool
 state: ONLINE
  scan: resilvered 153M in 0h0m with 0 errors on Mon Mar 19 18:23:20 2018
config:

    NAME           STATE     READ WRITE CKSUM
    bootpool       ONLINE       0     0     0
      mirror-0     ONLINE       0     0     0
        gpt/boot0  ONLINE       0     0     0
        gpt/boot1  ONLINE       0     0     0
        gpt/boot2  ONLINE       0     0     0
        gpt/boot3  ONLINE       0     0     0
        gpt/boot4  ONLINE       0     0     0
        gpt/boot5  ONLINE       0     0     0
        gpt/boot6  ONLINE       0     0     0
        gpt/boot7  ONLINE       0     0     0

errors: No known data errors

  pool: zroot
 state: ONLINE
  scan: resilvered 25.8G in 2h35m with 0 errors on Tue Mar 20 01:40:11 2018
config:

    NAME             STATE     READ WRITE CKSUM
    zroot            ONLINE       0     0     0
      raidz2-0       ONLINE       0     0     0
        aacd0p4.eli  ONLINE       0     0     0
        aacd1p4.eli  ONLINE       0     0     0
        aacd2p4.eli  ONLINE       0     0     0
        aacd3p4.eli  ONLINE       0     0     0
        aacd4p4.eli  ONLINE       0     0     0
        aacd5p4.eli  ONLINE       0     0     0
        aacd6p4.eli  ONLINE       0     0     0
        aacd7p4.eli  ONLINE       0     0     0

errors: No known data errors

:p
 
If it weren't encrypted, I'd be done. Anyway, progress so far, for anyone on hardware RAID with a completely crashed disk - or who otherwise can't add a new drive alongside the old one and instead has to replace the failed drive with a new one and then rebuild the array:

1) replace the drive with a working one.
2) the RAID controller will likely have a BIOS-level mechanism for adding the drive to the system; in my case the steps were:
a) format the new drive (maybe unnecessary but can uncover any problems with the new drive)
b) initialize the drive
c) add it to a single volume RAID array to present it to the OS. With the Adaptec/ServeRAID 8k, this will show up with an aacd prefix. After booting, you should be able to find it with egrep 'aac[0-9]' /var/run/dmesg.boot

Use the following commands and note the configuration of the existing system (see above for mine):
zpool status (note the pool status)
gpart show (note the start and size of each partition)
gpart list -a (note the labels)
(and for the encryption config, geli list - note the encryption parameters)

Write the GPT scheme to the disk (I'm replacing aacd5):
gpart create -s gpt aacd5
Then create the partitions - I experimented with a couple of options until the labels and parameters matched. (You can gpart delete -i 4 aacd5 to delete, for example, the 4th partition if you make a mistake. I did.)
gpart add -s 512K -t freebsd-boot -l gptboot5 aacd5
gpart add -s 2G -t freebsd-zfs -l boot5 aacd5
gpart add -s 2G -t freebsd-swap -l swap5 aacd5
gpart add -t freebsd-zfs -l zfs5 aacd5

Then rebuild the zpool (note my zroot is encrypted, see notes below, but bootpool isn't, so the following steps are sufficient)
zpool replace bootpool /dev/gpt/boot5

[...snip...]

The following recreates the encrypted partition using the same encryption parameters the installer used:
geli init -b -e AES-XTS -l 256 -K /boot/encryption.key -s 4096 /dev/aacd5p4
geli attach -k /boot/encryption.key /dev/aacd5p4
zpool replace zroot aacd5p4.eli
The freebsd-boot partition on the new disk is still empty, so I also reinstalled the bootcode; I'm not sure it's strictly required, as aacd0 (which the system actually boots from) didn't fail, but better safe than sorry.
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 aacd5

Resilvering is going, 3h27m to go.

Hi gessel,
Thank you so much for your documentation of the steps necessary to fix a geli-encrypted zpool. I was already wondering what steps I would need to take, as I used the FreeBSD installer to set up the encryption. As the FreeBSD Handbook covers geli encryption and ZFS setup independently, I was not sure how to combine the two.
Very nice work.
 