ZFS: How to extend a ZFS GELI-encrypted disk? Space not showing

Although I'm still not sure why, after destroying the 24.7G boot environment, the disk space isn't reflecting the freed-up space.
"Clones and snapshots".
It's simply that "something", a snapshot or another clone, still holds references to the blocks.
Doing beadm destroy and saying "no" when it asks if you want to delete the origin, I think, just deletes the clone; the underlying snapshot still exists, so the space won't be freed until there is nothing left referencing that snapshot.

If you have multiple boot environments that all reference the same snapshot, I think that even if you answer "y" to deleting the origin, it won't delete the snapshot until all the clones are deleted first, so it may tell you "I can't because something else uses it".
The output of the zfs destroy -nv command will give a really good clue.
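For example, a hypothetical dry run against one of the snapshot names that appears later in this thread (assuming the standard zroot/ROOT layout shown in the df output; nothing is actually destroyed because of -n):

zfs list -t snapshot -r zroot/ROOT
zfs destroy -nv zroot/ROOT/12.3-RELEASE-p5_2022-09-10_190846@2022-09-10-19:08:46-0
# -n = dry run, -v = verbose: it reports what would be destroyed, or complains if clones still depend on the snapshot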

I'm going to leave VladiBG in charge of the GELI bits (not my area of expertise), and instead I'm trying to help you understand how the boot environments, the underlying snapshots, and the clones all fit together.
 
Use bectl(8) to delete the old snapshots made by freebsd-update; don't use zfs to manage them. Anyway, the topic was about expanding the disk, not about managing the snapshots.
 
Use bectl(8) to delete the old snapshots made by freebsd-update; don't use zfs to manage them.
Yep, I'm telling him to use zfs destroy -nv to see the relationships around the snapshot he's concerned about. "-n" says "dry run, don't actually do anything"; "-v" is verbose, so the command tells you what it would do.
It will help understand the relationship between the snapshots and the boot environments.
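A quick way to see that mapping, assuming the usual zroot/ROOT layout, is the origin property of each boot environment dataset:

zfs list -r -o name,origin zroot/ROOT   # a BE that is a clone shows its origin snapshot; "-" means it is not a clone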

But yes, to actually delete the boot environments use bectl or beadm; make sure you use bectl destroy -o, or say yes when beadm destroy asks about destroying the origin.
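For example, with one of the boot environment names that appears later in this thread (a sketch; adjust to your own BE names):

bectl destroy -o 12.3-RELEASE-p5_2022-07-01_212910    # -o also destroys the origin snapshot
beadm destroy 12.3-RELEASE-p5_2022-07-01_212910       # beadm instead prompts; answer "y" to the origin question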
 
Read PMc's post #5 about resizing the geli first.
To summarize:

BACKUP first
gpart recover ada0
gpart resize -i 3 -a 4k -s 450G ada0
zpool set autoexpand=on zroot
zpool online -e zroot ada0p3.eli
zpool list
Be careful about the sequence of actions. The above, as is, would make the geli metadata inaccessible and the pool would be gone.

Also, don't use geli resize; it may not work as expected and the pool could be lost. Better to use geli backup and geli restore.

The correct order of actions is (a concrete sketch follows the list):
  1. Backup
  2. Boot a FreeBSD installation media, drop to "Live CD"
  3. gpart recover [-f flags] geom
  4. geli backup [-v] prov file # save the backup file on the installation media, under /tmp. Careful here: when the media is rebooted, the saved metadata file is lost.
  5. gpart resize -i index [-a alignment] [-s size] [-f flags] geom
  6. geli restore [-fv] file prov
  7. Boot geli encrypted system
  8. Resize pool: zpool online [-e] pool device
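As a rough sketch of the steps above, using the device names from this thread (disk ada0, partition index 3, provider ada0p3) and the 450G size from the earlier summary; adjust everything to your own layout before running anything:

gpart recover ada0
geli backup ada0p3 /tmp/ada0p3.geli-meta    # /tmp of the live media does not survive a reboot
gpart resize -i 3 -a 4k -s 450G ada0
geli restore /tmp/ada0p3.geli-meta ada0p3   # add -f if geli refuses because the provider size changed
# reboot into the geli encrypted system, then:
zpool online -e zroot ada0p3.eli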
 
Do not restore the old geli metadata after resizing the provider. It will corrupt it, as the metadata holds the old provider size. geli has an AUTORESIZE flag which will handle the new provider size after the gpart resize command. If you want, test it inside a VM to see.

-f    Metadata contains the size of the provider to ensure that the correct partition or slice is attached
 
If you have multiple boot environments that all reference the same snapshot, I think that even if you answer "y" to deleting the origin, it won't delete the snapshot until all the clones are deleted first, so it may tell you "I can't because something else uses it".
The output of the zfs destroy -nv command will give a really good clue.
Do you mean that choosing 'n' here was the wrong choice? It didn't end up freeing space but seemed to have successfully destroyed the earlier boot environment.

BACKUP first
I actually don't have another backup disk handy :/ Is it not relatively safe to proceed?
18.3. Resizing and Growing Disks
read PMc post #5 about resizing the geli first.
Will try to read this in a bit. Been overwhelming.
I suggest you make a full backup. Then run gpart recover ada0 and check whether the free space shows up at the end of the disk. If you see the free space after ada0p3, you can resize it with gpart resize -i 3 -a 4k -s 450G ada0, then set zpool set autoexpand=on zroot, followed by zpool online -e zroot ada0p3.eli. Then verify the pool size via zpool list
Ok - I'll try gpart recover ada0
Note: your geli partition has the AUTORESIZE flag, so it should pick up the new partition size automatically.
Thanks - how does one check this? Should be something to keep handy in my mind for next time (I hope not)
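One way to check, assuming the provider names used elsewhere in this thread (a hedged sketch, not from the thread itself), is to look at the flags geli reports:

geli list ada0p3.eli | grep -i flags    # flags of the attached provider should include AUTORESIZE
geli dump ada0p3 | grep -i flags        # reads the flags straight from the on-disk metadata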
 
Be careful about the sequence of actions. The above, as is, would make the geli metadata inaccessible and the pool would be gone.

Also, don't use geli resize; it may not work as expected and the pool could be lost. Better to use geli backup and geli restore.

The correct order of actions is:
  1. Backup
  2. Boot a FreeBSD installation media, drop to "Live CD"
  3. gpart recover [-f flags] geom
  4. geli backup [-v] prov file # save the backup file on the installation media, under /tmp. Careful here: when the media is rebooted, the saved metadata file is lost.
  5. gpart resize -i index [-a alignment] [-s size] [-f flags] geom
  6. geli restore [-fv] file prov
  7. Boot geli encrypted system
  8. Resize pool: zpool online [-e] pool device
So ideally I should be doing this via a FreeBSD installation media rather than on a live system? Is there a safe way to do this without backing up?
 
Do you mean that choosing 'n' here was the wrong choice? It didn't end up freeing space but seemed to have successfully destroyed the earlier boot environment.
It destroyed the clone part of the earlier boot environment, but not the snapshot the clone was based on.
I don't think "n" was the wrong choice; I think (because it's been a long time since I've been in your situation with lots of BEs) that even if you said "y" it may not have done anything, it may have come back with "I can't remove the origin because there are other things depending on it".
 
Do not restore the old geli metadata after resizing the provider. It will corrupt it, as the metadata holds the old provider size.
geli restore [-fv] file prov works for me when executed on a test system. The provider attaches as expected afterwards.

geli has an AUTORESIZE flag which will handle the new provider size after the gpart resize command.
You are right, no geli resize needed.

If you want, test it inside a VM to see.
I've tested (again) in a VirtualBox VM, this time from the booted encrypted live system, not an installation medium. All it takes to resize the geli(8) provider and the pool from the mounted system is as described by you earlier:
  1. gpart recover [-f flags] geom
  2. gpart resize -i index [-a alignment] [-s size] [-f flags] geom
  3. zpool online [-e] pool device
No need for the extra steps mentioned in my post #29.
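For concreteness, with the device names used elsewhere in this thread (disk ada0, partition index 3, provider ada0p3.eli, pool zroot), that boils down to something like:

gpart recover ada0                  # fix up the backup GPT after the disk grew
gpart resize -i 3 -a 4k ada0        # without -s, gpart grows the partition into all the free space
zpool online -e zroot ada0p3.eli    # expand the vdev; the geli AUTORESIZE flag takes care of the provider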
 
It destroyed the clone part of the earlier boot environment, but not the snapshot the clone was based on.
I don't think "n" was the wrong choice; I think (because it's been a long time since I've been in your situation with lots of BEs) that even if you said "y" it may not have done anything, it may have come back with "I can't remove the origin because there are other things depending on it".
So I did go ahead and destroy even the current BE it was pointing to by saying 'y' - it said successfully destroyed (no warning was given!) - but again df -h doesn't show any change in space :/
df -h
Filesystem Size Used Avail Capacity Mounted on
zroot/ROOT/13.1-RELEASE-p2_2022-09-10_232433 26G 23G 2.8G 89% /
----snip----
Destroyed here
sudo beadm destroy 12.3-RELEASE-p5_2022-07-01_212910
Password:
Are you sure you want to destroy '12.3-RELEASE-p5_2022-07-01_212910'?
This action cannot be undone (y/[n]): y
Boot environment '12.3-RELEASE-p5_2022-07-01_212910' was created from existing snapshot
Destroy '12.3-RELEASE-p5_2022-09-10_190846@2022-09-10-19:08:46-0' snapshot? (y/[n]): y
Destroyed successfully
Somehow it still shows up when I list it
beadm list
BE Active Mountpoint Space Created
12.3-RELEASE-p1_2022-03-18_164224 - - 267.0M 2022-03-18 16:42
12.3-RELEASE-p3_2022-03-23_175807 - - 83.7M 2022-03-23 17:58
12.3-RELEASE-p4_2022-04-06_232036 - - 656.0M 2022-04-06 23:20
13.0-RELEASE-p11_2022-07-01_213226 - - 90.0M 2022-07-01 21:32
12.3-RELEASE-p5_2022-08-10_011525 - - 610.0M 2022-08-10 01:15
12.3-RELEASE-p6_2022-09-03_171127 - - 525.0M 2022-09-03 17:11
12.3-RELEASE-p5_2022-09-10_190846 - - 8.0G 2022-09-10 19:08
12.3-to-13.1 - - 1.4M 2022-09-10 19:12
12.3-RELEASE-p7_2022-09-10_230907 - - 1.4M 2022-09-10 23:09
13.1-RELEASE-p2_2022-09-10_231247 - - 5.9M 2022-09-10 23:12
13.1-RELEASE-p2_2022-09-10_232433 NR / 100.8G 2022-09-10 23:24 #this line
13.1-RELEASE-p2_2022-09-11_220401 - - 36.6M 2022-09-11 22:04
Disk usage is the same as above - barely any change, and the current BE still seems intact. This is just weird.
df -h
Filesystem Size Used Avail Capacity Mounted on
zroot/ROOT/13.1-RELEASE-p2_2022-09-10_232433 26G 23G 3.1G 88% /
----snip-----
Just to be safe I created another BE in case some weird thing happens upon restarting the machine. Yet to point to it though.
This is just very non-intuitive from a user perspective.
 
This is just very non-intuitive from a user perspective.
Yep, not going to argue that, but you basically need to beadm destroy all the boot environments you don't need. I would keep the current one you are in, maybe the previous one. Everything else, get rid of it. That's the thing about snapshots: you need to delete everything that references them before the space winds up being reclaimed. Until then, the space just "moves".

When they are boot environments, there is always a bit of fear when getting rid of them.
But: how long have you been running on the one that is currently active? When did you last reboot into a previous one?
You should be safe just destroying all the boot environments up to the current one, saying "y" to getting rid of the origin. Eventually the space will come back.
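If you want to see where the space is currently parked before you start deleting, zfs can break it down per dataset (assuming the zroot/ROOT layout from your df output):

zfs list -o space -r zroot/ROOT    # the USEDSNAP column is what is held only by snapshots of each BE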
 
Be careful what you are looking at. This bit of what you did:

sudo beadm destroy 12.3-RELEASE-p5_2022-07-01_212910
Password:
Are you sure you want to destroy '12.3-RELEASE-p5_2022-07-01_212910'?
This action cannot be undone (y/[n]): y
Boot environment '12.3-RELEASE-p5_2022-07-01_212910' was created from existing snapshot
Destroy '12.3-RELEASE-p5_2022-09-10_190846@2022-09-10-19:08:46-0' snapshot? (y/[n]): y
Destroyed successfully

You deleted the BE named 12.3-RELEASE-p5_2022-07-01_212910 which was based off the snapshot named
12.3-RELEASE-p5_2022-09-10_190846@2022-09-10-19:08:46-0

Your beadm list after that shows the Boot Environment named 12.3-RELEASE-p5_2022-09-10_190846 still exists, which is correct. The snapshot 12.3-RELEASE-p5_2022-09-10_190846@2022-09-10-19:08:46-0 is not the same as the BE named 12.3-RELEASE-p5_2022-09-10_190846.
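If it helps, ZFS can show that mapping directly: the clones property of a snapshot lists every dataset (i.e. boot environment) cloned from it. A hedged example, assuming the usual zroot/ROOT layout:

zfs list -t snapshot -r -o name,clones zroot/ROOT   # each snapshot and the BEs cloned from it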
 
Uh oh - Ok I went ahead and tried it and it seems to have destroyed the existing BE!
sudo beadm destroy 12.3-RELEASE-p5_2022-09-10_190846
Password:
Are you sure you want to destroy '12.3-RELEASE-p5_2022-09-10_190846'?
This action cannot be undone (y/[n]): y
Boot environment '12.3-RELEASE-p5_2022-09-10_190846' was created from existing snapshot
Destroy '13.1-RELEASE-p2_2022-09-10_232433@2022-07-01-21:29:10-0' snapshot? (y/[n]): y
Destroyed successfully
But it seems to be showing up again! Just as I mistakenly described above
beadm list
BE Active Mountpoint Space Created
12.3-RELEASE-p1_2022-03-18_164224 - - 267.0M 2022-03-18 16:42
12.3-RELEASE-p3_2022-03-23_175807 - - 83.7M 2022-03-23 17:58
12.3-RELEASE-p4_2022-04-06_232036 - - 656.0M 2022-04-06 23:20
13.0-RELEASE-p11_2022-07-01_213226 - - 4.3G 2022-07-01 21:32
12.3-RELEASE-p5_2022-08-10_011525 - - 610.0M 2022-08-10 01:15
12.3-RELEASE-p6_2022-09-03_171127 - - 525.0M 2022-09-03 17:11
12.3-to-13.1 - - 1.4M 2022-09-10 19:12
12.3-RELEASE-p7_2022-09-10_230907 - - 1.4M 2022-09-10 23:09
13.1-RELEASE-p2_2022-09-10_231247 - - 5.9M 2022-09-10 23:12
13.1-RELEASE-p2_2022-09-10_232433 NR / 100.8G 2022-09-10 23:24
13.1-RELEASE-p2_2022-09-11_220401 - - 36.6M 2022-09-11 22:04
13.1-p2-after-destroying - - 452.0K 2022-09-16 19:26
This time, however, the disk space has been reclaimed! This is so strange - the current BE that got destroyed is still showing, but the space has been freed up.
df -h
Filesystem Size Used Avail Capacity Mounted on
zroot/ROOT/13.1-RELEASE-p2_2022-09-10_232433 34G 23G 11G 68% /
--snip--
PS: The current BE is a fresh one that I pointed to after the latest upgrade; I don't think I've restarted the machine since then. Now I'm a bit scared to - but will do it soon.
 
I actually don't have another backup disk handy :/ Is it not relatively safe to proceed?
It's always a good idea to have regular backups of your information, especially when you are using an SSD.
It's never safe when you are modifying the partition table, as a simple mistype can damage your data. So always try to be on the safe side.
 
But it seems to be showing up again! Just as I mistakenly described above
what "it" are you referring to?
You did this:
beadm destroy 12.3-RELEASE-p5_2022-09-10_190846
The subsequent beadm list does not show that boot environment.
That BE was based on a snapshot of 13.1-RELEASE-p2_2022-09-10_232433, not on 13.1-RELEASE-p2_2022-09-10_232433 itself.

Space coming back means that destroying the BE named 12.3-RELEASE-p5_2022-09-10_190846 actually freed ZFS blocks.
So now beadm destroy each of these, say "y" to removing the origin, and see what happens (a sketch follows the list):
12.3-RELEASE-p1_2022-03-18_164224
12.3-RELEASE-p3_2022-03-23_175807
12.3-RELEASE-p4_2022-04-06_232036
13.0-RELEASE-p11_2022-07-01_213226
12.3-RELEASE-p5_2022-08-10_011525
12.3-RELEASE-p6_2022-09-03_171127
12.3-to-13.1
12.3-RELEASE-p7_2022-09-10_230907
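A sketch, using the first name from the list (repeat for the rest):

beadm destroy 12.3-RELEASE-p1_2022-03-18_164224      # answer "y" to both prompts
bectl destroy -o 12.3-RELEASE-p1_2022-03-18_164224   # the bectl equivalent; -o destroys the origin snapshot without prompting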
 
That's what I did - and it said it successfully destroyed it, yet it shows up in the list.
Highlight for me what you are talking about. Because I'm looking at the output and I don't see what you are talking about.
Cut and paste from your post:
sudo beadm destroy 12.3-RELEASE-p5_2022-09-10_190846
Password:
Are you sure you want to destroy '12.3-RELEASE-p5_2022-09-10_190846'?
This action cannot be undone (y/[n]): y
Boot environment '12.3-RELEASE-p5_2022-09-10_190846' was created from existing snapshot
Destroy '13.1-RELEASE-p2_2022-09-10_232433@2022-07-01-21:29:10-0' snapshot? (y/[n]): y
Destroyed successfully
But it seems to be showing up again! Just as I mistakenly described above

beadm list
BE Active Mountpoint Space Created
12.3-RELEASE-p1_2022-03-18_164224 - - 267.0M 2022-03-18 16:42
12.3-RELEASE-p3_2022-03-23_175807 - - 83.7M 2022-03-23 17:58
12.3-RELEASE-p4_2022-04-06_232036 - - 656.0M 2022-04-06 23:20
13.0-RELEASE-p11_2022-07-01_213226 - - 4.3G 2022-07-01 21:32
12.3-RELEASE-p5_2022-08-10_011525 - - 610.0M 2022-08-10 01:15
12.3-RELEASE-p6_2022-09-03_171127 - - 525.0M 2022-09-03 17:11
12.3-to-13.1 - - 1.4M 2022-09-10 19:12
12.3-RELEASE-p7_2022-09-10_230907 - - 1.4M 2022-09-10 23:09
13.1-RELEASE-p2_2022-09-10_231247 - - 5.9M 2022-09-10 23:12
13.1-RELEASE-p2_2022-09-10_232433 NR / 100.8G 2022-09-10 23:24
13.1-RELEASE-p2_2022-09-11_220401 - - 36.6M 2022-09-11 22:04
I don't see 12.3-RELEASE-p5_2022-09-10_190846 in the above list. I see 13.1-RELEASE-p2_2022-09-10_232433 in the list because you have not deleted it. You have deleted 13.1-RELEASE-p2_2022-09-10_232433@2022-07-01-21:29:10-0, which is a snapshot, which is different from 13.1-RELEASE-p2_2022-09-10_232433.
 
I don't see 12.3-RELEASE-p5_2022-09-10_190846 in the above list. I see 13.1-RELEASE-p2_2022-09-10_232433 in the list because you have not deleted it. You have deleted 13.1-RELEASE-p2_2022-09-10_232433@2022-07-01-21:29:10-0, which is a snapshot, which is different from 13.1-RELEASE-p2_2022-09-10_232433.
I see - that's correct. The difference between a snapshot and a BE can be a bit confusing. I presumed they were the same when it asked me to delete it this way:
Boot environment '12.3-RELEASE-p5_2022-09-10_190846' was created from existing snapshot
Destroy '13.1-RELEASE-p2_2022-09-10_232433@2022-07-01-21:29:10-0' snapshot? (y/[n]): y
Destroyed successfully
So if I understand correctly the first time it's asking me to delete the BE and the next time it specifies which snapshot the BE is based on and whether I want to delete it or not 🤔
 
So if I understand correctly the first time it's asking me to delete the BE and the next time it specifies which snapshot the BE is based on and whether I want to delete it or not
Yes! That is the connection you needed to make. Confusing? Heck yeah. But once you make the connection, it's obvious. Took me a while too.

That's why I was suggesting the zfs destroy -nv of the snapshot. It would have walked through and told you what it would and would not delete. The first time I did that, the dots all connected for me.
 
In other good news - I can see the space at the end of the disk! 🥳 - 168G - woohoo - now I need to carefully expand it.
sudo gpart recover ada0
Password:
ada0 recovered
gpart show ada0
=> 40 976773088 ada0 GPT (466G)
40 1024 1 freebsd-boot (512K)
1064 984 - free - (492K)
2048 16777216 2 freebsd-swap (8.0G)
16779264 608362496 3 freebsd-zfs (290G)
625141760 351631368 - free - (168G)
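Given this layout, the resize step discussed earlier should amount to something like the following (a hedged sketch only, not yet run at this point in the thread; double-check the index against your own gpart show output):

gpart resize -i 3 -a 4k ada0    # with no -s, gpart grows partition 3 (freebsd-zfs) into all of the 168G of free space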
 
Awesome! For that, follow what others have said about GELI and the resize order of operations. Kind of neat when all the pieces start falling into place.
 
You can execute the commands from the live system.
I was reading this in the Handbook chapter on disks and it mentions using the following setting on a live system - is this applicable to my case of ZFS + geli?
# sysctl kern.geom.debugflags=16
Also, autoexpand isn't automatically set to on:
zpool get autoexpand
NAME PROPERTY VALUE SOURCE
zroot autoexpand off default
 
You can execute the commands from the live system.
I was reading this in the Handbook chapter on disks and it mentions using the following setting on a live system - is this applicable to my case of ZFS + geli?

# sysctl kern.geom.debugflags=16
I can't tell for sure; the handbook is referring to UFS2. If no one here in the forums can answer the question explicitly, maybe somebody on the freebsd-fs@ mailing list can. That list is more frequented by FreeBSD filesystem developers than the forums.

Anyway, that kernel state would apply only to ZFS. The geli(8) provider is acting here only as a container.

Be advised: if you are thinking of operating from FreeBSD installation media, the instructions from VladiBG won't work. Those work from the mounted system only.

If installation media is used, use the instructions I gave in post #29.

Also, autoexpand isn't automatically set to on:

zpool get autoexpand
NAME PROPERTY VALUE SOURCE
zroot autoexpand off default
I assumed it was on when testing, but it actually doesn't need to be set to expand the size of the pool after resizing the partition. Executing zpool online [-e] pool device alone is enough.

From zpoolprops(7):
DESCRIPTION
     ...
     expandsize        Amount of uninitialized space within the pool or device
                       that can be used to increase the total capacity of the
                       pool.  On whole-disk vdevs, this is the space beyond
                       the end of the GPT – typically occurring when a LUN is
                       dynamically expanded or a disk replaced with a larger
                       one.  On partition vdevs, this is the space appended to
                       the partition after it was added to the pool – most
                       likely by resizing it in-place.  The space can be
                       claimed for the pool by bringing it online with
                       autoexpand=on or using zpool online -e.
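For example, to see how much expandable space ZFS reports before and after the resize (the EXPANDSZ column):

zpool list -o name,size,expandsize,free zroot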

Again, it would be best to make a backup before resizing the geli(8) provider (partition).

Alternatively, if no backup device is available at the moment, don't resize the geli(8) provider; instead, create a separate, geli(8)-encrypted partition in the free space with another ZFS pool inside, and add it to the system.
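A rough sketch of that alternative, with hypothetical names (ada0p4 for the new partition, "data" for the new pool); the passphrase/key handling details and the rc.conf entries needed to attach the provider at boot are left out:

gpart add -t freebsd-zfs -a 4k ada0    # creates e.g. ada0p4 in the free space at the end of the disk
geli init -s 4096 ada0p4               # prompts for a passphrase; see geli(8) for key file options
geli attach ada0p4
zpool create -m /data data ada0p4.eli  # new pool on the encrypted provider, mounted at /data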
 