Solved zpool missing after increasing disk size (AWS)

dvl@

FreshPorts runs in an AWS instance. Earlier today, I modified two storage devices from 200GB to 250GB. The host contains two zpools: zroot and data01.

The zroot update went fine. The data01 zpool just disappeared. The drive is still there, but the zpool cannot be seen. I'm sure this can be recovered, but I don't know how.

Details I have collected:


Code:
[15:27 aws-1 dan ~] % zpool status data01
cannot open 'data01': no such pool

[15:37 aws-1 dan ~] % sudo zpool import
   pool: data01
     id: 17238602793760673894
  state: UNAVAIL
status: One or more devices are missing from the system.
 action: The pool cannot be imported. Attach the missing
    devices and try again.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-3C
 config:

    data01      UNAVAIL  insufficient replicas
      nda2p1    UNAVAIL  cannot open

[15:38 aws-1 dan ~] % ls -l /dev/nda2p1
crw-r-----  1 root  operator  0x66 2025.08.25 15:22 /dev/nda2p1
[15:39 aws-1 dan ~] % gpart show nda2p1
gpart: No such geom: nda2p1.
[15:39 aws-1 dan ~] % gpart show nda2
=>       40  524287920  nda2  GPT  (250G)
         40  524287920     1  freebsd-zfs  (250G)

[15:39 aws-1 dan ~] %

Code:
[15:30 aws-1 dan ~] % gpart show
=>       40  524287920  nda0  GPT  (250G)
         40       1024     1  freebsd-boot  (512K)
       1064        984        - free -  (492K)
       2048  524285912     2  freebsd-zfs  (250G)

=>      40  16777136  nda1  GPT  (8.0G)
        40  16777136     1  freebsd-swap  (8.0G)

=>       40  524287920  nda2  GPT  (250G)
         40  524287920     1  freebsd-zfs  (250G)

[15:31 aws-1 dan ~] % zpool list
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
zroot   250G  36.2G   213G        -         -    28%    14%  1.00x    ONLINE  -

From /var/log/daily.log I know there is a data01:

Code:
Backup of boot partition content:
nda0p1

Disk status:
Filesystem Size Used Avail Capacity Mounted on
zroot/ROOT/default 174G 17G 157G 10% /
devfs 1.0K 0B 1.0K 0% /dev
data01/jails 33G 96K 33G 0% /jails
zroot/tmp 157G 313K 157G 0% /tmp
zroot/var/mail 157G 24K 157G 0% /var/mail
zroot/usr/home 171G 14G 157G 8% /usr/home
zroot/usr/src 157G 23K 157G 0% /usr/src
zroot/var/crash 157G 23K 157G 0% /var/crash
zroot/mkjail 157G 23K 157G 0% /mkjail
zroot/usr/ports 157G 23K 157G 0% /usr/ports
zroot/freebsd_releases 160G 3.3G 157G 2% /var/db/mkjail
zroot/var/audit 157G 23K 157G 0% /var/audit
zroot/var/tmp 157G 278K 157G 0% /var/tmp
zroot/var/log 157G 61M 157G 0% /var/log
data01/jails/ingress01 44G 11G 33G 25% /jails/ingress01
data01/jails/nginx01 37G 4.6G 33G 12% /jails/nginx01
zroot/mkjail 157G 23K 157G 0% /mkjail
data01/rsyncer 40G 7.1G 33G 18% /usr/home/rsyncer/backups
data01/jails/ingress01/usr/src 33G 96K 33G 0% /jails/ingress01/usr/src
data01/freshports/ingress01/var/db/freshports 34G 1.7G 33G 5% /jails/ingress01/var/db/freshports
data01/mkjail/14.2-RELEASE 34G 852M 33G 2% /mkjail/14.2-RELEASE
data01/freshports/ingress01/var/db/ingress 33G 180K 33G 0% /jails/ingress01/var/db/ingress
data01/freshports/ingress01/var/db/freshports/cache 33G 96K 33G 0% /jails/ingress01/var/db/freshports/cache
data01/freshports/ingress01/var/db/freshports/message-queues 37G 4.4G 33G 12% /jails/ingress01/var/db/freshports/message-queues
data01/freshports/ingress01/var/db/ingress/message-queues 33G 1.2M 33G 0% /jails/ingress01/var/db/ingress/message-queues
data01/freshports/ingress01/var/db/ingress/repos 42G 9.2G 33G 22% /jails/ingress01/var/db/ingress/repos
data01/freshports/ingress01/var/db/freshports/cache/spooling 33G 360K 33G 0% /jails/ingress01/var/db/freshports/cache/spooling
data01/freshports/ingress01/var/db/freshports/cache/html 33G 204K 33G 0% /jails/ingress01/var/db/freshports/cache/html
devfs 1.0K 0B 1.0K 0% /jails/ingress01/dev
data01/freshports/jailed/ingress01/jails 33G 104K 33G 0% /jails/ingress01/jails
data01/freshports/jailed/ingress01/mkjail 35G 1.8G 33G 5% /jails/ingress01/var/db/mkjail
data01/freshports/jailed/ingress01/jails/freshports 117G 85G 33G 72% /jails/ingress01/jails/freshports
data01/freshports/jailed/ingress01/mkjail/14.2-RELEASE 34G 852M 33G 2% /jails/ingress01/var/db/mkjail/14.2-RELEASE
devfs 1.0K 0B 1.0K 0% /jails/ingress01/jails/freshports/dev
/jails/ingress01/var/db/freshports/cache/html 33G 204K 33G 0% /jails/nginx01/var/db/freshports/cache/html
devfs 1.0K 0B 1.0K 0% /jails/nginx01/dev
data01/freshports/nginx01/var/db/freshports/cache/daily 33G 133M 33G 0% /jails/nginx01/var/db/freshports/cache/daily
data01/freshports/nginx01/var/db/freshports/cache/news 33G 11M 33G 0% /jails/nginx01/var/db/freshports/cache/news
data01/freshports/nginx01/var/db/freshports/cache/commits 59G 26G 33G 45% /jails/nginx01/var/db/freshports/cache/commits
data01/freshports/nginx01/var/db/freshports/cache/spooling 33G 128K 33G 0% /jails/nginx01/var/db/freshports/cache/spooling
data01/freshports/nginx01/var/db/freshports/cache/packages 33G 40M 33G 0% /jails/nginx01/var/db/freshports/cache/packages
data01/freshports/nginx01/var/db/freshports/cache/ports 39G 6.6G 33G 17% /jails/nginx01/var/db/freshports/cache/ports
data01/freshports/nginx01/var/db/freshports/cache/general 33G 8.0M 33G 0% /jails/nginx01/var/db/freshports/cache/general
data01/freshports/nginx01/var/db/freshports/cache/pages 33G 96K 33G 0% /jails/nginx01/var/db/freshports/cache/pages
data01/freshports/nginx01/var/db/freshports/cache/categories 33G 29M 33G 0% /jails/nginx01/var/db/freshports/cache/categories
 
Try to import it by device ID using
Code:
zpool import -d /dev/nda2p1

Then, if the import is successful, check whether "autoexpand" is set on data01:
Code:
zpool get autoexpand data01
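
If autoexpand turns out to be off, the extra space will not show up on its own. A sketch of how it could be claimed once the pool is imported (these are not commands run in this thread):
Code:
# enable automatic expansion for future resizes
zpool set autoexpand=on data01
# expand the existing vdev to the full size of the now-larger partition
zpool online -e data01 nda2p1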
 
Code:
[16:29 aws-1 dan /var/backups] % sudo zpool import -d /dev/nda2p1
   pool: data01
     id: 17238602793760673894
  state: UNAVAIL
status: One or more devices are missing from the system.
 action: The pool cannot be imported. Attach the missing
    devices and try again.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-3C
 config:

    data01      UNAVAIL  insufficient replicas
      nda2p1    UNAVAIL  cannot open
 
I've created a new volume vol-0a9652ed611542c31 from a snapshot taken at 2025/08/25 05:40 GMT-4.

It is not attached to the instance.

edit: 2025-08-29 : this volume was not required and has since been deleted.
 
rwp on IRC led us to the cause: the devices were renumbered, and the host also contained two single-partition drives, one for swap and one for the data01 zpool.

In short, swap was mounted over the data01 zpool:

Code:
[17:09 aws-1 dan ~] % swapinfo -h
Device              Size     Used    Avail Capacity
/dev/nda2p1         250G       0B     250G     0%
[17:10 aws-1 dan ~] % cat /etc/fstab
/dev/nvd2p1 none swap sw         0 0

Note how swap is 250G, which is the size of the drive for data01.

Code:
[17:17 aws-1 dan ~] % gpart show
=>       40  524287920  nda0  GPT  (250G)
         40       1024     1  freebsd-boot  (512K)
       1064        984        - free -  (492K)
       2048  524285912     2  freebsd-zfs  (250G)

=>      40  16777136  nda1  GPT  (8.0G)
        40  16777136     1  freebsd-swap  (8.0G)

=>       40  524287920  nda2  GPT  (250G)
         40  524287920     1  freebsd-zfs  (250G)

[17:22 aws-1 dan ~] %
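
A label-based fstab entry would also have kept swap pinned to the intended partition regardless of device renumbering. A sketch, assuming the 8G swap partition is index 1 on nda1, using a label name I made up:
Code:
# turn off swap, put a GPT label on the swap partition, then point fstab at the label
swapoff /dev/nda1p1
gpart modify -i 1 -l swap0 nda1
# /etc/fstab entry:
#   /dev/gpt/swap0  none  swap  sw  0 0
swapon /dev/gpt/swap0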
 
As kevens pointed out, "i wonder if we could add some guardrails that would've prevented this"
 
Example:
I've created a freebsd-zfs partition on an NVMe SSD attached via a USB adapter.
Code:
# gpart add -t freebsd-zfs -a 1M -l <unique label> -s 3661G da0
And created the pool as follows.
Code:
# zpool create -R /mnt <pool name> /dev/gpt/<unique label>

After transferring everything from my previous SSD in my notebook, and fixing up anything that referenced the old SSD's labels to use the new SSD's labels, I swapped the old and new SSDs (the new one now sits in my notebook's NVMe slot, recognized as nda0*), and now I'm working on the new, larger SSD. I'm not bothered by GEOM provider name changes (/dev/da0* to /dev/nda0*).
 
Swap was turned off:

Code:
[18:17 aws-1 dan ~] % sudo swapoff -a   
swapoff: removing /dev/nvd2p1 as swap device

[18:17 aws-1 dan ~] % swapinfo
Device          1K-blocks     Used    Avail Capacity

[18:17 aws-1 dan ~] % sudo zpool import         
   pool: data01
     id: 17238602793760673894
  state: ONLINE
status: Some supported features are not enabled on the pool.
    (Note that they may be intentionally disabled if the
    'compatibility' property is set.)
 action: The pool can be imported using its name or numeric identifier, though
    some features will not be available without an explicit 'zpool upgrade'.
 config:

    data01      ONLINE
      nda2p1    ONLINE

[18:18 aws-1 dan ~] % zpool list
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
zroot   250G  36.2G   213G        -         -    28%    14%  1.00x    ONLINE  -

[18:18 aws-1 dan ~] % sudo zpool import data01

[18:18 aws-1 dan ~] % zpool list             
NAME     SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
data01   200G   161G  38.9G        -       50G    85%    80%  1.00x    ONLINE  -
zroot    250G  36.2G   213G        -         -    28%    14%  1.00x    ONLINE  -
[18:18 aws-1 dan ~] %
 
With the missing pool imported, let's scrub:

Code:
[18:19 aws-1 dan ~] % sudo zpool scrub data01
[18:19 aws-1 dan ~] % zpool status data01   
  pool: data01
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
    The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
    the pool may no longer be accessible by software that does not support
    the features. See zpool-features(7) for details.
  scan: scrub in progress since Mon Aug 25 18:19:39 2025
    7.17G / 161G scanned at 188M/s, 3.13G / 161G issued at 82.2M/s
    0B repaired, 1.95% done, 00:32:42 to go
config:

    NAME        STATE     READ WRITE CKSUM
    data01      ONLINE       0     0     0
      nda2p1    ONLINE       0     0     0

errors: No known data errors
[18:20 aws-1 dan ~] %
 
Code:
[18:35 aws-1 dan ~] % zpool status data01
  pool: data01
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
    The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
    the pool may no longer be accessible by software that does not support
    the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 00:31:09 with 0 errors on Mon Aug 25 18:50:48 2025
config:

    NAME        STATE     READ WRITE CKSUM
    data01      ONLINE       0     0     0
      nda2p1    ONLINE       0     0     0

errors: No known data errors
[19:04 aws-1 dan ~] %
 
I started the webserver jail, and the website came up.

Code:
[19:05 aws-1 dan ~] % sudo service jail start nginx01     
Starting jails: nginx01.

Then a shutdown -r now. Now waiting for Nagios to go back to green.
 
As kevens pointed out, "i wonder if we could add some guardrails that would've prevented this"

Linux does act on this, by marking block devices that are meant for swap. If the marker is missing, the kernel will not swap to that block device.

This would be trivial to do in FreeBSD. Except that existing systems would be cut off from their existing swap until somebody marks the devices or turns this mechanism off.
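
For comparison, that marker on Linux is the mkswap signature; swapon refuses a device that does not carry it. Roughly (Linux commands on a hypothetical /dev/vdb1, shown only to illustrate the mechanism):
Code:
# write the swap signature, confirm it, then enable swapping
mkswap /dev/vdb1
blkid /dev/vdb1     # should report TYPE="swap"
swapon /dev/vdb1    # fails if the signature is missing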
 
Something else we'd kind of batted around was the idea of refusing to use a device for swap if it has valid UFS/ZFS metadata/magic, but there's a usability hiccup there: there's a risk of false positives, and you might need a transitional step to convert a now-discarded filesystem partition into swap.
 
I also thought about a flag to swapon, and a corresponding option in fstab, to only do the actual swap if a signature is present on the device. That would at least protect against device mixups on systems installed from here on.
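
Purely to illustrate the idea (the option name below is invented; no such fstab option exists in FreeBSD today):
Code:
# hypothetical: "reqsig" would tell swapon to refuse the device
# unless a swap signature had previously been written to it
/dev/gpt/swap0  none  swap  sw,reqsig  0 0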
 
In the ZFS case, the tool you're looking for is zpool labelclear $DEVICE ;)
 
zpool labelclear is very dangerous if used incorrectly.

  • Data loss: The most significant danger is that zpool labelclear will make all data on the disk inaccessible. While it doesn't zero out the entire disk (it only erases the ZFS labels), without the labels, ZFS has no way to recognize the disk as part of a pool and cannot access the data on it. You will lose access to all the data on the device.
  • Pool corruption: Running this command on a device that is still an active part of a running ZFS pool can lead to pool degradation or even destruction. ZFS has built-in safeguards to prevent this from happening (it will refuse to run on an active device unless you use the -f force option), but you should never attempt this on a device that is part of a pool you care about.
  • Accidental misuse: Because it's a powerful and destructive command, it's crucial to be absolutely sure you are running it on the correct device. A typo in the device path could lead to catastrophic data loss on the wrong disk.
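
Before clearing anything, the ZFS labels on a device can at least be inspected read-only with zdb; a sketch using the device path from this thread:
Code:
# print any ZFS labels found on the partition -- read-only, changes nothing
zdb -l /dev/nda2p1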
 
Overnight, I had recurring dreams/hallucinations of how using device labels (e.g. gpt/zfs0) instead of partition names (e.g. /dev/nda2p1) would have avoided this. It kept going through my head (I was dealing with a fever) most of the night.

Today I found I have done this in the past: https://dan.langille.org/2019/10/15/going-from-partition-to-label-in-zpool-status/

The above solution deals with a mirror. The procedure might not translate well to a single-drive zpool (remember, this is an AWS host, so it's not a physical drive).
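
For a single-drive pool, a rough sketch of the same idea (untested here, and the label name is made up) would be to export the pool, label the partition, and re-import by label:
Code:
zpool export data01
# give the existing freebsd-zfs partition (index 1 on nda2) a GPT label
gpart modify -i 1 -l data01a nda2
# import again, telling zpool to look only at the label-based device nodes
zpool import -d /dev/gpt data01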
 