ZFS dataset size is out of control, zfs send/recv hangs on random datasets

I did find this related thread, but I have snapshots:
https://forums.freebsd.org/threads/zfs-dataset-is-occupying-more-space-than-the-actual-data.83901/

A few months ago I had a ZFS corruption issue with the server in question and repartitioned.

Since then, my zfs send | zfs recv backups have been hanging every few days.

Well, this one dataset had a lot of variation in the zfs list -t snap REFER column. Over the hours it would climb from 50G to 155G, then go back down, then climb to 100G and go back down, etc. But right now, the latest snap shows 320G!

Code:
root(4)smtp:~ # df -h /zsmtp_jail/postfix
Filesystem            Size    Used   Avail Capacity  Mounted on
zsmtp_jail/postfix    919G    320G    599G    35%    /zsmtp_jail/postfix

root(4)smtp:~ # du -hs /zsmtp_jail/postfix/
 55G    /zsmtp_jail/postfix/

root(4)smtp:~ # zfs list
NAME                                                      USED  AVAIL  REFER  MOUNTPOINT
zsmtp_jail/postfix                                        320G   620G   320G  /zsmtp_jail/postfix

When I enter the jail and run du I get the same 55G.

Why the wild size discrepancy? 919G? 320G? 55G?

I renamed my backup dataset to preserve all my snaps and destroyed all the snaps on the live server. Now I get:

Code:
root(4)smtp:~ # zfs list zsmtp_jail/postfix
NAME                 USED  AVAIL  REFER  MOUNTPOINT
zsmtp_jail/postfix   320G   620G   320G  /zsmtp_jail/postfix
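For reference, the snapshots were destroyed roughly like this (a sketch, not my exact commands; the @% range matches every snapshot of the dataset, and -n does a dry run first):
Code:
# dry run: list what would be destroyed
zfs destroy -nv zsmtp_jail/postfix@%
# then destroy for real
zfs destroy -v zsmtp_jail/postfix@%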


zpool status shows:
Code:
  pool: zsmtp_jail
 state: ONLINE
  scan: scrub repaired 0B in 00:27:09 with 0 errors on Fri May 24 04:22:09 2024    <--TODAY!
config:

        NAME        STATE     READ WRITE CKSUM
        zsmtp_jail  ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            nda1p3  ONLINE       0     0     0
            nda2p3  ONLINE       0     0     0

1/ As far as I know, the pool is fine (says zfs) and there is nothing that I can do to fix the dataset. Any info to the contrary is welcome.
2/ As far as I know, the size discrepancy is indicative of a real problem, since there is *NOT* 320G in the dataset now that all the snaps are destroyed. Perhaps the snapshot space takes time to be recalculated? I would like to know how ZFS handles this. Is it fixed on the next scrub? I looked and found no reference to scrub recalculating sizes. Any info welcome, especially if I can trigger the recalculation.
3/ In the last month I have had forty-six backups hang on zfs send/recv. About a third of those are from a server that is backing up to itself from an SSD to a hard drive, so I don't think SSH has anything to do with it. The datasets vary.
for example:
back/smtp/zsmtp/usr/src
back/smtp/zsmtp/var/crash
back/aujail/jail
back/smtp/zsmtp/var
zgep_back/zgep/var/crash
back/smtp/zsmtp_jail/jmusicbot
zgep_back/zgep/ROOT
zgep_back/zgep/var
zgep_back/zgep/var/crash
zgep_back/zgep/var/crash
zgep_back/zgep/var/crash
zgep_back/zgep/usr
back/smtp/zsmtp/usr

I would love to know:
Why do these datasets hang on zfs send/recv?
Is there any command I can run to find the datasets that are in a state where they could or would hang?
Is there a way to 'clean' them so they don't hang anymore?
Should I be thinking of a new zpool again? Should I only move the data in by rsync and not zfs send/recv?
Any thoughts appreciated.
4/ If the files in a dataset without snapshots total only 50G but ZFS reports 320G, is there some command that will show me what the extra space is for?
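The only breakdown I know of is the per-dataset used* accounting; a sketch of what I mean, in case someone knows of something better:
Code:
# where USED goes: snapshots, children, refreservation, or the dataset itself
zfs list -ro space zsmtp_jail/postfix
zfs get used,usedbydataset,usedbysnapshots,usedbychildren,usedbyrefreservation zsmtp_jail/postfix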
 
Code:
zfs list
NAME                     USED  AVAIL  REFER  MOUNTPOINT
zsmtp_jail/postfix      55.5G   564G  55.5G  /zsmtp_jail/postfix
zsmtp_jail/postfix_xld   320G   564G  55.2G  /zsmtp_jail/postfix_xld
The new dataset is only 55G and transferring properly. I've retained my snapshots by just renaming the old dataset on the backup server, and I've kept the bad dataset in case anyone has any ideas. I would like to figure out a better way out of this mess. Note that the REFER column on the bad dataset now shows only 55G instead of 320G!

Why would it do that? Did it happen when I unmounted it?

That reminds me, I had to do a zfs unmount -f to force it to unmount. It wouldn't unmount without it. I ran various commands to see what was open:

fstat | grep postfix
procstat -fa | grep postfix
sh -c "ps ax -o pid=|xargs procstat -f 2>/dev/null" | grep postfix

At first, fail2ban was showing some files open:
Code:
root(4)smtp:~ # fstat | grep postfix
root python3.9 2025 14 /zsmtp_jail/postfix 937589 -rw-r----- 1255190
root python3.9 2025 18 /zsmtp_jail/postfix 647378 drwxr-xr-x 60
root(4)smtp:~ # sh -c "ps ax -o pid=|xargs procstat -f 2>/dev/null" | grep postfix
2025 python3.9 14 v r -----n-- 2 0 - /zsmtp_jail/postfix/var/log/maillog
2025 python3.9 18 v d -----n-- 2 0 - /zsmtp_jail/postfix/var/log

But shutting down fail2ban cleared those. Maybe I should have shut down fail2ban before shutting down the jail.

Anyway zfs unmount -f worked.

Still interested in any information about how the dataset can have USED of 320G while referring to only 55.2G. Meanwhile, the snaps still show REFER of 320G:
Code:
root(4)smtp:~ # zfs list -t snap zsmtp_jail/postfix_xld
NAME                                    USED  AVAIL  REFER  MOUNTPOINT
zsmtp_jail/postfix_xld@2024-05-24.12   10.5M      -   320G  -
zsmtp_jail/postfix_xld@2024-05-24.13   13.4M      -   320G  -

If it is clear to anyone that there is a bug here, let me know and I will try to report it.

Thank you all for any help you can offer.
 
Now, just four days later, I had a panic while deleting some snaps on zsmtp_jail/nginx

To boot, we had to add this to /boot/loader.conf:
Code:
vfs.zfs.recover=1

I'm going to give nginx the rename-and-rsync treatment and try to reboot without ZFS recovery mode.
 
OK, I had failed to put the -r on the zfs destroy, so the bad dataset still existed.

After destroying it for real and removing the recovery setting, the system still panics.

Next I created new datasets on a backup drive and wiped the whole zpool.

I rsynced everything back and rebooted.

ZFS with two mirrors isn't as reliable as I expected. I would love to know what I'm doing wrong; this is the second time I've had this problem with this machine.
 
Thank you for your reply cracauer!

I am certainly with you that the common denominator is this machine. However, I have no other indication that the hardware is at fault; it's been rock solid apart from the ZFS crashes.

I am also left with other questions that I have no way to answer.

Is there a way to track down the discrepancy between REFER and du?
-should I add a periodic test to detect such a discrepancy as an indication of an unhealthy zpool?
If there is a problem with zfs why is there no way to classify or detect it?
-Replacing the zpool fixed it, but since I didn't have a way to find the problem I don't know what happened or how to detect or prevent it.
My zfs backup script is pretty complex. Dealing with send hangs all the time complicated it further. Now I need to incorporate resume tokens. Is backing up zfs usually this complex?

Obviously I have multiple servers and love zfs, our whole infrastructure is based on it. A handful of servers and dozens of jails. But the lack of info in the face of a factual problem has me wondering if I fell asleep during zfs 101.
 
I dunno about REFER and du, but I'd still start memtest86ing the thing.

Are you sure you don't have snapshots or clones there?
 
We went through memtest the last time this happened. It's ECC memory and it all passed testing, so we took half of it out; that way we halved our chances that it was a memory issue. Now we have the same issue, so I'm inclined to say it's not memory, but who knows?

I need to set up another server anyway so I'm thinking I'll transfer production to the new one and free this server up for some testing.

No snapshots since I had deliberately deleted them all, then checked again. I rarely use clones so I didn't expect any.

I do expect there is some kind of undocumented calculation delay. I'll test that soon since I am interested to know myself if that is a thing.

I will post the results here.
 
Some things that may help others help you:
  • CPU and RAM of the two machines.
  • OS/kernel version `uname -a` or at least `uname -KU`. FreeBSD 13.3 was released with a particularly ugly ZFS memory-release issue which can make things incredibly slow; I'm not sure whether it can cause a complete freeze too. I don't know if it was fixed, but it's better to use 13.2 or 14.0 if it wasn't. Hanging, I would presume, points to a bug or hardware issue rather than just a corrupted pool.
  • ZFS version if different such as from ports.
  • ashift value for pool `zdb -C zsmtp_jail|grep ashift`.
  • Commands used for the send and receive of the pools; an interrupted receive does not list as a snapshot but you can list the token to resume it.
  • Any differences between the sending and receiving pools: versions, properties, differences in running hardware.
  • Other ZFS space measurements: `zfs list -ro space zsmtp_jail` or `zfs list -t snapshot -ro name,used -s used zsmtp_jail`.
  • Any other ZFS properties that are not default or could be presented for examining: `zpool get all zsmtp_jail | grep -v default` `zfs get -r all zsmtp_jail | egrep -v 'default|inherited'`.
I don't have a good explanation for the changing sizes; my best guess is that intermediate snapshots are playing a role. Other possible explanations for unexpected space use that I can think of: the copies property turned on for data, lowered compression, block cloning in use (cloned blocks don't transfer as clones, if I recall), many files smaller than ashift, or incomplete `zfs recv -s` transfer(s) left behind. Things like raidz's unexpectedly higher allocated disk amounts shouldn't apply since this is just a mirrored pool.
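If you want to rule those out quickly, something along these lines should show the relevant settings (a sketch; the bclone* pool properties only exist on OpenZFS 2.2+, and adjust the dataset/pool names):
Code:
# dataset settings that can make on-disk size differ from file size
zfs get copies,compression,compressratio,recordsize,used,logicalused zsmtp_jail/postfix
# pool-wide block cloning usage (OpenZFS 2.2+)
zpool get feature@block_cloning,bcloneused,bclonesaved,bcloneratio zsmtp_jail
# any interrupted resumable receives leave a token behind on the destination
zfs get -r -H -o name,value receive_resume_token zsmtp_jail | awk '$2 != "-"'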

Sizes do have a calculation delay; compressed data's size is not known until after compression has been performed. Writes (and deletes) are cached for at most `sysctl vfs.zfs.txg.timeout` seconds before steps to commit them to disk are taken. Destroying snapshots may also take a while before the command returns; I'd still give it at least an additional txg.timeout seconds after that before even thinking of checking on space.
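To see whether space is still being released asynchronously after a big destroy, you can watch the pool's `freeing` property; it should trend toward 0 (a sketch):
Code:
# seconds a transaction group may stay open before being committed
sysctl vfs.zfs.txg.timeout
# bytes still queued for deferred freeing after destroys
zpool get -H -o value freeing zsmtp_jail
# or watch it drain
while sleep 5; do zpool get -H -o value freeing zsmtp_jail; done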

You shouldn't need rsync or other tools to transfer data unless you are trying to do things ZFS cannot do during a transfer, like rewriting some data structures: activating block cloning on blocks that are the same (currently undone by zfs send/recv), increasing/decreasing record size, etc.

Though du isn't recommended for figuring out how a pool has been used, you may want to also compare results with its -A flag. `zdb -d`, when given a dataset, can output things helpful for walking further through that dataset. You can follow it up with an object # to examine that object, and add more 'd's for more detail. I wouldn't call it user friendly, and it requires understanding ZFS structures to make proper sense of its output.
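As a rough example of the kind of walk-through I mean (the object number here is made up; output formats vary between versions):
Code:
# summarize the objects in the dataset
zdb -dd zsmtp_jail/postfix_xld
# then drill into one object with more d's for more detail
zdb -ddddd zsmtp_jail/postfix_xld 12345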

Without knowing why the system freezes during the transfer, you cannot know what can/cannot be transferred without causing a freeze. If a zfs scrub comes back clean, then your data should be intact. If you still have filesystem corruption (very unlikely but not impossible) then you would need to replace impacted data from backup, possibly going as far as destroying+recreating the pool.

As common as RAM issues are, RAM is not the only part of a system that can go unstable. Over the past few years, I ended up teaching computer technicians that CPUs have issues often enough that they are no longer the last thing checked during troubleshooting. A bad power supply, faulty or poorly designed accessories, etc. can wreak havoc too. No single test program checks for all possible faults, and some faults are only brought out under certain conditions (humidity, temperature, load on multiple components, etc.).

ZFS isn't designed to make unreliable hardware become reliable. Checksums help identify corrupted data, and fix it if there are redundant copies. Your data gets contained in ZFS data blocks and those blocks all have checksums, but your data itself does not. If data is corrupted and then checksummed afterwards, you have corrupted data put on disk that is marked as valid. Filesystem bugs could also write bad data depending on where in the pipeline they occur. If you think the hardware is unstable, then that needs to be addressed.
 
Mirror176, thank you for your detailed and informative response. I did not post specs because I wanted info about ZFS regardless of hardware, since two machines were affected, but since you were so thorough, here we go...

smtp is a 2020 ASRock X570 Taichi with a Seasonic Focus power supply. The power supply is not original; there was a recall on the original.
CPU is AMD Ryzen 9 5950X Vermeer (Zen 3) 16-Core 3.4 GHz Socket AM4
RAM was replaced with ECC a year ago or so; I don't have the part number on hand
FreeBSD smtp.wfprod.com 14.0-RELEASE-p6 FreeBSD 14.0-RELEASE-p6 #0: Tue Mar 26 20:26:20 UTC 2024 root@amd64-builder.daemonology.net:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
Intel 58GB Optane 800P M.2 2280 3D Xpoint PCIe SSD SSDPEK1A058GA for swap
Corsair MP600 PRO NH for zpools
zfs-2.2.0-FreeBSD_g95785196f
zfs-kmod-2.2.0-FreeBSD_g95785196f

gep is a 2024 ASRock B650M-HDV/M.2 Socket AM5 Micro ATX with a CORSAIR SF850L SFX Power Supply
CPU is AMD Ryzen 5 7600 6-Core 3.8 GHz Socket AM5 65W
RAM is 16GB Kingston 4800MHz CL40 DDR5
Two SSDs are Corsair MP600 PRO NH M.2 2280 2TB PCIe 4.0 x4 3D TLC CSSD-F2000GBMP600PNH
FreeBSD geproducts.net 14.0-RELEASE-p6 FreeBSD 14.0-RELEASE-p6 #0: Tue Mar 26 20:26:20 UTC 2024 root@amd64-builder.daemonology.net:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
zfs-2.2.0-FreeBSD_g95785196f
zfs-kmod-2.2.0-FreeBSD_g95785196f

I installed from the same iso image on a USB stick and built both machines by hand using pkg only. No source at all.
ashift on both production pools is 12. One of the backups is 9, the other is 12.

All the zfs commands that had a problem were of the form:
Code:
${ssh} sudo zfs send -c ${rds}@${date} | zfs recv -Fu ${lds}
but the script did have a case where it did an incremental of the form:
Code:
${ssh} sudo zfs send -cI ${rds}@${hdate} ${rds}@${date} | zfs recv -Fu "${lds}"
However, I have rewritten the script using resumable sends of the form:
Code:
${ssh} timeout ${time} sudo zfs send -c -t ${token} | zfs recv -Fsu back/${bpfx}/${prds}
${ssh} timeout ${time} sudo zfs send -c ${prds}@${date} | zfs recv -Fsu back/${bpfx}/${prds}
${ssh} timeout ${time} sudo zfs send -cI "${prds}@${lbs_date}" "${prds}@${date}" | zfs recv -Fsu "back/${bpfx}/${prds}"
This new version of the script has crashed the smtp server once so far, but overall it is better written, so I'm keeping it. However, when it crashed the server, the backup script did not terminate; periodic reported that the previous hourly run was still running. So I have decided to put a timeout on the zfs recv side as well.
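For anyone curious, the token handling in the new script looks roughly like this (a simplified sketch; variable names differ in the real script, and the token itself already records stream options such as -c):
Code:
# an interrupted receive leaves a resume token on the destination dataset
token=$(zfs get -H -o value receive_resume_token "back/${bpfx}/${prds}")
if [ "${token}" != "-" ]; then
    # resume the interrupted stream instead of starting over
    ${ssh} timeout ${time} sudo zfs send -t "${token}" | zfs recv -Fsu "back/${bpfx}/${prds}"
fi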

zdb diff for zsmtp_jail to back (different machines):
< name: 'back'
---
> name: 'zsmtp_jail'
...
< vdev_children: 1
---
> vdev_children: 2
< type: 'disk'
---
> type: 'mirror'
< path: '/dev/ada0p3'
< whole_disk: 1
---
> whole_disk: 0
> children[0]:
> id: 0
> path: '/dev/nda1p3'
> whole_disk: 1
> DTL: 13483
> create_txg: 4
> children[1]:
> type: 'disk'
> id: 1
> path: '/dev/nda2p3'
> whole_disk: 1
> DTL: 12598
> create_txg: 4
> children[1]:
> type: 'indirect'
> whole_disk: 0
> metaslab_array: 0
> metaslab_shift: 34
> ashift: 12
> is_log: 0
> non_allocating: 1
> create_txg: 18
I removed guids and such for brevity

zdb diff for gep (both pools on same machine)
< name: 'zgep'
---
> name: 'zgep_back'
< txg: 265556
---
> txg: 229907
---
< type: 'mirror'
---
> type: 'disk'
< whole_disk: 0
---
> path: '/dev/ada0p3'
> whole_disk: 1
< metaslab_shift: 29
< ashift: 9
---
> metaslab_shift: 30
> ashift: 12
< children[0]:
< type: 'disk'
< id: 0
< path: '/dev/nda0p2'
< whole_disk: 1
< DTL: 281
< create_txg: 4
< children[1]:
< type: 'disk'
< id: 1
< path: '/dev/nda1p2'
< whole_disk: 1
< create_txg: 4
zfs get for zsmtp_jail and back (different machines)
< zsmtp_jail size 1.77T -
< zsmtp_jail capacity 41% -
---
> back size 10.8T -
> back capacity 37% -
---
< zsmtp_jail free 1.03T -
< zsmtp_jail allocated 756G -
---
> back free 6.72T -
> back allocated 4.10T -
< zsmtp_jail fragmentation 0% -
---
> back fragmentation 3% -
< zsmtp_jail load_guid 1809226621819765726 -
---
> back load_guid 9429038092753376069 -
< zsmtp_jail feature@device_removal active local
< zsmtp_jail feature@obsolete_counts active local
---
> back feature@device_removal enabled local
> back feature@obsolete_counts enabled local
< zsmtp_jail feature@zilsaxattr active local
---
> back feature@zilsaxattr enabled local

zfs get for zgep and zgep_back (same machine)
< zgep creation Tue Apr 30 13:57 2024 -
< zgep used 1.62G -
< zgep available 65.7G -
< zgep referenced 26K -
< zgep compressratio 2.10x -
---
> zgep_back creation Thu May 2 12:59 2024 -
> zgep_back used 1.72G -
> zgep_back available 143G -
> zgep_back referenced 96K -
> zgep_back compressratio 2.03x -
< zgep mountpoint /zgep default
---
> zgep_back mountpoint /jback local
< zgep guid 1722107665142296273 -
---
> zgep_back guid 7431549006592437355 -
< zgep usedbydataset 26K -
< zgep usedbychildren 1.62G -
---
> zgep_back usedbydataset 96K -
> zgep_back usedbychildren 1.72G -
< zgep written 0 -
< zgep logicalused 3.23G -
< zgep logicalreferenced 13K -
---
> zgep_back written 96K -
> zgep_back logicalused 3.24G -
> zgep logicalreferenced 42.5K -
< zgep snapshots_changed Fri May 31 3:59:00 2024 -


If by 'intermediate snapshots' you mean what I understand as 'incremental', then yes, that does happen in certain situations, but my observation of the size problem was after all snaps were deleted. I was not using resumable streams until this very last crash on Saturday, so that should not have been an issue. I do very much appreciate your list of possible size sources. Thank you.

sysctl vfs.zfs.txg.timeout is set to five seconds on the smtp server with the observed size issues. I waited several minutes and checked several times so I don't think that was an issue.

I did rsync because I considered that a corrupted pool might 'zfs send' corruption. I stand corrected.

zdb -dd was exciting to learn about, thank you.

We have no idea why we are getting panics on smtp, but we do have core dumps. See: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=278958

"If you still have filesystem corruption (very unlikely but not impossible) then you would need to replace impacted data from backup, possibly going as far as destroying+recreating the pool."
I thought that this situation was supposed to be theoretical, hence my consternation. As far as I understand, the panic situations occur during ZFS operations on smtp, but the hangs are on two completely different backup servers (one local, one remote).
However, gep has been up for two weeks now (no panics) and is backing up locally (no ssh), and until I changed the script to use timeout/resume it was hanging every couple of days. I agree that the trouble with smtp may be instability, but gep is experiencing the same hangs in zfs send/recv, does not crash, and shares no components with smtp.

I agree, this problem involves multiple variables and hardware is a possibility that is welded to the table.

"ZFS isn't designed to make unreliable hardware become reliable."
I agree. I also appreciated your clarification of checksums and the possibility of filesystem bugs.

Facts on the table:
Still confused by zfs send/recv hangs but timeout has mitigated it for now. No action needed.
I need a plan to detect hangs despite the mitigation, or zpool corruption may go unaddressed.
smtp has helped me improve our backup setup to be more resilient but at great cost. I will replace the hardware.
I have discarded the idea of using du against REFER since it wouldn't work with snapshots.
zpool corruption of this nature on our backup servers will not be noticed, since the backup zpools don't get sent anywhere themselves. If the issue is ZFS and not hardware, then it would be useful to figure out another way to detect datasets that may hang (a rough detection sketch follows below). If the problem spreads, I will consider learning about zdb and ZFS data structures.
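Here is the rough detection sketch mentioned above; it just flags zfs send/recv processes that have been running longer than a threshold (the threshold and the ps field layout are assumptions, so adjust for your setup, e.g. sudo-wrapped commands):
Code:
#!/bin/sh
# crude periodic check: warn about zfs send/recv running longer than MAX_SECS
MAX_SECS=7200
ps -axo pid,etimes,state,command | awk -v max="$MAX_SECS" '
    $4 == "zfs" && ($5 == "send" || $5 == "recv" || $5 == "receive") && $2 > max {
        print "possible hung zfs process:", $0
    }'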


Thank you very much Mirror176. You are a gentleman and a scholar.
 
New development: smtp rebooted last night and since then I am getting stuck processes.

Code:
root            6    0.0  0.0      0  10608  -  DL   Wed04       7:18.01 [zfskern]
root         6688    0.0  0.0  20132   9384  -  Ss   08:32       0:00.00 sudo zfs snap zsmtp/ROOT/14.0-RELEASE-p5_2024-04-15_042933@2024-06-06.04
root         6689    0.0  0.0  19508   8668  -  D    08:32       0:00.00 zfs snap zsmtp/ROOT/14.0-RELEASE-p5_2024-04-15_042933@2024-06-06.04
root        12829    0.0  0.0  19508   8808  -  I    04:27       0:00.00 zfs destroy zsmtp/var/audit@2024-06-03.16
root        14550    0.0  0.0  20132   9384  -  Is   04:32       0:00.00 sudo zfs snap zsmtp/ROOT@2024-06-06.04
root        14551    0.0  0.0  19508   8660  -  D    04:32       0:00.00 zfs snap zsmtp/ROOT@2024-06-06.04
root        19212    0.0  0.0  20132   9380  -  Is   09:04       0:00.00 sudo zfs snap zsmtp/ROOT/14.0-RELEASE-p5_2024-04-15_042933@2024-06-06.04
root        19213    0.0  0.0  19508   8676  -  D    09:04       0:00.00 zfs snap zsmtp/ROOT/14.0-RELEASE-p5_2024-04-15_042933@2024-06-06.04
root        19353    0.0  0.0  20132   9388  -  Is   09:04       0:00.01 sudo zfs snap zsmtp/ROOT/14.0-RELEASE-p6_2024-05-07_124518@2024-06-06.04
root        19354    0.0  0.0  19508   8668  -  D    09:04       0:00.00 zfs snap zsmtp/ROOT/14.0-RELEASE-p6_2024-05-07_124518@2024-06-06.04
root        19710    0.0  0.0  20132   9396  -  Is   04:44       0:00.01 sudo zfs snap zsmtp/ROOT@2024-06-06.04
root        19711    0.0  0.0  19508   8672  -  D    04:44       0:00.00 zfs snap zsmtp/ROOT@2024-06-06.04
root        19774    0.0  0.0  20132   9372  -  Is   09:05       0:00.01 sudo zfs snap zsmtp/ROOT/default@2024-06-06.04
root        19775    0.0  0.0  19508   8652  -  D    09:05       0:00.00 zfs snap zsmtp/ROOT/default@2024-06-06.04
root        21352    0.0  0.0  12796   2436  1  S+   09:09       0:00.00 grep zfs
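In case it helps, the kernel stack of a D-state process can be dumped like this (a sketch; the PID is one of the stuck zfs snap processes above):
Code:
# kernel stack of one stuck process
procstat -kk 6689
# or scan all threads for zfs/txg-related kernel functions
procstat -kk -a | grep -i -e zfs -e txg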
 
Do you have the newest firmware installed on both the mobo and the SSDs? Is any CPU microcode patching applied (recommended)?
If powerd is running, have you tried disabling it?
I've seen some reports of certain SSDs not playing nice with ZFS, though I haven't seen any reports about your models. (Example: https://github.com/openzfs/zfs/discussions/14793 )
Have you monitored the temperature of your SSDs under load?
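For checking firmware revision and temperature from FreeBSD, something like this should work (device names will differ; smartctl comes from sysutils/smartmontools):
Code:
# controller identify data includes the firmware revision
nvmecontrol identify nvme0
# SMART / health log page: temperature, spare capacity, error counts
nvmecontrol logpage -p 2 nvme0
# or via smartmontools
smartctl -a /dev/nvme0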
 
zpool status
Code:
  pool: zsmtp
 state: ONLINE
  scan: scrub repaired 0B in 00:01:07 with 0 errors on Thu Jun  6 04:27:42 2024
config:

        NAME        STATE     READ WRITE CKSUM
        zsmtp       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            nda2p2  ONLINE       0     0     0
            nda1p2  ONLINE       0     0     0

errors: No known data errors

  pool: zsmtp_back
 state: ONLINE
  scan: scrub repaired 0B in 02:26:19 with 0 errors on Thu Jun  6 06:52:56 2024
config:

        NAME        STATE     READ WRITE CKSUM
        zsmtp_back  ONLINE       0     0     0
          ada0p2    ONLINE       0     0     0

errors: No known data errors

  pool: zsmtp_jail
 state: ONLINE
  scan: scrub repaired 0B in 00:16:24 with 0 errors on Thu Jun  6 04:43:02 2024
remove: Removal of vdev 1 copied 228G in 0h11m, completed on Tue May 28 12:07:49 2024
        600K memory used for removed device mappings
config:

        NAME          STATE     READ WRITE CKSUM
        zsmtp_jail    ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            nda1p3    ONLINE       0     0     0
            nda2p3    ONLINE       0     0     0

errors: No known data errors
 
diizzy, I apologize, I will answer your questions shortly.

I shut down all the jails and most other services, but the stuck zfs processes wouldn't quit. I did a shutdown and it got stuck, so I hit the reset button. When it came back up it had this to say (it didn't show up in dmesg; I had to use Scroll Lock on the box):

Code:
Setting hostuuid: -long guid-
Setting hostid: 0xaaa5b29c
Starting file system checks:
/dev/nda1p1: 4 files, 255 Mib free (522964 clusters)
FIXED
/dev/nda1p1: MARKING FILE SYSTEM CLEAN
Mounting local filesystems:.
Autoloading module: acpi_wmi
Autoloading module: if_iwlwifi
Intel(R) Wireless WiFi based driver for FreeBSD <- this line is bold and does appear in dmesg
Autoloading module: intpm
...

nda1p1 is the EFI partition in the mirror.

Weird, right?
I did a grep in /var/log and couldn't find these lines. I only saw it happen because I was watching the reboot process, expecting to have a bad zpool. If anyone has a way to make these lines go to a log, let me know.
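One thing I still need to try: /etc/syslog.conf ships with a commented-out console log entry that should capture output like this (a sketch of enabling it):
Code:
# in /etc/syslog.conf, uncomment:
#   console.info    /var/log/console.log
# then create the file and restart syslogd
touch /var/log/console.log
chmod 600 /var/log/console.log
service syslogd restart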
 
diizzy:
Do you have the newest firmware installed on both the mobo and the SSDs? I don't have the latest firmware on the motherboard, but I did have the latest non-beta last I checked. Checking now, I'm pretty sure I can do an update. I've never checked the SSDs before; I will have to schedule that this weekend. Thank you very much for the suggestion.

Is any CPU microcode patching applied (recommended)? I do the boot update and the rc.conf update.

If powerd is running, have you tried disabling it? It is running; I had not tried disabling it. I will give that a try.

I've seen some reports of certain SSDs not playing nice with ZFS, though I haven't seen any reports about your models. (Example: https://github.com/openzfs/zfs/discussions/14793 ) I did just buy another drive so we can have the exact same model on both sides of the RAID. Will do that very soon.

Have you monitored the temperature of your SSDs under load? I do monitor temps with smartd; 57C was a recent high on nvme2. I never figured out what happened, but it hasn't happened again. Usually 40C is the max.

Thank you very much Diizzy for taking the time to help. I will get right on your suggestions.
 