Help needed! Sleeping drives are being woken up after upgrading to FreeBSD 15.0

I’m spinning down some of my drives because they are only used periodically and I want to save power (and reduce noise).

This worked perfectly for a year or so, but after upgrading from FreeBSD 14.3-p7 to FreeBSD 15.0-RELEASE something is pulling them out of sleep at an interval of (what looks to be exactly) 10 minutes. The funny thing is that it doesn’t start directly after a reboot: at first the drives sleep uninterrupted for somewhere between 30m and 1h15m (for the reboots that I’ve checked).

I’ve disabled smartd (not starting it), and this did not change anything, so I assume it is not responsible.
The drives that I put to sleep are used for two ZFS pools and one NTFS file system.
The NTFS drive is not affected by this issue, whether it’s mounted or unmounted.
Only one of the pools is partly made available as a Samba share; the other is not, which leads me to believe that Samba has nothing to do with the problem.
Unmounting the pool (zfs unmount pool1) does not change anything; the drives still wake up periodically.
An exported pool (zpool export pool1) does not wake up any more.
And yes, reverting to 14.3 removes the issue.

Any idea on what it could be? Or how I could find out what’s responsible?
Has something changed with ZFS that could be responsible for this behavior? I have not upgraded the pools.
 
I'm having a similar issue here:

I can put the drives into sleep or standby while mounted, and then it's quiet for a while, but after 10 minutes they wake up again to active/idle.
 
WARNING 1: Seeing that you two have received no additional responses in more than 72 hours, and with nothing but good intentions in my heart (namely, helping you), I outsourced your case to a non-subject expert-in-training; below the horizontal line is what it had to say.

WARNING 2: Use all of it only as inspiration or a starting point for investigation, and check everything against official sources. That it sounds certain doesn't mean it is; these non-subjects are trained first and foremost to sound certain.

WARNING 3: Read in full before doing anything. Then, decide what you want to do.

Case 1 is Dre's. Case 2 is thorstenr's.

Note: If you end up solving your problem, please write a post explaining what fixed it, so it gets documented for the community.



Diagnose periodic ZFS disk wakeups after upgrading to FreeBSD 15.0

What changed (high-signal suspects)
FreeBSD 15.0-RELEASE updates the in-base OpenZFS implementation to zfs-2.4.0-rc4 (see FreeBSD 15.0-RELEASE Release Notes).

OpenZFS 2.4 introduces a “TXG time database” mechanism that records and flushes transaction-group (TXG) timestamps. In zfs(4), the defaults are:
- zfs_spa_note_txg_time=600 seconds (10 minutes)
- zfs_spa_flush_txg_time=600 seconds (10 minutes)

An upstream OpenZFS 2.4 defect report describes the same symptom pattern:
- disks spin up exactly every 10 minutes while a pool is imported (even with datasets unmounted)
- exporting the pool stops it
- importing read-only stops it
- increasing spa_flush_txg_time to a very large value allows disks to stay spun down
(see OpenZFS issue #18082).

Terms (quick definitions)
- Pool: ZFS storage pool (zpool(8)), built from one or more disks/vdevs.
- Dataset: ZFS filesystem/volume inside a pool (zfs(8)).
- Imported vs exported: imported means the pool is active in the kernel; exported means detached (no background pool activity). The observation that zpool export pool1 stops the wakeups is a key discriminator.
- TXG (transaction group): internal ZFS batching unit; periodic TXG-related bookkeeping can trigger small writes even with no open files.
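
A low-risk way to watch this background activity directly is zpool iostat in interval mode; it shows whether an imported but otherwise idle pool is issuing periodic writes. This is a sketch, with pool1 being the example pool name used throughout and 10 the sampling interval in seconds:

Code:
# zpool iostat pool1 10

Leave it running across at least one expected wakeup; write operations showing up on an otherwise idle pool point at in-kernel pool activity rather than userland file access.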

0) Prerequisites and baseline checks
Run as root:

Code:
# freebsd-version -ku
# uname -a
# zfs version
# zpool status

1) Confirm the wakeup cadence (10 minutes vs ~5 seconds)
1.1 Watch live disk I/O
Use a simple live view:

Code:
# gstat -I 1

What success looks like: when the pool is idle, the target HDDs show no periodic I/O bursts.
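
To timestamp the wakeups themselves rather than the I/O, a simple polling loop over camcontrol powermode also works; a sketch assuming a POSIX shell, with ada1 as a placeholder for one affected drive:

Code:
# while :; do date '+%T'; camcontrol powermode ada1; sleep 60; done

A transition from Standby mode to Active or Idle mode in the output gives you the wakeup time to correlate with the checks below.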

2) Fast isolation: prove “imported pool == wakeups”

Hazard (availability / data path): zpool export immediately detaches the pool and makes all datasets in it unavailable to the host (and any jails/services using them). Scope: the exported pool. Safer pattern: stop services using the pool, confirm nothing is mounted/used (e.g., zfs mount, fstat, zpool status), then export.
This matches the reported findings, but keep it as a repeatable test:

Code:
# zpool export pool1

Hazard (device power/state control): camcontrol standby changes a disk’s power state and can disrupt in-flight I/O, cause timeouts, or trigger error recovery if anything is still touching the device. Scope: the specified disk device(s). Safer pattern: export the pool first (or otherwise ensure the device is idle), then issue standby to the correct device node; avoid testing on the boot/root disk.
Then place the HDD in standby:

Code:
# camcontrol standby /dev/adaX

Wait longer than the HDD idle timeout.
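
To confirm the drive actually reached standby before you start timing (same placeholder device as above):

Code:
# camcontrol powermode /dev/adaX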

Interpretation:
- If exported pools stay asleep, the trigger is “pool imported” behavior (kernel/ZFS or pool properties), not Samba mounts or userland file opens.

3) Eliminate ZFS properties that intentionally generate background I/O
3.1 multihost (MMP) — periodic writes by design
When multihost=on, ZFS performs periodic writes to show the pool is in use (see zpoolprops(7)).

Code:
# zpool get -H multihost pool1
# zpool get -H multihost data

Hazard (pool behavior / compatibility): zpool set multihost changes pool coordination behavior. Scope: the target pool; impacts how ZFS guards against multi-host imports. Safer pattern: record the current value first (zpool get multihost) so rollback is trivial.
If it is enabled on a single-host system, turn it off and re-test:

Code:
# zpool set multihost=off pool1
# zpool set multihost=off data

3.2 autotrim
autotrim=on causes periodic TRIM of recently freed space; default is off (see zpoolprops(7)).

Code:
# zpool get -H autotrim pool1
# zpool get -H autotrim data

Hazard (performance / device behavior): zpool set autotrim can change background I/O patterns and performance characteristics. Scope: the target pool; effect depends on device type. Safer pattern: apply only to the HDD pools under test and keep a note of the prior setting for rollback.
If enabled on HDD pools, disable and re-test:

Code:
# zpool set autotrim=off pool1
# zpool set autotrim=off data

Rollback: restore prior values using zpool set.
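
To make that rollback trivial, record the current values before changing anything; a sketch, where the output file path is just an example:

Code:
# zpool get -H -o name,property,value multihost,autotrim pool1 data > /root/zpool-props.before
# cat /root/zpool-props.before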

4) Check for the 10-minute TXG time database flush (strong match for Case 1)
zfs(4) documents a 600-second default flush interval for the TXG time database. This aligns with the “exactly 10 minutes” wakeups and with the upstream OpenZFS 2.4 report (OpenZFS issue #18082).

4.1 Find the exact sysctl node names on this system
Do not guess names; discover them:

Code:
# sysctl -a | egrep 'spa_(note|flush)_txg_time'

Hazard (filesystem durability characteristics): changing ZFS sysctls can alter write patterns and what metadata is preserved across unexpected power loss. Scope: host-wide ZFS behavior (kernel module). Safer pattern: treat as a short diagnostic only; record the prior value and restore it after the test.
4.2 Temporarily change the flush interval to see if the wakeup interval changes
Example pattern (use the exact node name returned in 4.1):

Code:
# sysctl <node_for_spa_flush_txg_time>=3600

What success looks like: wakeups shift from ~600 seconds to ~3600 seconds. That confirms the TXG time DB flush as the trigger.

Rollback: set it back to 600.
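
A save-and-restore pattern keeps the change strictly temporary; this is a sketch assuming a POSIX shell, and <node_for_spa_flush_txg_time> is still the placeholder from 4.1, not a literal name:

Code:
# old=$(sysctl -n <node_for_spa_flush_txg_time>)
# sysctl <node_for_spa_flush_txg_time>=3600
# sysctl <node_for_spa_flush_txg_time>="$old"

Observe across at least one former wakeup interval between the second and third commands before restoring.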

5) Rule out base cron/periodic writes (often confused with “about 10 minutes”)
FreeBSD’s default cron invokes /usr/libexec/save-entropy every 11 minutes (not 10); see save-entropy(8) and the Handbook’s cron discussion. It stores entropy under /var/db/entropy by default, and the caching can be disabled by setting entropy_dir="NO" in rc.conf(5).

5.1 Verify where /var lives
If /var is on the HDD-backed pool, cron can wake those disks.

Code:
# df -h /var
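
If df shows /var on ZFS, this sketch narrows down which dataset, and therefore which pool, cron's writes land on; if it prints nothing, /var is part of the root dataset reported by df:

Code:
# zfs list -o name,mountpoint | awk '$2 == "/var"'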

5.2 Watch cron when a wakeup happens

Code:
# tail -f /var/log/cron
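
The cached entropy files also leave timestamps you can correlate with the wakeups; /var/db/entropy is the default location mentioned above:

Code:
# ls -lT /var/db/entropy

If the newest file's modification time lines up with a wakeup, save-entropy is implicated; if not, leave entropy caching alone.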

Hazard (system security posture): disabling entropy caching changes how entropy is preserved across reboots; it does not disable the kernel RNG. Scope: host-wide configuration. Safer pattern: only apply if logs show save-entropy correlates with wakeups; revert afterward if not needed.
5.3 Disable entropy caching (only if it is implicated)

Code:
# sysrc entropy_dir="NO"

Rollback: remove the setting or restore the previous entropy_dir value (see save-entropy(8)).

6) Attribute the I/O to the responsible process (authoritative)
lsof/fstat can miss kernel-originated writes. Use DTrace’s io provider to see who is issuing block I/O (see dtrace io provider documentation).

Run this (adjusting ada0/ada1 to your affected drive names) and wait for the next audible tick/spin-up:

Code:
# dtrace -q -n '
io:::start
/args[1]->dev_name == "ada0" || args[1]->dev_name == "ada1"/
{
  @[args[1]->dev_name, execname, pid] = count();
}
'

Interpretation:
- If execname is cron, periodic, smbd, etc., fix that service/job.
- If the activity attributes to kernel/ZFS paths (no meaningful userland culprit), the symptom set matches the OpenZFS 2.4 TXG time DB issue report (export/readonly import stop it; 10-minute wakeups; txg sync thread involvement).
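
If execname only shows kernel threads, aggregating kernel stacks can narrow things further; a sketch, with ada0 again a placeholder for an affected drive:

Code:
# dtrace -q -n 'io:::start /args[1]->dev_name == "ada0"/ { @[stack()] = count(); }'

Stacks dominated by ZFS sync/write paths (frames mentioning txg or spa_sync) rather than a userland syscall path support the TXG time database explanation.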

7) Use ZFS internal history to correlate events (optional but useful)
zpool history -i includes internally logged ZFS events (see zpool-history(8)).

Code:
# zpool history -il pool1 | tail -200
# zpool history -il data  | tail -200

8) Practical mitigations (match both Case 1 and Case 2)
8.1 Keep “cold” pools exported when idle

Hazard (availability / data path): zpool export detaches the pool and breaks access for mounts, jails, and services. Scope: the exported pool. Safer pattern: stop dependents first; verify no mounts and no active users; then export.
This avoids the “imported pool triggers periodic access” class of problems.

Code:
# zpool export pool1
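
The “verify no mounts and no active users” step from the hazard note can look like this; a sketch where /pool1 is an assumed mountpoint, so adjust to your layout:

Code:
# zfs mount | grep '^pool1'
# fstat -f /pool1

If zfs mount lists nothing for the pool and fstat shows no unexpected processes with files open under its mountpoint, the export should be safe.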

8.2 Import read-only when only reads are needed

Hazard (availability / workflow change): importing read-only prevents writes (including normal metadata updates) and can cause confusing “read-only filesystem/pool” failures in services expecting writes. Scope: the imported pool. Safer pattern: use only for truly read-only access windows; export afterward when idle.
zpoolprops(7) documents the pool import property readonly=on. This also matches the upstream OpenZFS 2.4 report: read-only import allowed disks to spin down.

Code:
# zpool import -o readonly=on pool1
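
A quick verification that the pool really came in read-only:

Code:
# zpool get readonly pool1

The VALUE column should show on; services that try to write will then fail with a read-only error instead of spinning the disks up.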

8.3 For “always-imported” HDD pools
If DTrace confirms the TXG time database flush behavior, the root cause is likely the OpenZFS 2.4 regression described upstream (OpenZFS issue #18082). In that situation, exporting idle pools or importing read-only are the lowest-risk workarounds until the upstream issue is fixed and pulled into FreeBSD.

Case mapping (based on the observations)
- Case 1 (exact ~10-minute spin-ups; unmount doesn’t help; export fixes it): strong match for zfs_spa_flush_txg_time=600 TXG time database flush behavior (see zfs(4)).
- Case 2 (audible accesses every ~5–8 seconds; stop after standby): consistent with frequent TXG sync-thread related activity described in the same upstream report (mentions txg_sync_thread cadence).
 
I am surprised that mounted ZFS pools ever let the drives sleep.

Why shouldn't they? It works just fine - all of these are mounted:

Code:
# for i in `seq 0 9`; do
> camcontrol powermode ada$i
> done
pass6: Standby mode
pass7: Active or Idle mode
pass8: Standby mode
pass9: Active or Idle mode
pass17: Standby mode
pass18: Standby mode
pass19: Active or Idle mode
pass20: Active or Idle mode
pass21: Standby mode
pass22: Standby mode
# for i in `seq 0 11`; do
camcontrol tur da$i
done
Unit is ready
Unit is ready
Unit is not ready
Unit is not ready
Unit is not ready
Unit is not ready
Unit is ready
Unit is ready
Unit is ready
Unit is not ready
Unit is not ready
Unit is not ready

But that's the problem: people don't imagine this is a viable use case, developers don't imagine it either, and then problems appear due to pointless periodic disk access and other things.
 