Hello,
I'm experiencing weird issues with ZFS.
I have a home server running FreeBSD 14.2-RELEASE with a single NVMe disk.
Today I scrubbed my pool (zroot) and got a lot of checksum errors.
There was no power failure or any other event that could explain them.
This is the scrub report:
Code:
root@nuc:~ # zpool scrub zroot
root@nuc:~ # zpool status zroot -v
pool: zroot
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
scan: scrub repaired 0B in 00:00:07 with 18 errors on Thu Apr 24 12:42:10 2025
config:
NAME        STATE     READ WRITE CKSUM
zroot       ONLINE       0     0     0
  nda0p4    ONLINE       0     0    37
errors: Permanent errors have been detected in the following files:
/usr/local/bastille/jails/srv/root/var/cache/pkg/perl5-5.36.3_3~d97227b0c9.pkg
/usr/local/bastille/jails/srv/root/var/cache/pkg/mariadb114-client-11.4.5_1~9ca28e7b33.pkg
/usr/local/bastille/jails/srv/root/var/cache/pkg/dotnet-9.0.3~894952f590.pkg
/usr/local/bastille/jails/srv/root/usr/home/pix/aaa/Guadeloupe/Guadeloupe-1720-NEF_DxO_DeepPRIMEXD.jpg
/usr/local/bastille/jails/srv/root/usr/local/share/dotnet/library-packs/runtime.freebsd.14-x64.Microsoft.DotNet.ILCompiler.9.0.3.nupkg
/usr/local/bastille/jails/srv/root/usr/local/share/dotnet/library-packs/runtime.freebsd.14-x64.Microsoft.NETCore.DotNetAppHost.9.0.3.nupkg
/usr/local/bastille/jails/srv/root/usr/src/sys/contrib/openzfs/tests/zfs-tests/tests/functional/cli_root/zpool_create/draidcfg.gz
/usr/local/bastille/jails/srv/root/var/cache/pkg/mariadb114-server-11.4.5_1~2db63a4a70.pkg
/usr/local/bastille/jails/srv/root/usr/src/contrib/libarchive/libarchive/test/test_read_format_zip_winzip_aes256_large.zip.uu
/usr/local/bastille/jails/srv/root/var/cache/pkg/webmin-2.013~9ed48768d8.pkg
/usr/local/bastille/jails/srv/root/var/db/freebsd-update/files/fab7de1b74ef80c3ff0760fdf182c077ded2aab249e0ea3a252f6d3cd5a15e6d.gz
/usr/local/bastille/jails/srv/root/usr/local/www/share/hxsafe/EDINBURGH.zip
zroot/ROOT/default@2025-04-14-12:18:03-0:/var/cache/pkg/python311-3.11.11~093fbc3d04.pkg
zroot/ROOT/default@2025-04-14-12:18:03-0:/usr/freebsd-dist/kernel.txz
zroot/ROOT/default@2025-04-14-12:18:03-0:/usr/freebsd-dist/src.txz
zroot/ROOT/default@2025-04-14-12:18:03-0:/var/db/freebsd-update/files/5b7b75b9ec886a56ceccab6c3b495186a643d111746d6b0e462548fa671c48b7.gz
/usr/local/bastille/jails/matrix/root/var/db/freebsd-update/files/578c8b1ce80566a6412fe5c3bd5982c87e5cfbdc6952f66ea75b53e49e5c8e59.gz
zroot/ROOT/default@2025-03-29-18:16:24-0:/usr/freebsd-dist/kernel.txz
zroot/ROOT/default@2025-03-29-18:16:24-0:/usr/freebsd-dist/src.txz
zroot/ROOT/default@2025-03-29-18:16:24-0:/var/db/freebsd-update/files/5b7b75b9ec886a56ceccab6c3b495186a643d111746d6b0e462548fa671c48b7.gz
zroot/ROOT/14.2-RELEASE-p2_2025-04-14_121803:/var/cache/pkg/python311-3.11.11~093fbc3d04.pkg
zroot/ROOT/14.2-RELEASE-p2_2025-04-14_121803:/usr/freebsd-dist/kernel.txz
zroot/ROOT/14.2-RELEASE-p2_2025-04-14_121803:/usr/freebsd-dist/src.txz
zroot/ROOT/14.2-RELEASE-p2_2025-04-14_121803:/var/db/freebsd-update/files/5b7b75b9ec886a56ceccab6c3b495186a643d111746d6b0e462548fa671c48b7.gz
//var/cache/pkg/python311-3.11.11~093fbc3d04.pkg
//usr/freebsd-dist/kernel.txz
//usr/freebsd-dist/src.txz
//var/db/freebsd-update/files/5b7b75b9ec886a56ceccab6c3b495186a643d111746d6b0e462548fa671c48b7.gz
zroot/ROOT/14.2-RELEASE_2025-03-29_181624:/usr/freebsd-dist/kernel.txz
zroot/ROOT/14.2-RELEASE_2025-03-29_181624:/usr/freebsd-dist/src.txz
zroot/ROOT/14.2-RELEASE_2025-03-29_181624:/var/db/freebsd-update/files/5b7b75b9ec886a56ceccab6c3b495186a643d111746d6b0e462548fa671c48b7.gz
/usr/local/bastille/cache/14.2-RELEASE/base.txz
I removed the /usr/local/bastille/jails/srv/root/home/pix/aaa directory and the /usr/local/bastille/jails/srv/root/usr/local/www/share/hxsafe/EDINBURGH.zip file, then ran another scrub.
Now a scrub reports this:
(The number of checksum errors has increased because I ran several scrubs.)
Code:
root@nuc:~ # zpool scrub zroot
root@nuc:~ # zpool status zroot -v
pool: zroot
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
scan: scrub repaired 0B in 00:00:07 with 0 errors on Thu Apr 24 13:44:48 2025
config:
NAME        STATE     READ WRITE CKSUM
zroot       ONLINE       0     0     0
  nda0p4    ONLINE       0     0   199
errors: No known data errors
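To see how much the CKSUM counter moved between the two reports above, a small sketch like the following could be used. The helper name `cksum_of` is made up for this post, and the sample rows are copied from the two `zpool status` outputs. It assumes the config rows keep the NAME STATE READ WRITE CKSUM column order.

```shell
# Hypothetical helper: print the CKSUM column for one vdev,
# reading `zpool status` text on stdin.
cksum_of() {
    # $1 = vdev name (e.g. nda0p4); the CKSUM counter is the 5th field
    awk -v dev="$1" '$1 == dev && NF >= 5 { print $5; exit }'
}

# Config rows copied from the first and second scrub reports.
first='zroot ONLINE 0 0 0
nda0p4 ONLINE 0 0 37'
second='zroot ONLINE 0 0 0
nda0p4 ONLINE 0 0 199'

before=$(printf '%s\n' "$first" | cksum_of nda0p4)
after=$(printf '%s\n' "$second" | cksum_of nda0p4)
echo "CKSUM went from $before to $after (+$((after - before)))"
```

On the live system the same helper could presumably be fed directly, for example `zpool status zroot | cksum_of nda0p4`, before and after each scrub.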
Then I ran a self-test with nvmecontrol:
Code:
root@nuc:~ # nvmecontrol selftest -c 2 nvme0
root@nuc:~ # nvmecontrol logpage -p 2 nvme0
SMART/Health Information Log
============================
Critical Warning State: 0x00
 Available spare: 0
 Temperature: 0
 Device reliability: 0
 Read only: 0
 Volatile memory backup: 0
Temperature: 310 K, 36.85 C, 98.33 F
Available spare: 100
Available spare threshold: 10
Percentage used: 0
Data units (512,000 byte) read: 3754076
Data units written: 3261524
Host read commands: 18228121
Host write commands: 84273103
Controller busy time (minutes): 717
Power cycles: 92
Power on hours: 153
Unsafe shutdowns: 34
Media errors: 0
No. error info log entries: 1
Warning Temp Composite Time: 0
Error Temp Composite Time: 0
Temperature 1 Transition Count: 0
Temperature 2 Transition Count: 0
Total Time For Temperature 1: 0
Total Time For Temperature 2: 0
root@nuc:~ # nvmecontrol logpage -p 1 nvme0
Error Information Log
=====================
No error entries found
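For scale, the data-unit counters in the SMART log can be converted to bytes. This is only a rough sketch: it assumes the 512,000-byte unit printed in the log header, and the variable names are my own.

```shell
# Values copied from the SMART/Health log above.
units_read=3754076
units_written=3261524
unit_bytes=512000   # assumption: taken from the "(512,000 byte)" note in the log

# Integer arithmetic, rounded down to whole GB (10^9 bytes).
read_gb=$((units_read * unit_bytes / 1000000000))
written_gb=$((units_written * unit_bytes / 1000000000))
echo "read: ${read_gb} GB, written: ${written_gb} GB"
```

So the disk has seen roughly 1.9 TB of reads and 1.7 TB of writes in total.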
The 34 unsafe shutdowns date from this NVMe disk's previous life in another computer, which went through a lot of power failures.
So I don't understand what happened.
Is my zpool really healthy?
Thank you
