[Update on May 20th, 2025]
The solution turned out to be an Intel CPU microcode update. I am happy that the problem is finally resolved. The Intel Alder Lake N100 has some issues (https://lists.freebsd.org/archives/freebsd-current/2025-January/006984.html, https://forum.opnsense.org/index.php?topic=36139.0). For some reason, on my computer they showed up as an unreliable ZVOL backend for Debian virtual machines. Once the microcode was updated, the problem no longer appeared. I cannot be 100% sure yet, but the same test has now run for several days without any problem. Before the microcode update, it would have failed within one hour. Here is what I did:
Code:
### Install two packages.
pkg install x86info cpu-microcode-intel
### Add these lines to /boot/loader.conf.local, then reboot so the microcode is loaded at boot.
cpuctl_load="YES"
cpu_microcode_load="YES"
cpu_microcode_name="/boot/firmware/intel-ucode.bin"
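After rebooting with these loader settings, you can check that the newer revision is really in use. The helpers below compare two revision strings numerically. This is only a sketch under assumptions:

- The function names are hypothetical.
- Revisions are written as hex strings, like the 0x000000000000000e and 0x000000000000001c values quoted in this post.
- The kernel may or may not log a microcode line in /var/run/dmesg.boot.

```shell
#!/bin/sh
# Hypothetical helpers for comparing CPU microcode revisions.
# Revisions are hex strings such as 0x000000000000000e.

# Print a hex revision as a decimal number (printf accepts the 0x prefix).
ucode_rev_to_dec() {
    printf '%d\n' "$1"
}

# Succeed (exit status 0) only if revision $2 is newer than revision $1.
ucode_rev_newer() {
    [ "$(ucode_rev_to_dec "$2")" -gt "$(ucode_rev_to_dec "$1")" ]
}

# Show any microcode-related lines the kernel logged at boot.
# Assumption: FreeBSD keeps the boot messages in /var/run/dmesg.boot.
ucode_boot_messages() {
    grep -i microcode /var/run/dmesg.boot
}
```

For example, `ucode_rev_newer 0x000000000000000e 0x000000000000001c && echo newer` prints `newer`, since 0x1c is 28 and 0xe is 14.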
The microcode revision was updated from 0x000000000000000e to 0x000000000000001c.
Still, I think there is something here for FreeBSD developers to look at: buggy Intel microcode could trigger unexpected behavior in FreeBSD.
[End of update]
I recently upgraded a small server from a Qotom i5 mini-PC (16GB RAM) to a Topton N100 mini-PC (32GB RAM). The server runs several light-duty Debian virtual machines, and the setup had been rock solid for the past several years. But since the upgrade, I have been puzzled by a weird file system corruption problem in Debian bhyve VMs. FreeBSD's zpool reports no issues whatsoever, yet the VMs keep reporting file system corruption (inode problems, checksum mismatches, etc.).
Here is the start script for one of the VMs, in this case a Pi-hole. The VM uses two ZFS volumes (zvols): one for the root file system (5GB) and the other for the swap partition (2GB).
Code:
nohup bhyve -c 1 -m 1024M -w -H \
-s 0,hostbridge \
-s 4,virtio-blk,/dev/zvol/work/vm/pihole53 \
-s 5,virtio-blk,/dev/zvol/work/vm/pihole53_swap \
-s 6,virtio-net,tap53 \
-s 29,fbuf,tcp=0.0.0.0:5900,w=1024,h=768,wait -s 30,xhci,tablet \
-s 31,lpc -l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd pihole53 &
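For anyone reproducing this setup, the two zvols that the script expects could be created roughly as follows. This is a sketch under assumptions:

- The function name and the RUN dry-run switch are hypothetical.
- The sizes simply mirror the 5GB root and 2GB swap volumes described above.
- The parent dataset (for example work/vm) already exists.

By default the sketch only prints the zfs commands.

```shell
#!/bin/sh
# Hypothetical helper: create the root and swap zvols for one bhyve VM.
# By default RUN=echo, so the zfs commands are only printed (dry run).
# Set RUN= (empty) to actually execute them as root.
RUN="${RUN-echo}"

create_vm_zvols() {
    dataset_parent="$1"   # e.g. work/vm (assumed to exist already)
    vm_name="$2"          # e.g. pihole53
    # zfs create -V makes a volume, exposed under /dev/zvol/<dataset>.
    $RUN zfs create -V 5G "$dataset_parent/$vm_name"
    $RUN zfs create -V 2G "$dataset_parent/${vm_name}_swap"
}
```

Running `create_vm_zvols work/vm pihole53` in dry-run mode prints the two `zfs create` commands. Those commands would produce the /dev/zvol/work/vm/pihole53 and /dev/zvol/work/vm/pihole53_swap devices used in the start script.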
Sometimes a VM cannot start and gets stuck at the Debian initramfs prompt (see the image below), where Debian complains about the file system and asks for an fsck. After fsck (which fixes many inode problems), the VM may boot normally. It may also end up in a kernel panic it cannot recover from, in which case the VM has to be rebuilt. Even when a VM does boot, there are often many problems with the root file system, and in some cases the root file system is remounted read-only.
On the FreeBSD host, zpool scrub reports that the pool and its ZFS datasets are error-free while all of this is happening.
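For reference, the host-side check described above amounts to something like the following. This is a sketch under assumptions:

- The function name and the RUN dry-run switch are hypothetical.
- "work" is assumed to be the pool name, inferred from the /dev/zvol/work/... paths in the start script.
- `zpool wait -t scrub` requires OpenZFS 2.0 or later, which the ZFS 2.2.0 mentioned in this post satisfies.

```shell
#!/bin/sh
# Hypothetical helper: scrub a pool, block until the scrub finishes,
# then report status only if the pool has problems (zpool status -x).
# By default RUN=echo, so the commands are only printed (dry run).
RUN="${RUN-echo}"

scrub_and_report() {
    pool="$1"   # e.g. work
    $RUN zpool scrub "$pool"
    $RUN zpool wait -t scrub "$pool"
    $RUN zpool status -x "$pool"
}
```

With RUN set to empty, a healthy pool makes `zpool status -x` print a short "is healthy" line instead of the full status.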
The only substantial difference between the old Qotom i5 and the new Topton N100 machine is the boot disk. On the former, FreeBSD runs from a USB enclosure (with an mSATA SSD inside). On the latter, it runs from a SATA enclosure (with an M.2 B-key SSD inside). The FreeBSD version is the same on both: 14, patched to the latest. The ZFS version is zfs-2.2.0-FreeBSD_g95785196f, and zfs-kmod-2.2.0-FreeBSD_g95785196f.
This is quite a headache. It is like a time bomb. ZFS is supposed to be exceptionally reliable, and it has been for the past several years. I suspect the problem is faulty hardware, even though the mini-PC itself boots up just fine. I have tried recreating the Debian virtual machines and switching the disk emulation from virtio-blk to nvme or ahci-hd. Rolling back to a snapshot sometimes restores a working VM, but not always. It is as if the VM decides on its own when to go crazy.
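The two experiments mentioned above, switching the disk emulation and rolling back to a snapshot, could be scripted roughly as shown below. This is a sketch under assumptions:

- The function names, the RUN dry-run switch, and the snapshot name "good" are hypothetical.
- virtio-blk, nvme, and ahci-hd are the bhyve device emulations named in this post.
- The zvol path comes from the start script.

```shell
#!/bin/sh
# Hypothetical helpers for the experiments described above.
# By default RUN=echo, so the zfs commands are only printed (dry run).
RUN="${RUN-echo}"

# Print a bhyve -s slot argument for a disk backed by a zvol.
# Usage: disk_slot SLOT EMULATION ZVOL_DEVICE
disk_slot() {
    case "$2" in
        virtio-blk|nvme|ahci-hd) ;;
        *) echo "unsupported emulation: $2" >&2; return 1 ;;
    esac
    printf '%s,%s,%s\n' "$1" "$2" "$3"
}

# Snapshot a VM's zvol (best done while the VM is stopped).
snapshot_vm() {
    $RUN zfs snapshot "$1@$2"
}

# Roll the zvol back to that snapshot. Without -r, zfs refuses
# if more recent snapshots exist.
rollback_vm() {
    $RUN zfs rollback "$1@$2"
}
```

For example, `disk_slot 4 nvme /dev/zvol/work/vm/pihole53` prints `4,nvme,/dev/zvol/work/vm/pihole53`. That string would replace the `4,virtio-blk,/dev/zvol/work/vm/pihole53` argument in the start script.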
Any ideas? Thanks much!
Debian VM initramfs screen. fsck fixes many issues.
Information provided by dmesg on a booted-up Debian VM.
Code:
[ 8.569264] EXT4-fs error (device sda2): ext4_find_extent:936: inode #52349: comm pihole-FTL: pblk 87225 bad header/extent: extent tree corrupted - magic f30a, entries 9, max 340(340), depth 0(0)
[ 8.569280] Aborting journal on device sda2-8.
[ 8.571911] EXT4-fs error (device sda2): ext4_journal_check_start:83: comm s6-rc: Detected aborted journal
[ 8.572125] EXT4-fs (sda2): Remounting filesystem read-only
[ 1.967922] FAT-fs (vda1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[ 2.007173] EXT4-fs error (device vda2): ext4_lookup:1855: inode #140433: comm apparmor.system: iget: checksum invalid
[ 2.007183] Aborting journal on device vda2-8.
[ 2.007478] EXT4-fs error (device vda2): ext4_journal_check_start:83: comm systemd-journal: Detected aborted journal
[ 2.007879] EXT4-fs error (device vda2): ext4_journal_check_start:83: comm systemd-tmpfile: Detected aborted journal
[ 2.008909] EXT4-fs (vda2): Remounting filesystem read-only
[ 8.283215] EXT4-fs warning (device vda2): ext4_dirblock_csum_verify:405: inode #131491: comm s6-rmrf: No space for directory leaf checksum. Please run e2fsck -D.
[ 8.283222] EXT4-fs error (device vda2): htree_dirblock_to_tree:1082: inode #131491: comm s6-rmrf: Directory block failed checksum
[ 8.283230] Aborting journal on device vda2-8.
[ 8.284508] EXT4-fs error (device vda2): ext4_journal_check_start:83: comm dockerd: Detected aborted journal
[ 8.284682] EXT4-fs (vda2): Remounting filesystem read-only
[ 8.416508] EXT4-fs warning (device vda2): ext4_dirblock_csum_verify:405: inode #131491: comm dockerd: No space for directory leaf checksum. Please run e2fsck -D.
[ 8.416515] EXT4-fs error (device vda2): htree_dirblock_to_tree:1082: inode #131491: comm dockerd: Directory block failed checksum
[ 4.862167] EXT4-fs error (device vda2): ext4_validate_block_bitmap:420: comm ext4lazyinit: bg 29: bad block bitmap checksum
[ 4.862180] Aborting journal on device vda2-8.
[ 4.864966] EXT4-fs error (device vda2): ext4_journal_check_start:83: comm systemd-journal: Detected aborted journal
[ 5.102975] EXT4-fs (vda2): Remounting filesystem read-only
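When comparing several boots like the ones above, it can help to tally the EXT4 errors per device rather than reading the log by eye. This is a minimal sketch with two assumptions:

- Log lines follow the `EXT4-fs error (device vda2): ...` format shown above.
- The function name is hypothetical.

Inside a guest, the function could be fed from `dmesg`.

```shell
#!/bin/sh
# Hypothetical helper: count "EXT4-fs error" lines per device in a
# kernel log read from standard input, e.g. "dmesg | ext4_error_counts".
# Warnings and "Remounting filesystem read-only" lines are not counted.
ext4_error_counts() {
    awk '
        /EXT4-fs error \(device / {
            # Extract the device name between "(device " and ")".
            dev = $0
            sub(/.*\(device /, "", dev)
            sub(/\).*/, "", dev)
            count[dev]++
        }
        END {
            for (d in count) print d, count[d]
        }
    ' | sort
}
```

The log above mixes several boots. Applied to all of it, the function would print `sda2 2` and `vda2 8`.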