Intel N100 causing data corruption?

Hi, I run a mini PC as a home server with an Intel N100 and ZFS.

Several months ago (almost a year), I started a thread about data corruption during a simple copy operation.
I noticed the problem when the machine was practically brand new.
The machine currently runs FreeBSD 14.3-RELEASE, but it was the same with 14.0-RELEASE at the time.


I am reopening the issue in a new thread because I still run into the same problem very often, even after replacing my NVMe drive.

At first, I thought the no-name NVMe drive was simply of poor quality.
I replaced it with a Crucial NVMe, which seems more reliable, but the same thing happens: the machine still produces corrupted tar.gz files.

Everything seems to work fine until the day you need to open a tar.gz archive and find that it is unreadable.

During extraction:
Code:
/local/lib/libcairo.so.2.11802.2: Truncated tar archive detected while reading data: Unknown error: -1
tar: Error exit delayed from previous errors.

It's very annoying: my Bastille tar.gz backup archives are corrupted randomly and silently.
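Since the damage only shows up when extracting, a periodic integrity check can catch it much earlier. A minimal sketch in sh, assuming gzip(1) and tar(1) are on the PATH (both are in the FreeBSD base system); `check_archive` is a hypothetical helper name. It only detects structural damage (a broken gzip stream or CRC, or a tar stream that cannot be read to the end); an archive whose contents were altered before compression would still pass.

```shell
#!/bin/sh
# Sketch: check that a .tar.gz/.tgz archive decompresses and lists cleanly.
# check_archive is a hypothetical helper name, not an existing tool.
# Prints "OK: <file>" or "CORRUPT (...): <file>"; returns 0 only if both checks pass.
check_archive() {
    # gzip -t decompresses without writing output and verifies the CRC32/length trailer
    if ! gzip -t "$1" 2>/dev/null; then
        echo "CORRUPT (gzip): $1"
        return 1
    fi
    # tar -t reads every member header to the end; truncated tar data fails here
    if ! tar -tzf "$1" >/dev/null 2>&1; then
        echo "CORRUPT (tar): $1"
        return 1
    fi
    echo "OK: $1"
}
```

For example: `for f in /mnt/nfs/backup/frontal/*.tgz; do check_archive "$f"; done`.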

Before this machine, I had several mini PCs with exactly the same use case, running for years on different generations of Intel CPUs, and I never saw a single bit of data corruption, even running ZFS without ECC RAM. Not once in almost 10 years.

With this N100 machine, almost one in five archives is unreadable.

I suspected a cheap component, but I checked: the chipset is Intel, the RAM is Crucial, and the NVMe seems fine. After some research, I found this thread, which more or less confirms issues with this CPU:

UFS bad inode, mangled entry on Alder Lake-N(100)
https://lists.freebsd.org/archives/freebsd-current/2025-January/006984.html

And this one:
https://forum.opnsense.org/index.php?topic=48343.0

But those reports concern UFS, not ZFS...

Can anyone confirm this issue with Intel N100 processors? I also saw that some people recommend installing the Intel microcode update (which I have never done on any machine). I hope it will help...

If this is confirmed, I will immediately replace it with another CPU. I can no longer tolerate this random data corruption on my mini server.
 
I followed the various recommendations, installed the Intel microcode, and added the following to /boot/loader.conf:

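Code:
# load the Intel microcode update early in boot
cpu_microcode_load="YES"
cpu_microcode_name="/boot/firmware/intel-ucode.bin"
# disable PCID (workaround suggested for the Alder Lake-N INVLPG issue)
vm.pmap.pcid_enabled=0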

I will monitor over the next few weeks whether archives are still being corrupted 😨
 
You should start with proper hardware testing: memtest86+, mprime/Prime95, SuperPi.

It is unlikely that the CPU model itself is responsible; the individual PC is the more likely culprit. Overheating, bad RAM, etc.: the possibilities are endless.
 
I've got a couple of N100 mini PCs and have never seen a problem like that. I'll add the PCID-disable fix anyway, just in case.
 
Info about using cpupdate to update microcode
 
I've just run 2 full passes of Memtest over more than 1 hour: 0 errors.

I've been running stress-ng for over 30 minutes now: CPU at 100%, no errors, the OS continues to respond normally, and nothing in dmesg.
It's actually surprising how well the OS keeps responding to shell commands during the CPU test 😄

Code:
 % doas stress-ng --cpu 4 --iomix 4 --vm 8 --vm-bytes 8G --fork 8 --timeout 30m
stress-ng: info:  [39015] setting to a 30 mins run per stressor
stress-ng: info:  [39015] dispatching hogs: 4 cpu, 4 iomix, 8 vm, 8 fork
stress-ng: info:  [39399] iomix: using 256M file system space per stressor instance (total 1G of 199.40T available file system space)
stress-ng: info:  [39606] vm: using 1G per stressor instance (total 8G of 15.72G available memory)
stress-ng: info:  [39015] skipped: 0
stress-ng: info:  [39015] passed: 24: cpu (4) iomix (4) vm (8) fork (8)
stress-ng: info:  [39015] failed: 0
stress-ng: info:  [39015] metrics untrustworthy: 0
stress-ng: info:  [39015] successful run completed in 30 mins

And the machine normally never experiences this kind of load.
I'll try to find a tool to test the disks.
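For the disks, two starting points: `zpool scrub` re-reads every allocated block and verifies it against its ZFS checksum, and `nvmecontrol logpage -p 2 nvme0` prints the NVMe drive's SMART/health log page. One caveat: a clean scrub only shows that what is on disk matches the checksums ZFS computed at write time; data already corrupted in RAM or by the CPU before the write is checksummed as-is and will pass. The helper below is a hypothetical sketch that decides whether `zpool status -v` output looks clean; it reads the text on stdin, and the pool name `zroot` in the usage line is an assumption (the FreeBSD installer default).

```shell
#!/bin/sh
# Sketch: decide whether `zpool status -v` output reports a clean pool.
# pool_is_clean is a hypothetical helper; it reads the status text on stdin.
pool_is_clean() {
    zstatus=$(cat)
    # zpool status prints this line when it knows of no permanent data errors
    printf '%s\n' "$zstatus" | grep -q '^errors: No known data errors' || return 1
    # In the config table, columns 3-5 are the READ/WRITE/CKSUM error counters;
    # any non-zero counter on a pool/vdev/device line counts as unclean.
    printf '%s\n' "$zstatus" | awk '
        NF >= 5 && $2 ~ /^(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)$/ {
            if ($3 != "0" || $4 != "0" || $5 != "0") bad = 1
        }
        END { exit bad }'
}
```

Usage: run `zpool scrub zroot`, wait for it to finish, then `zpool status -v zroot | pool_is_clean && echo clean`.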
 

Promising initial tests after adding the microcode to the boot loader:

Previously: 😤
Code:
---------------------------
File: /mnt/nfs/backup/frontal/gitea_2025-10-12-030549.tgz
Calculated SHA256: de7755f6ae678a38e43058c2a48c7868d105f10f899df3ab558108e082990dc7
Expected SHA256: afeba95113912f6c5feb52a01fbf56cb736556fc9d4e7d4a54d5ce2445432675
Result: FAIL
---------------------------
File: /mnt/nfs/backup/frontal/jail-test_2025-06-01-030649.tgz
Calculated SHA256: 139d71ed12101283e8e69d358f7c3ebf720fb9d8347e9c57ec6cb869222a6cb3
Expected SHA256: e2218387b99af770ac181b12c63fe18c62aed6d2f33896f8c70ccdfb890c124d
Result: FAIL
---------------------------
File: /mnt/nfs/backup/frontal/mariadb_2025-10-05-030902.tgz
Calculated SHA256: 78cac460c7ace31da6b48461759fdedf889d0f6858c0865dcf56e17fb32d0e97
Expected SHA256: 86760d212949f142aaabb3789fbaa3f03bfc8061fc095b46597fe198e9f9cfc4
Result: FAIL
---------------------------
File: /mnt/nfs/backup/frontal/mariadb_2025-10-19-195139.tgz
Calculated SHA256: 7bc7e1ca66a72fb43f861b2f0b9381b05420b926044ab47ba32de25b6251ae0f
Expected SHA256: 7bc7e1ca66a72fb43f861b2f0b9381b05420b926044ab47ba32de25b6251ae0f
Result: OK
---------------------------
File: /mnt/nfs/backup/frontal/mariadb_2025-10-12-030954.tgz
Calculated SHA256: 6e0e2d11990473c8df9f615f0a20d638f49e39bff9223f9db036c384e61bafee
Expected SHA256: d37a0b910cfccda46580d7172a29e6d1e177d19611943ce2ccda53164ffe61ae
Result: FAIL
---------------------------
File: /mnt/nfs/backup/frontal/nextcloud-mlfa_2025-10-05-032122.tgz
Calculated SHA256: f8ec3e404ac0188b6420a860fe460cd00c3e1f78b8ac7de18ff0e8792e0e4326
Expected SHA256: 18f91f514dd4a19cc9a9042bc151fe973dbbb59e9ff47c559875662087a5d976
Result: FAIL
---------------------------
File: /mnt/nfs/backup/frontal/monitoring_2025-10-05-031117.tgz
Calculated SHA256: eebe210c42dd1b81329459a39f2c40f05adbf0b465c0352ad41c0a6f4d64adad
Expected SHA256: cfbf3cafa2630e783061d7193fe3cd744aa720b1db42de76b377dbc66425071a
Result: FAIL

And so on... 🐽

And now: I ran a new backup batch of all my jails, and ALL of them have OK checksums 🎉 🥳
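The script that produced the report above isn't shown, so here is a minimal sketch of how such a check could work. It assumes (hypothetically) that each archive has a sidecar `<archive>.sha256` file holding the digest recorded when the backup was created, for example with `sha256 -q archive.tgz > archive.tgz.sha256` on FreeBSD. `file_sha256` and `verify_archive` are made-up names; the output mimics the format shown.

```shell
#!/bin/sh
# Sketch of a checksum check producing the report format shown above.
# Assumption (hypothetical): each archive has a sidecar "<archive>.sha256"
# containing the hex digest recorded at backup time.

# Print the SHA-256 of a file: FreeBSD ships sha256(1), most Linux systems sha256sum(1).
file_sha256() {
    if command -v sha256 >/dev/null 2>&1; then
        sha256 -q "$1"
    else
        sha256sum "$1" | cut -d ' ' -f 1
    fi
}

# Compare the current digest with the recorded one; return 0 on match.
verify_archive() {
    if [ ! -r "$1.sha256" ]; then
        echo "No checksum file for $1" >&2
        return 2
    fi
    calc=$(file_sha256 "$1")
    expected=$(cut -d ' ' -f 1 < "$1.sha256")
    echo "---------------------------"
    echo "File: $1"
    echo "Calculated SHA256: $calc"
    echo "Expected SHA256: $expected"
    if [ "$calc" = "$expected" ]; then
        echo "Result: OK"
        return 0
    fi
    echo "Result: FAIL"
    return 1
}
```

For example: `for f in /mnt/nfs/backup/frontal/*.tgz; do verify_archive "$f"; done`.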
 
It looks like that fixed your problem. According to the LKML archive, Intel released a microcode fix for the broken INVLPG instruction on the N100 in March 2024 (GitHub release intel-microcode-20240312). I haven't run any stress tests on my N100 box, but then I haven't seen any data corruption either. Good news anyway: the microcode update appears to fix it. I must admit I sat up when I read your initial post; it's good to know about this, as I wasn't previously aware of it. :)
 
For anyone else reading this, running

Code:
# pkg install cpu-microcode-intel

will install the latest microcode.

And in /boot/loader.conf you need:

Code:
# update CPU microcode
cpu_microcode_load="YES"
cpu_microcode_name="/boot/firmware/intel-ucode.bin"
 
Very informative. I was already thinking of a manufacturing defect in the PC; sometimes a whole batch shares one tricky issue. But in this case, things are different.
 
I was also watching this thread, as I have a handful of N100 (and N300) systems running, but I have never had any issues with them. I must have been lucky: all of them were manufactured later and already shipped with the new microcode... (now that I think about it, IIRC the first one was purchased around mid to late 2024).
 
I have re-applied the heatsink paste on my N100s. For some reason they seem to put a huge dollop of it on at the factory, which tends to make the layer too thick, so you get thermal instability issues. I replaced mine with better-quality paste (Arctic MX-4), got lower CPU temps, and they seem more stable. I still occasionally get lockups, but I think that's a known bug in Firefox (infinite loop); in fact, I've only ever had lockups on the box that runs Firefox, not on the others. The small coolers they fit are usually barely adequate anyway, and the heatsink compound they use is the cheapest they can get; the factory-applied paste in mine looked like HY510.
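To see whether repasting actually lowers temperatures, FreeBSD's coretemp(4) driver exposes per-core readings as `dev.cpu.N.temperature` once it is loaded (`kldload coretemp`, or `coretemp_load="YES"` in /boot/loader.conf). A small sketch that reports the hottest core from that sysctl output; `max_core_temp` is a hypothetical helper that reads the text on stdin, so it can be tried without the hardware.

```shell
#!/bin/sh
# Sketch: print the hottest core temperature from `sysctl dev.cpu` output.
# Assumes coretemp(4) is loaded, so lines look like "dev.cpu.0.temperature: 45.0C".
# max_core_temp is a hypothetical helper; it reads the sysctl text on stdin and
# returns non-zero if no temperature lines are found.
max_core_temp() {
    awk -F': ' '
        $1 ~ /^dev\.cpu\.[0-9]+\.temperature$/ {
            t = $2
            sub(/C$/, "", t)          # strip the trailing unit
            if (n == 0 || t + 0 > max) max = t + 0
            n++
        }
        END {
            if (n == 0) exit 1
            printf "%.1fC\n", max
        }'
}
```

Usage: `sysctl dev.cpu | max_core_temp`, e.g. once at idle and once under `stress-ng --cpu 4`, before and after repasting.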
 
Yes, the first one I bought was around September or October 2024 (I think; I can't remember for sure), so perhaps that one already had the fix. Anyhow, it doesn't hurt to pick up the latest microcode from Intel, in case they've fixed other bugs we don't know about. My faith in Intel has fallen since Spectre and Meltdown; they seem to have had other bugs recently too, in the P-cores, although that won't affect the N100.
 
I bought the machine in April 2024.

Code:
doas sysctl hw.model hw.ncpu hw.machine hw.physmem && doas dmesg | grep -i cpu
hw.model: Intel(R) N100
hw.ncpu: 4
hw.machine: amd64
hw.physmem: 16881967104
CPU microcode: updated from 0xe to 0x1d  <<<<<<< ?!?
CPU: Intel(R) N100 (806.40-MHz K8-class CPU)
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
cpu0: <ACPI CPU> on acpi0
hwpstate_intel0: <Intel Speed Shift> on cpu0
cpufreq0: <CPU frequency control> on cpu0
hwpstate_intel1: <Intel Speed Shift> on cpu1
cpufreq1: <CPU frequency control> on cpu1
hwpstate_intel2: <Intel Speed Shift> on cpu2
cpufreq2: <CPU frequency control> on cpu2
hwpstate_intel3: <Intel Speed Shift> on cpu3
cpufreq3: <CPU frequency control> on cpu3
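That "CPU microcode: updated from 0xe to 0x1d" line appears to be the loader-supplied microcode being applied at boot: 0xe is the revision the CPU came up with (presumably what the BIOS loaded), and 0x1d is the revision from intel-ucode.bin now running. A small sketch to extract and compare the two revisions; `microcode_revs` and `microcode_was_upgraded` are hypothetical helpers that read dmesg text on stdin.

```shell
#!/bin/sh
# Sketch: extract the old/new microcode revisions from boot messages such as
# "CPU microcode: updated from 0xe to 0x1d". Hypothetical helpers; both read
# dmesg text on stdin.

# Print "OLD NEW" (e.g. "0xe 0x1d") for the first matching line, if any.
microcode_revs() {
    sed -n 's/^CPU microcode: updated from \(0x[0-9a-fA-F]*\) to \(0x[0-9a-fA-F]*\).*$/\1 \2/p' | head -n 1
}

# Succeed only if a microcode update line is present and NEW > OLD.
microcode_was_upgraded() {
    revs=$(microcode_revs)
    [ -n "$revs" ] || return 1
    # Word-split "OLD NEW" into $1 and $2 (positional params of this function)
    set -- $revs
    # POSIX shell arithmetic accepts hex constants such as 0x1d
    [ $(( $1 )) -lt $(( $2 )) ]
}
```

Usage: `microcode_revs < /var/run/dmesg.boot`; /var/run/dmesg.boot keeps the boot messages even after the kernel message buffer has wrapped.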
 