System freezes randomly

Hi, I installed FreeBSD 14.0 on an HPE MicroServer Gen10. The OS is installed on a USB stick; the whole process is described in this post.

Two months ago I found the system totally frozen: services like Samba, Logitech Media Server and SSH did not respond, and I could not ssh into the server. The log files are stored on tmpfs, so nothing was left after the reboot.
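Because the logs vanish on reboot, one way to at least learn when a freeze happened is to keep a small heartbeat file on persistent storage. The following is only a minimal sketch in Python, not part of the original setup: the function names and the log path are hypothetical, and it assumes a writable directory that survives a reboot.

```python
import os
import time


def write_heartbeat(path, now=None):
    """Append one timestamp line and force it to stable storage."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(stamp + "\n")
        fh.flush()
        os.fsync(fh.fileno())  # make sure the line survives a hard freeze
    return stamp


def last_heartbeat(path):
    """Return the last complete timestamp line, or None if there is none."""
    try:
        with open(path, encoding="utf-8") as fh:
            # Ignore a trailing partial line left behind by a freeze mid-write.
            lines = [ln.strip() for ln in fh if ln.endswith("\n") and ln.strip()]
    except FileNotFoundError:
        return None
    return lines[-1] if lines else None
```

If `write_heartbeat` is called once a minute (for example from cron), then after a forced reboot `last_heartbeat` gives the freeze time to within a minute. On a USB stick even this small write adds some wear, so a longer interval may be preferable.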

I hooked up a monitor and found that I could not log in with the keyboard either. There were errors about Samba on the screen, so I rebooted and disabled Samba, but that didn't help.

I also found a thread describing a similar problem that was solved by downgrading from NFSv4 to NFSv3, so I disabled NFS and rebooted. It still froze after 4 days.

Lastly I upgraded from 14.0 to 14.1; that didn't help either.

I thought it might be a problem with the WD hard drive, so I ran smartctl -t long /dev/ada0, and the results showed no problems.

I don't know what the problem is or where to go from here.
 
Probably hardware so check all cables/connectors, re-seat everything (like RAM), run memtest, check temperatures/cooling (fans OK?)

Dual PSU? No power fluctuations?

If it's server-grade hardware, check the system logs. When it freezes, does BMC access still work? (I think it's called iLO on HP.)

Issues like these are hard to track down, so it takes a slow process of elimination, as you have started.

I don’t think smartctl covers everything to do with a drive so that might still be a possibility.

Is the machine under load when it happens?
 
..I hooked a monitor and found also can not login with keyboard, there is errors about Samba shown, so I rebooted and disabled Samba, but didn't work.
So it froze again? The most significant detail seems to be: at which point?

Also: did you check the clock, RAM/swapspace and disk usage? All 3 can cause this.
 
Is the machine under load when it happens?
No, it's just a home server with a very low load: CPU usage is only 0.2% and 80% of memory is unused.
No iLO; iLO is not available on the Gen10.

Beastie7 Bottlenecked? It had been running without problems for 4 months. The first freeze happened two months ago, and since then the uptime between freezes has been getting shorter: only 4 days this time, sometimes just 1 day.
 
Why would you do that? That's probably the issue. The system is bottlenecked.
Why do you say that? Do you know that all the Brocade (now Broadcom) SAN switches and directors have their system installed on internal USB storage? They run most of the SANs in the world, so their solution shouldn't be that bad...

They run a custom Linux distribution, FWIW.
 
Yes, many servers in the world (perhaps a large majority) boot from USB drives. But they don't use consumer-grade USB drives. As a matter of fact, I don't even know where one could buy industrial-grade USB drives for this purpose.
 
Why would you do that? That's probably the issue. The system is bottlenecked.
Your experience is outdated. My daily system runs from USB.
Try a Transcend ESD series. These are gamechangers. 900MB/s read and 600MB/s write speed with access time shorter than any mechanical disk.
What would still be an improvement is to stop limiting PC chipsets to one internal USB controller, so we could also use USB for RAID-like systems and networking. That would end a 25-year-old industry scam that acts as an artificial bottleneck. The throughput of a PC should be set by the main bus speed, not by the width of a single underlying controller that is forced to work alone.
 
This SanDisk USB pen is not the problem. I have a backup USB stick that I cloned with dd once everything was set up; I swapped it in and the problem was not resolved.

I have just disabled the jail service; let's see how it goes.
 
You need to identify the root cause of the problem. To do this, change one thing at a time and test it. To be 100% sure that it's not the USB controller or an IOPS issue due to ZFS on USB 2.0, I would suggest taking any SATA SSD, connecting it alone without any other disk (another disk might be what causes the entire system to freeze), doing a fresh installation of FreeBSD, and then playing around with it for some time to see whether it freezes. If it freezes again, you can rule out the USB. Then go on to the next component: PSU, RAM, CPU (overheating, for example), a bad motherboard (leaking capacitors) and so on. Only you can test these one by one. Nobody on the forum has a crystal ball to tell you where the issue is.
 
I have no experience with Swissbit memory sticks, but we have used Swissbit microSD cards (like this one) with Raspberry Pis, and they ran for years without trouble. Currently we have one instance active, which has run on the same microSD card (and Pi) since 2021.
 
Same here. I have Swissbit microSD cards and they equal my favorite microSD brand, Apacer.

For USB sticks I have been using InnoDisk: 16GB ones, and recently I found some 64GB ones for cheap.


But see some of my posts: I have seen some horribly long times on a USB2 machine when making a copy of the ports tree on these.
But I have some that run FreeBSD Live Memstick with XFCE4 and others just for repairs and burning images to eMMC.
Something about small file copies kills performance.
Other than that no complaints. They do ~15MB/sec when burning an image to them with dd on USB2.
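The small-file slowdown mentioned above can be measured roughly. The following is only a sketch in Python, under stated assumptions: it writes the same number of bytes once as a single large file and once as many 4 KiB files, with an fsync after each file, and reports MB/s for both. The function names, the sizes and the target directory are hypothetical choices for illustration, and results will vary with the stick, the filesystem and the USB port.

```python
import os
import tempfile
import time


def timed_write(directory, file_count, file_size):
    """Write file_count files of file_size bytes each, fsync each one, return MB/s."""
    payload = os.urandom(file_size)
    start = time.perf_counter()
    for i in range(file_count):
        path = os.path.join(directory, f"f{i:06d}.bin")
        with open(path, "wb") as fh:
            fh.write(payload)
            fh.flush()
            os.fsync(fh.fileno())  # push the data to the device, not just the cache
    elapsed = max(time.perf_counter() - start, 1e-9)
    return (file_count * file_size / 1e6) / elapsed


def compare_write_speeds(target_dir, total_bytes=4 * 1024 * 1024, small_size=4096):
    """Return (large_file_mb_s, small_files_mb_s) for the same number of bytes."""
    with tempfile.TemporaryDirectory(dir=target_dir) as big_dir:
        large = timed_write(big_dir, 1, total_bytes)
    with tempfile.TemporaryDirectory(dir=target_dir) as small_dir:
        small = timed_write(small_dir, total_bytes // small_size, small_size)
    return large, small


if __name__ == "__main__":
    # tempfile.gettempdir() is only a placeholder; point this at the stick's mount point.
    large, small = compare_write_speeds(tempfile.gettempdir())
    print(f"one large file: {large:.1f} MB/s, many small files: {small:.1f} MB/s")
```

The temporary directories are removed after each run, so the test leaves nothing behind on the stick apart from the wear of the writes themselves.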
 
Thanks all for the input. Instead of testing the hardware, I would like to install Slackware to see how it works, as it is not easy to remove memory, hard drives, etc. I will report back later.
 
It may be that Slackware doesn't "tickle" the same bit of hardware (memory address in RAM, or power draw on some component) so even if Slackware works, it might be hiding a hardware issue that will come back one day.

But at least you'll have a working system so that might be all you need, and it's definitely worth trying because if you do get the same issues (or others) it will definitely be pointing at hardware.
 
Also: did you check the clock, RAM/swapspace and disk usage? All 3 can cause this.
Thanks for the alert. I noticed my system's clock was about 5 minutes behind, so I started the NTP service, and it has now been up for 2 days.
 
Happened to me a few times. The problem shouldn't exist in my opinion. It feels like inconsistent security. If random programs can encounter this without it being noticed, what else can abuse it?
 
The problem shouldn't exist in my opinion.
I agree. What are you going to do if a server has no internet connection? I thought FreeBSD was rock solid.
The question is, what's the real problem behind it? I am not a programmer, or I would try to figure it out.
 
It is stable but hardware issues can cause problems and it’s not clear you have eliminated those.

There are tens of millions of lines of code in multiple layers and there will be issues.

The majority of issues I've had on any computer, regardless of OS, have been hardware.
 
My apologies, it's not about NTP. The OS froze again on the 6th day with NTP enabled. I then installed a fresh FreeBSD 14.1 on an SSD instead of the USB stick, and it has been up for over 11 days now, so the problem is not the OS.

The issue might be a misconfiguration of the OS installed on the USB stick, maybe tmpfs or something else that I didn't set up properly. I will try installing it on the USB stick again without modifications.
 
Thanks for the update.

The stability of USB sticks for running an OS from is an ongoing discussion on these forums: it's either a perfectly good idea or a terrible idea, depending on your point of view and experience.

I think most people agree if you are going to do it, make sure it's as high-quality as you can.
 
Yes, many servers in the world (perhaps a large majority) boot from USB drives.
BOOTING is a different experience than actually RUNNING from a USB/flash drive. I have 8 or 9 servers that boot from 16G USB3 drives (some with only USB2 ports) reliably. But, they read the image off the drive and then run out of DRAM.

I limit how often there are writes to these (e.g., configuration changes). I'd never configure such a device to support swap. (people are invariably surprised at how quickly you can "wear out" such media!)
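To get a feel for how quickly such media can wear out, here is a back-of-the-envelope estimate sketched in Python. Everything in it is an assumption for illustration: the function name is hypothetical, and the program/erase cycle count and write amplification factor are made-up example values, not specifications of any particular drive.

```python
def estimated_lifetime_days(capacity_gb, pe_cycles, host_writes_gb_per_day,
                            write_amplification):
    """Rough flash lifetime: total writable volume divided by daily flash writes."""
    values = (capacity_gb, pe_cycles, host_writes_gb_per_day, write_amplification)
    if min(values) <= 0:
        raise ValueError("all inputs must be positive")
    total_flash_writes_gb = capacity_gb * pe_cycles
    flash_writes_gb_per_day = host_writes_gb_per_day * write_amplification
    return total_flash_writes_gb / flash_writes_gb_per_day


if __name__ == "__main__":
    # Assumed example: 16 GB stick, 500 P/E cycles, write amplification of 10.
    # Swap-like load of 20 GB/day versus a read-mostly system at 0.1 GB/day.
    print(estimated_lifetime_days(16, 500, 20, 10))   # heavy writes
    print(estimated_lifetime_days(16, 500, 0.1, 10))  # occasional config changes
```

With these assumed numbers the heavy-write case works out to 16 × 500 / (20 × 10) = 40 days, while the read-mostly case lasts roughly 8000 days, which is the gap between using the stick for swap and only writing the occasional configuration change.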
 