FreeBSD 12.1 Random restart

Hi,

I started to investigate our server problem. As far as I can see, it is restarting without any apparent cause: I checked dmesg, messages, and other logs, and found no errors and nothing that explains the reboots.

last reboot
boot time Wed Jun 30 09:14
boot time Thu Jun 24 18:50

Before that, the server also rebooted 4 times, on different days and at different times, with no correlation. I also checked crontab just in case - nothing there.

In /etc/rc.conf it is set: dumpdev="AUTO"

but:
crashinfo
No crash dumps in /var/crash.

Can you tell me how to enable crash dumps? Or maybe someone has had the same issue and knows how to avoid it?

Thanks,
 
VitS said:
I checked dmesg, messages, and other logs, and found no errors and nothing that explains the reboots.

Could be a hardware problem, memory for instance. Use tools to check the hardware, memory and disks first. If that doesn't help you, you might enable debug and warning messages in /etc/syslog.conf. See man syslog.conf for that. That could show you software that causes a problem.
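A minimal sketch of the syslog.conf suggestion, assuming you want everything in one catch-all file. The function name enable_full_logging and its two parameters are made up for illustration; the `*.*` selector and the all.log file follow the commented-out example in the stock /etc/syslog.conf, so check your own copy before relying on this.

```shell
# Hypothetical helper: append a catch-all rule to a syslog.conf file.
# $1 = config file (default /etc/syslog.conf)
# $2 = log directory (default /var/log)
enable_full_logging() {
    conf=${1:-/etc/syslog.conf}
    logdir=${2:-/var/log}
    # Selector, a tab, then the destination file.
    rule=$(printf '*.*\t%s/all.log' "$logdir")

    # Add the rule only once, so re-running the function is harmless.
    grep -qxF -- "$rule" "$conf" 2>/dev/null || printf '%s\n' "$rule" >> "$conf"

    # Create the log file up front; it may contain sensitive data, so keep it private.
    touch "${logdir}/all.log" && chmod 600 "${logdir}/all.log"
}

# Assumed usage on the server, as root:
#   enable_full_logging && service syslogd reload
```

Expect all.log to grow quickly, so this is probably something to turn off again once the cause is found.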
 
Seemingly random reboots are often a sign of bad memory, I'd definitely check that first.
 
With regard to the memory tests, I've had good results with sysutils/memtest86+: install it, write the image to a USB stick, and boot the machine with it. There's also sysutils/memtest86, but loading that kernel module resulted in an immediate reboot every time I tried to use it.
 
… In /etc/rc.conf it is set: dumpdev="AUTO"

but:
crashinfo
No crash dumps in /var/crash. …

Was dumpdev="AUTO" set before the incident?

Here (with GELI encryption):

Code:
dumpdev="/dev/ada0p3"
dumpdir="/var/crash"

Code:
% lsblk ada0
DEVICE         MAJ:MIN SIZE TYPE                              LABEL MOUNT
ada0             0:134 466G GPT                                   - -
  ada0p1         0:136 200M efi                        gpt/efiboot0 -
  ada0p2         0:138 512K freebsd-boot               gpt/gptboot0 -
  <FREE>         -:-   492K -                                     - -
  ada0p3         0:140  16G freebsd-swap                  gpt/swap0 SWAP
  ada0p3.eli     2:55   16G freebsd-swap                          - SWAP
  ada0p4         0:142 450G freebsd-zfs                    gpt/zfs0 <ZFS>
  ada0p4.eli     0:153 450G zfs                                   - -
  <FREE>         -:-   4.0K -                                     - -
%
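To answer the "was it set before the incident" question on a running box, it may help to compare what rc.conf asks for with what the kernel actually has configured. The sketch below is only an illustration: dump_status is a hypothetical helper name, and it assumes that `dumpon -l` prints /dev/null when no dump device is active, which is how I read dumpon(8).

```shell
# Hypothetical helper: compare the rc.conf setting with the active dump device.
# $1 = value of dumpdev from rc.conf, $2 = output of `dumpon -l`
dump_status() {
    dumpdev=$1
    active=$2

    case "$dumpdev" in
        ""|[Nn][Oo])
            echo "disabled in rc.conf"
            return 1 ;;
    esac

    case "$active" in
        ""|/dev/null)
            # Assumption: /dev/null means the kernel has no dump device set.
            echo "configured in rc.conf but no dump device active"
            return 1 ;;
    esac

    echo "dumps go to $active"
}

# Assumed usage on the FreeBSD host, as root:
#   dump_status "$(sysrc -n dumpdev)" "$(dumpon -l)"
```

If this reports that no device is active even though dumpdev="AUTO" is set, pointing dumpdev at the swap partition explicitly, as in the rc.conf lines in this post, is one thing to try.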
 
I started to investigate our server problem. As far as I can see, it is restarting without any apparent cause: I checked dmesg, messages, and other logs, and found no errors and nothing that explains the reboots.
There used to be a number of things which prevented crash dumps from being generated. The ones that could be fixed in software were mostly addressed. However, if there is some catastrophic hardware problem (for example, power problems or some hardware that is resetting the system) then there's nothing that FreeBSD can do to write a crash dump.

In the "old days", the easiest way to hopefully see what was happening was to use a serial port console instead of the video console. But that has become harder to explain / accomplish over time.

Before going further, I'd like to recommend a little script I wrote called "updown". You can get it here. You copy it into /usr/local/etc/rc.d/updown (note that you remove the .txt extension) and add:
Code:
updown_enable="YES"
to /etc/rc.conf. This will email the "root" user (which I assume is forwarded somewhere useful) whenever the system is shut down or boots. When it reports the system booting, it will tell you either how many seconds elapsed since an intentional shutdown command or tell you that this was likely a crash restart. That will let you eliminate the system being intentionally rebooted as well as emailing you right away if the system does a crash restart.
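For reboots that already happened, a rough cross-check is possible from the login accounting records. This is not the updown script, just a sketch of the same idea: it assumes last(1) prints "boot time" and "shutdown time" lines, newest first, and that a boot with no shutdown record before it was probably not a clean restart. The name classify_boots is made up.

```shell
# Hypothetical helper: read `last` output (newest first) on stdin and label
# each boot by whether an orderly shutdown was recorded just before it.
classify_boots() {
    awk '
        /^boot time/ {
            # Two boots in a row (no shutdown in between): the later boot
            # followed an unclean stop.
            if (pending != "") print "unclean: " pending
            pending = $0
            next
        }
        /^shutdown time/ {
            # A shutdown record right before the boot: orderly restart.
            if (pending != "") { print "clean:   " pending; pending = "" }
            next
        }
        END {
            # Oldest boot in the log: nothing earlier to compare against.
            if (pending != "") print "unknown: " pending
        }
    '
}

# Assumed usage on the server:
#   last | classify_boots
```

The accounting log gets rotated, so this only covers whatever history is still on disk.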

Once that has been eliminated, let's see if we can pinpoint something like a resource starvation issue. From a different system (which doesn't have to be FreeBSD), log into the problem system. Run a command like top -S or systat -v. If the system does a crash restart, the last displayed information will show the system state just before it crashed. That may let you discover something like running out of some resource, excessive interrupts, or other problems.

Note that you're doing this from a different system - if you do it on the problem system's console, you'll lose the data when the hardware is initialized as part of the restart.

If the error happens without any useful information being shown by those utilities, you may have to find some way to record what is happening on the console screen, perhaps by pointing a cellphone camera at it and hitting "record". That probably won't be useful if the console is in graphics mode, so if you're running a desktop environment you would probably need to disable that. Panic messages are generally just kernel printf's and don't bother putting the screen back into text mode first.
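If nobody is watching the remote terminal when the crash happens, one option is to keep a timestamped log on the other machine instead. The sketch below is only an illustration: stamp_lines is a made-up helper, problem-host and crashwatch.log are placeholder names, and it swaps in `vmstat 5` for top -S or systat -v because vmstat prints one plain line per interval, which is easier to log.

```shell
# Hypothetical helper: prefix each line read from stdin with the local time.
# Run this on the monitoring machine, not on the server that crashes.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}

# Assumed usage from the monitoring machine:
#   ssh problem-host 'vmstat 5' | stamp_lines >> crashwatch.log
```

After a crash, the last few lines of the log show the state shortly before the connection dropped. If the output seems to arrive in bursts rather than every few seconds, that is likely pipe buffering on the remote side.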

Report back when you've collected this additional info and we'll see where to go next.

IMPORTANT: You should never just blindly install code that someone hands you, particularly if it will run as root (as my updown script does). Give it at least a cursory looking-over to make sure it won't do anything nasty. The script is pretty simple. The only parts that are un-needed are the tests for configuration files for various local utilities, and if it doesn't find those files, it will happily run without them.

NOTE: There is a limitation (I hesitate to call it a bug) in this script - it assumes that the local system can generate the shutdown email and hand it to another system within 20 seconds. If that timeout is too short for your environment, you either won't get the shutdown email until the system reboots, or you'll get a copy when it shuts down and a duplicate when it reboots.
 