FreeBSD crashed - trying to find out why

Hello,

I have a FreeBSD 8.1-RELEASE machine running inside a KVM VPS.

I installed tmux a few days back, and today I was trying to update all my ports via portupgrade. Everything was going fine, so after a while I detached from the tmux session and disconnected from the machine (I was connected via SSH). When I returned a few hours later, I saw that the machine had rebooted. It appears that the machine rebooted due to a kernel fault. From the timestamps I see that the machine rebooted some 10-15 minutes before I returned, so it looks like portupgrade, tmux, etc. were working fine until then.

Here are the messages from /var/log/messages:

Code:
kernel: Fatal trap 12: page fault while in kernel mode
kernel: cpuid = 0; apic id = 00
kernel: fault virtual address   = 0x250
kernel: fault code              = supervisor read data, page not present
kernel: instruction pointer     = 0x20:0xffffffff8052e574
kernel: stack pointer           = 0x28:0xffffff800019c8a0
kernel: frame pointer           = 0x28:0xffffff800019c8c0
kernel: code segment            = base 0x0, limit 0xfffff, type 0x1b
kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
kernel: processor eflags        = interrupt enabled, resume, IOPL = 0
kernel: current process         = 50496 (tmux)
kernel: trap number             = 12
kernel: panic: page fault
kernel: cpuid = 0
kernel: Uptime: 3d22h30m42s
kernel: Cannot dump. Device not defined or unavailable.
kernel: Automatic reboot in 15 seconds - press a key on the console to abort

I went into all the background about tmux above since the logs highlight tmux as the currently running process.

Any idea why the kernel crashed, or what I can do to prevent this in the future? If it's to do with any incompatibility with tmux (and portupgrade or FreeBSD or KVM etc.) then I can try to avoid using it in the future.

Also, can you tell me what I can do to avoid this message: "Cannot dump. Device not defined or unavailable"? Is there anything I can do to define the dump device?

Thanks.
Rakhesh
 
A "Fatal trap 12: page fault while in kernel mode" with "supervisor read data, page not present" usually indicates bad memory or a bad hard disk (bad sectors in swap).

As for configuring a dump device, read the crash(8) and dumpon(8) man pages.
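For reference, a minimal sketch of the rc.conf side of this, based on rc.conf(5) and dumpon(8). It assumes the machine has a swap partition large enough to hold the dump; nothing in this thread confirms the actual disk layout of the VPS.

```sh
# /etc/rc.conf
# Let the rc scripts pick a suitable swap device for kernel crash dumps at boot.
dumpdev="AUTO"
# Directory where savecore(8) writes the dump after the reboot (the default location).
dumpdir="/var/crash"
```

To enable dumps without rebooting, dumpon(8) can be pointed at the swap device by hand; swapinfo(8) lists the devices in use.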
 
Wow, that was a fast reply. :) Thanks for the pointers.

Since this machine is running within KVM, could it be that the memory and hard disk are fine but the KVM layer in between is causing problems?

I will pass this onto the people who maintain the VPS. Maybe they've got similar reports from others.

Thanks again!
 
rakhesh said:
Since this machine is running within KVM, could it be that the memory and hard disk are fine but the KVM layer in between is causing problems?
I don't know enough about the relation KVM <-> FreeBSD but it might be. I do have some experience with virtual machines and I know FreeBSD needs to be 'aware' of it.

You could also post this on the mailing lists. It may be that FreeBSD isn't quite compatible with this particular KVM.
 
Which version of KVM? Using which Linux kernel version?

Which version of FreeBSD?

Are you using IDE or SCSI emulated virtual harddrives?

Which virtual NIC are you using?

And does it only happen when tmux is running? Or do you get the same crash when tmux is not running? Which version of tmux is installed?
 
rakhesh said:
Hello,

I have a FreeBSD 8.1-RELEASE machine running inside a KVM VPS.

I installed tmux a few days back, and today I was trying to update all my ports via portupgrade. Everything was going fine, so after a while I detached from the tmux session and disconnected from the machine (I was connected via SSH). When I returned a few hours later, I saw that the machine had rebooted. It appears that the machine rebooted due to a kernel fault. From the timestamps I see that the machine rebooted some 10-15 minutes before I returned, so it looks like portupgrade, tmux, etc. were working fine until then.

You know tmux is in the base distribution, right? There's no need to install the port unless you *need* the latest/greatest, and actually it's probably better not to install the port as it can potentially break things.

Also the case of a guest "being aware" it's a VM is generally called paravirtualization, and FreeBSD absolutely does not need to be a PV guest. Support for that type of config is quite limited as the i386 XEN PV stuff is the only thing out there that's true PV. It's not something most would want to put in a production environment as it's buggy and not currently supported.
 
phoenix said:
Which version of KVM? Using which Linux kernel version?

Which version of FreeBSD?

Are you using IDE or SCSI emulated virtual harddrives?

Which virtual NIC are you using?

And does it only happen when tmux is running? Or do you get the same crash when tmux is not running? Which version of tmux is installed?

I don't know the Linux and KVM versions. It's a VPS, but I can check with the provider.

FreeBSD 8.1.

IDE drivers.

Intel PRO/1000 Legacy Network connection.

It happened for the first time today. I installed tmux the day before yesterday (but didn't use it much yesterday). I've had this VPS for a week now and it had never crashed before. The machine crashed once more today: I was going to reboot it and it crashed. The error messages were similar, and they showed tmux as the culprit. However, when I tried rebooting again with the tmux sessions disconnected, it didn't crash.

My hunch is that it has something to do with a lot of output in the tmux session. The first time it crashed I was running portupgrade, so there was a lot of output to the tmux session. The second time it crashed, I had gone through the log files etc., so possibly all that output was buffered in tmux.
 
Tmux isn't in base..

Code:
node15# find / -name "tmux*" -print
/usr/ports/sysutils/tmux

Note to self: refresh first then post..
 
Try running for a few days without tmux and see if things are stable. It may be an interaction between tmux and the virtual console driver in KVM. If things run reliably without tmux, even during port upgrades and whatnot, then you may have to live without tmux for a while.

You can try installing sysutils/screen as an alternative to tmux. I have a .screenrc that configures screen to work/look like tmux, if desired.

Running the same commands (port upgrades, scroll through logs, etc) via screen may also help to narrow down where the problem is.
 
phoenix said:
Try running for a few days without tmux and see if things are stable. It may be an interaction between tmux and the virtual console driver in KVM. If things run reliably without tmux, even during port upgrades and whatnot, then you may have to live without tmux for a while.

You can try installing sysutils/screen as an alternative to tmux. I have a .screenrc that configures screen to work/look like tmux, if desired.

Running the same commands (port upgrades, scroll through logs, etc) via screen may also help to narrow down where the problem is.

Thanks again. Good to know that it could be something to do with the interaction between tmux and the virtual console driver in KVM. I just had a hunch, so it's good to hear that you also think it could be a possibility.

I will try screen as an alternative and see if it too crashes the system. I don't use tmux much except to protect myself against disconnections so screen should be fine for that.

Cheers.
 
Making sense of your crash

DutchDaemon said:
It's not. You crash in 'swapper', this user crashes in 'tmux'.

The 'current process' often doesn't match up to 'why' the box crashed -- in my limited experiences with crashes.

Here is some more helpful info on dealing with crashes:
* consult this URL to make sense of the 'instruction pointer' info: http://www.unixguide.net/freebsd/faq/18.13.shtml.
* before your next crash, set this value in your 'rc.conf' file: dumpdev="AUTO"
* here is how to debug your dump file: http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-gdb.html (I just refreshed my memory by reading that manual page... now to wait for my BSD 8.1-STABLE to crash again...)
* know where your kernel.debug lives... cd /usr/obj/usr/src/sys/KERNCONF
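The instruction pointer lookup from the first bullet can be scripted. This is only a rough sketch, assuming the usual approach of listing the kernel's symbols with `nm -n` and finding the last one at or below the faulting address. The helper name `find_symbol` is made up for illustration, and it assumes 16-digit hex addresses as printed for an amd64 kernel.

```sh
#!/bin/sh
# find_symbol (hypothetical helper): print the last symbol whose address is
# at or below the given instruction pointer.
# Reads `nm -n` style lines ("address type name", sorted by address) on stdin.
# Relies on all addresses having the same width, so that plain string
# comparison orders them the same way as numeric comparison would.
find_symbol() {
    # Normalise the argument: drop a leading 0x and lowercase the hex digits.
    ip=$(printf '%s' "$1" | sed 's/^0x//' | tr 'A-F' 'a-f')
    awk -v ip="$ip" '
        NF == 3 && length($1) == length(ip) {
            addr = tolower($1)
            # Appending "" forces a string comparison rather than a numeric one.
            if ((addr "") <= (ip "")) name = $3
        }
        END { if (name != "") print name }
    '
}

# Usage (the path is the default kernel location; use the kernel that panicked):
#   nm -n /boot/kernel/kernel | find_symbol 0xffffffff8052e574
```

The name it prints is the function the kernel was executing at the time of the fault, which is usually a better starting point than the 'current process' line.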
 
Oh, even if the crash reason is different (who knows), we all have the same problem... Fatal 12. :) Two fatal 12 crashes are more closely related than the output of a 'crash' and the output of the 'ls' command.

Reading this thread has got me thinking... I have a $70 SSD -- maybe that makes for a poor swap device.... Or maybe there is a correlation between ECC and non-ECC systems ... (my embedded motherboard only supports non-ECC RAM). Of course, I always run memtest before deploying boxes, but maybe there are some errors that crop up later.

Can anyone confirm that Sig 12 is related to RAM? I always thought Sig 11 was RAM...
 
Rudy said:
The 'current process' often doesn't match up to 'why' the box crashed -- in my limited experiences with crashes.

Here is some more helpful info on dealing with crashes:
* consult this URL to make sense of the 'instruction pointer' info: http://www.unixguide.net/freebsd/faq/18.13.shtml.
* before your next crash, set this value in your 'rc.conf' file: dumpdev="AUTO"
* here is how to debug your dump file: http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-gdb.html (I just refreshed my memory by reading that manual page... now to wait for my BSD 8.1-STABLE to crash again...)
* know where your kernel.debug lives... cd /usr/obj/usr/src/sys/KERNCONF

Cool, thanks for that Rudy! Very informative. :)
 
Rudy said:
The 'current process' often doesn't match up to 'why' the box crashed -- in my limited experiences with crashes.

Sure, but not even getting to the point where either a new or an old kernel will load (crashing on process 0), or crashing when the system is well up and running processes in userland (tmux) -- hardly comparable, imo.
 