Kernel panics when given a high workload

nicogoldman22 · Feb 13, 2023

Hello! Good Monday to all the FreeBSD community.
I am working on the FreeBSD kernel for my university thesis. The idea is to make changes to the FreeBSD short-time scheduler so that all its operations are based on the concept of Petri Nets. We already have the modeling of said scheduler and the first tests running.

I am currently running into a problem that has left me out of ideas. Very randomly, the kernel throws page faults and reboots the OS. With my thesis partner, we tried to see when this problem happened but didn't find any pattern to reproduce it. We could see it mostly when the processor is heavily loaded, but as I said previously, only in some simulations.

I leave some information about the logs found; any help is appreciated.

Code:

uname -a
FreeBSD pielihueso 13.1-RELEASE FreeBSD 13.1-RELEASE DrudiGoldmanPI/update_petriNetScheduler-13.1.0-n250157-cb2e622cf22d PI_KERNELCONF amd64

Differences between PI_KERNELCONF and GENERIC are:

1. We are working on the 4BSD scheduler instead of the ULE:

Code:

# options     SCHED_ULE        # ULE scheduler
options     SCHED_4BSD        # 4BSD scheduler

2. We added some debugger options:

Code:

options        DDB
options        GDB
options        KDB_UNATTENDED

Code:

cd /usr/obj/usr/src/amd64.amd64/sys/PI_KERNELCONF/
kgdb kernel.debug /var/crash/vmcore.last

GNU gdb (GDB) 12.1 [GDB v12.1 for FreeBSD]
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd13.1".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /boot/kernel/kernel...
Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address    = 0xffffffffffffffa8
fault code        = supervisor write data, page not present
instruction pointer    = 0x20:0xffffffff80ca0822
stack pointer            = 0x28:0xfffffe00cd879b60
frame pointer            = 0x28:0x0
code segment        = base 0x0, limit 0xfffff, type 0x1b
            = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = interrupt enabled, resume, IOPL = 0
current process        = 922 (sshd)
trap number        = 12
panic: page fault
cpuid = 1
time = 1676286448
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00cd879920
vpanic() at vpanic+0x17f/frame 0xfffffe00cd879970
panic() at panic+0x43/frame 0xfffffe00cd8799d0
trap_fatal() at trap_fatal+0x385/frame 0xfffffe00cd879a30
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00cd879a90
calltrap() at calltrap+0x8/frame 0xfffffe00cd879a90
--- trap 0xc, rip = 0xffffffff80ca0822, rsp = 0xfffffe00cd879b60, rbp = 0 ---
kern_select() at kern_select+0x942
Uptime: 34s
Dumping 371 out of 8085 MB:..5%..13%..22%..31%..44%..52%..61%..74%..82%..91%

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55        __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
[CMD](kgdb) where[/CMD]
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=textdump@entry=1) at /usr/src/sys/kern/kern_shutdown.c:399
#2  0xffffffff80c2f521 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:487
#3  0xffffffff80c2f99e in vpanic (fmt=0xffffffff811dfeea "%s", ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:920
#4  0xffffffff80c2f7a3 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:844
#5  0xffffffff810d7855 in trap_fatal (frame=0xfffffe00cd879aa0, eva=18446744073709551528) at /usr/src/sys/amd64/amd64/trap.c:944
#6  0xffffffff810d78af in trap_pfault (frame=0xfffffe00cd879aa0, usermode=false, signo=<optimized out>, ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:763
#7  <signal handler called>
#8  0xffffffff80ca0822 in selrescan (td=<error reading variable: Cannot access memory at address 0xffffffffffffffd0>, ibits=<optimized out>, obits=<optimized out>) at /usr/src/sys/kern/sys_generic.c:1325
#9  kern_select (td=<optimized out>, nd=<error reading variable: Cannot access memory at address 0xffffffffffffff90>, fd_in=<optimized out>, fd_ou=<optimized out>, fd_ex=<optimized out>, tvp=<optimized out>,
    abi_nfdbits=<error reading variable: Cannot access memory at address 0x10>) at /usr/src/sys/kern/sys_generic.c:1206
Backtrace stopped: Cannot access memory at address 0x8

Code:

# nm -n /boot/kernel/kernel | grep  0xffffffff80ca0822
# nm -n /boot/kernel/kernel | grep  0xffffffff80ca0822
# nm -n /boot/kernel/kernel | grep  0xffffffff80c
# nm -n /boot/kernel/kernel | grep  0xffffffff
# nm -n /boot/kernel/kernel | grep  0xfffff
# nm -n /boot/kernel/kernel | grep  0xff
# nm -n /boot/kernel/kernel | grep  0x

ffffffff80388c30 t cam_compat_handle_0x17
ffffffff803891e0 t cam_compat_handle_0x18
ffffffff803895f0 t cam_compat_handle_0x19
ffffffff80389730 t cam_compat_translate_dev_match_0x18
ffffffff80aba6a0 t xl_check_maddr_90xB
ffffffff80aba6f0 t xl_check_maddr_90x
ffffffff80abad90 t xl_txeof_90xB
ffffffff80abb090 t xl_start_90xB_locked
ffffffff80ebac80 t mlx5e_fec_mask_10x_25x_handler
ffffffff80ebb050 t mlx5e_fec_avail_10x_25x_handler
ffffffff80ebb0f0 t mlx5e_fec_mask_50x_handler
ffffffff80ebb4e0 t mlx5e_fec_avail_50x_handler
ffffffff810af6c0 T Xint0x80_syscall_pti
ffffffff810af740 T Xint0x80_syscall
ffffffff810af743 t int0x80_syscall_common
ffffffff817fe180 r db_inst_0f0x
ffffffff8180cc50 r mouse10x14_120
ffffffff8180cd40 r mouse10x16_50
ffffffff8180cd90 r mouse10x16_75
ffffffff8180cde0 r mouse10x16_90
ffffffff8180ce30 r mouse10x16_100
ffffffff8180ce80 r mouse10x16_120
ffffffff8180ced0 r mouse10x16_133

We also tried with dtrace but with no luck. Any other recommendations on how we can keep debugging this issue? We know it's something we broke on the scheduler because the generic kernel is working decently.

P.S.: If someone is interested in how we implemented the Petri Net for the scheduler, contact me through the mail, and I can give you the paper we are working on.

SirDice · Feb 13, 2023

nicogoldman22 said:
I am working on the FreeBSD kernel for my university thesis.

I suggest you post to the mailing lists. There aren't many FreeBSD developers on these boards.

nicogoldman22 · Feb 13, 2023

SirDice said:
I suggest you post to the mailing lists. There aren't many FreeBSD developers on these boards.

Can you add me to the mailing list?

Thanks.

SirDice · Feb 13, 2023

Just sign up.

Erichans · Feb 13, 2023

Mailing lists can be found via here

If you haven't seen these, they may be helpful:

Introduction to Debugging the FreeBSD Kernel by John Baldwin - BSDCan 2008 article & slides
Writing Custom Command in FreeBSD's DDB Kernel Debugger by John Baldwin - EuroBSDcon 2022 - video & slides

nicogoldman22 · Feb 13, 2023

Thanks Erichans and SirDice ! I already posted it to the freebsd-questions mailing list.

covacat · Feb 13, 2023

try nm /boot/kernel/kernel|grep kern_select
your debugging syms seem out of sync with the debugged kernel

nicogoldman22 · Feb 13, 2023

Bash:

root@pielihueso ~# nm /boot/kernel/kernel | grep kern_select
ffffffff80c9fee0 T kern_select

cracauer@ · Feb 13, 2023

nicogoldman22 said:
We know it's something we broke on the scheduler because the generic kernel is working decently.

Not necessarily. It is also possible that the 4BSD scheduler has bitrot and would panic without your modifications.

Jose · Feb 13, 2023

cracauer@ said:
Not necessarily. It is also possible that the 4BSD scheduler has bitrot and would panic without your modifications.

This is the first thing I would test.

covacat · Feb 13, 2023

symbols look ok

Code:

        fdp = td->td_proc->p_fd;
ffffffff80c8a69e:       48 8b 40 48             mov    0x48(%rax),%rax
ffffffff80c8a6a2:       48 89 45 a8             mov    %rax,-0x58(%rbp)

the second assembly line is causing the panic because rbp is 0 (?)
hence the bombing address 0xffffffffffffffa8

ralphbsz · Feb 13, 2023

Fun project! Another suggestion: Try turning the debug options off, and see whether it works under intense load. BTDT got the T-shirt.

Alain De Vos · Feb 13, 2023

Is your system swapping ?

covacat · Feb 13, 2023

rbp seems to get foobared in seltdwait() but can't figure out why
seltdwait calls some stuff from kern_condvar.c and kern_mutex.c

_martin · Feb 13, 2023

nicogoldman22 Does it mean you not only changed the scheduler to 4BSD but you already modified it too? Meaning only you have the code that is being executed in that scheduler?

In the kgdb output you shared, could you do

Code:

f 8
i r
x/4i $pc

to verify what covacat mentioned? It does make sense and would explain the bogus virtual address you hit panic on (-0x58).

To debug this further I suggest you disable compiler optimization flags. As an example I'd opt for CFLAGS=-O in /etc/make.conf, recompile the kernel then.

As the issue is occurring randomly and seldom you could add your own assert in selrescan() (drop to debugger in case of a hit).
Live debugging is also handy - have other FreeBSD attach via gdb to this host via serial port. Let the system drop to debugger in case of a problem (assert), or on your static breakpoint.

nicogoldman22 · Feb 14, 2023

Jose said:
This is the first thing I would test.

We have already tried this. Maybe we didn't extinguish all the tests, but with the same load tests that broke using our scheduler and configurations, it didn't with the original.

nicogoldman22 · Feb 14, 2023

Alain De Vos said:
Is your system swapping ?

I have swap space configured, but I couldn't see it using it with top command, not in these cases of heavy load nor everyday use.

nicogoldman22 · Feb 14, 2023

_martin said:
nicogoldman22 Does it mean you not only changed the scheduler to 4BSD but you already modified it too? Meaning only you have the code that is being executed in that scheduler?

In the kgdb output you shared, could you do

Code:

f 8 i r x/4i $pc

to verify what covacat mentioned? It does make sense and would explain the bogus virtual address you hit panic on (-0x58).

To debug this further I suggest you disable compiler optimization flags. As an example I'd opt for CFLAGS=-O in /etc/make.conf, recompile the kernel then.

As the issue is occurring randomly and seldom you could add your own assert in selrescan() (drop to debugger in case of a hit).
Live debugging is also handy - have other FreeBSD attach via gdb to this host via serial port. Let the system drop to debugger in case of a problem (assert), or on your static breakpoint.

_martin covacat ! First of all, thank you both very much for the help!

I am working with a kernel that has code modified by my thesis partner and me. It has modifications mostly in the sched_4bsd file (if you are interested and it helps, I can give you access to our fork of the repository with the particular branch that is causing us problems).

I leave the requested outputs:

Bash:

(kgdb) frame 8
#8  0xffffffff80ca0822 in selrescan (td=<error reading variable: Cannot access memory at address 0xffffffffffffffd0>, ibits=<optimized out>, obits=<optimized out>) at /usr/src/sys/kern/sys_generic.c:1325
1325        fdp = td->td_proc->p_fd;

C:

(kgdb) list
1320        fd_mask bit;
1321        int fd, ev, n, idx;
1322        int error;
1323        bool only_user;
1324
1325        fdp = td->td_proc->p_fd;
1326        stp = td->td_sel;
1327        n = 0;
1328        only_user = FILEDESC_IS_ONLY_USER(fdp);
1329        STAILQ_FOREACH_SAFE(sfp, &stp->st_selq, sf_link, sfn) {

Bash:

(kgdb) i r
rax            0xfffffe00c1c08860  -2195772635040
rbx            0x1                 1
rcx            0x0                 0
rdx            0x0                 0
rsi            0x0                 0
rdi            0xfffff8002ff467a0  -8795288475744
rbp            0x0                 0x0
rsp            0xfffffe00c1315b60  0xfffffe00c1315b60
r8             0xfffff8006f019c50  -8794230645680
r9             0xfffffe00c1316000  -2195782017024
r10            0x8000000000000000  -9223372036854775808
r11            0x800000006f03ed50  -9223372034992247472
r12            0x9                 9
r13            0x0                 0
r14            0xfffff8006f019740  -8794230646976
r15            0xfffff8002ff46780  -8795288475776
rip            0xffffffff80ca0822  0xffffffff80ca0822 <kern_select+2370>
eflags         0x10246             [ PF ZF IF RF ]
cs             0x20                32
ss             0x28                40
ds             <unavailable>
es             <unavailable>
fs             <unavailable>
gs             <unavailable>
fs_base        <unavailable>
gs_base        <unavailable>

Bash:

(kgdb) x/4i $pc
=> 0xffffffff80ca0822 <kern_select+2370>:    mov    %rax,-0x58(%rbp)
   0xffffffff80ca0826 <kern_select+2374>:    mov    -0xd8(%rbp),%rax
   0xffffffff80ca082d <kern_select+2381>:    mov    0x8(%rax),%rax
   0xffffffff80ca0831 <kern_select+2385>:    movl   $0x0,-0x98(%rbp)

I'm also adding prints kgdb commands in case it helps.

Bash:

(kgdb) print fdp
$1 = (struct filedesc *) 0xfffffe00c1c08860
(kgdb) print *td
Cannot access memory at address 0xffffffffffffffd0

covacat · Feb 14, 2023

my wild guess is that something trashes the stack
function selrescan() is expanded inline and the code just returns from seltdwait() which pushes and pops rbp
but again the return address remains valid so the stack seems overwriiten just up to the return address
can you post a dump of [esp] around the current value

_martin · Feb 14, 2023

Disabling optimizations would help you during debugging as you can jump around frame and see what's where. Some brainstorming tips:
- in your modified code, did you use any inline assembly in C ? If so, did you specify trash regs correctly?
- in your modified code audit your writes (memcpy,...) and make sure you don't have hidden overflow (or if you do some sort of string manipulation infamous off by one)
- use assert() as I mentioned

In kgdb you can set set print pretty to make it easier to see structures. To print the structure (using my own example):

Code:

p *(struct thread*)0xfffffe00c23d390
$2 = {
  td_lock = 0xffffffff81cdb908 <sleepq_chains+264>,
  td_proc = 0xfffff80004c46548,
...

Also as I mentioned it might be a good idea to use live debugging now to see what's going on. Note watchpoints in gdb can have conditions even though sw watchpoints are painfully slow.
E.g. break on 0x2018a4 only if val whatever %rsi points to is 4: b *0x00000000002018a4 if *$rsi == 4.

nicogoldman22 · Apr 25, 2023

[UPDATES]
We have some updates on this issue. We have found a way to reproduce it.

As a reminder, this issue stems from some changes I am working on with a thesis project partner, where we are implementing the 4BSD scheduler using Petri nets. The modeling is already complete and currently, thanks to having everything through Petri nets, it is very simple to add modules that allow us to work on scheduling (for example, we added two modules very simply: one to turn processors on and off when requested, and the other to allow threads to monopolize processors also when desired).

The issue we were having was the one described in the issue description. We are working from virtual machines (we tried virtualbox and qemu) and the problem occurs ONLY when we have the network adapters turned on. When they are turned off and we work everything from the FreeBSD terminal, we have no problem. When we turn on the network card, we start to have these types of kernel fault pages randomly.

When we boot with an unmodified kernel, whether the network adapters are on or off, we have no problems.

With our modified kernel, it is the only time we experience page faults with the adapters turned on. This seems strange to us because we did not make any changes to anything related to networks (our changes were only made to scheduler files, kern_thread.c, sched_4bsd.c, and sys/proc.h).

_martin · Apr 25, 2023

Assuming you're crashing on the same or alike issue - page fault where VA is obviously bogus as in your example - 0xffffffffffffffa8 - network might be a victim of your code. E.g. buffer overflow, use of uninitialized pointer, etc.

Did you try to isolate it on VM with 1 CPU?

But other than that the old advice stands -- you need to debug it on fly.

Kernel panics when given a high workload

nicogoldman22

SirDice

Administrator

nicogoldman22

SirDice

Administrator

Erichans

nicogoldman22

covacat

nicogoldman22

cracauer@

Jose

covacat

ralphbsz

Alain De Vos

covacat

_martin

nicogoldman22

nicogoldman22

nicogoldman22

covacat

_martin

nicogoldman22

_martin