Random Crash

OP
Gamecreature

Gamecreature

Member

Reaction score: 8
Messages: 31

_martin, thanks for your suggestion.
I changed the optimization to normal and added the extra incoming rule for the webproxy.
(There was already an inbound rule for this, but I used your example, so $webserver_sto and $tcp_state are not used for now.)
 

covacat

Aspiring Daemon

Reaction score: 310
Messages: 628

You can look at the pd argument of the pf_test_rule call.
It should point to a pf_pdesc structure (defined in pfvar.h), from which you can extract the packet details (source, destination, TCP header, etc.).
 
OP
Gamecreature


Did you get any more crashes yet? I'm curious..
Code:
# uptime
10:49PM  up 4 days, 13:36, 1 user, load averages: 0.91, 1.02, 1.09

Still running. ☺️
Looking good for now.
Once it has been running for 7 days, I will enable $webserver_sto and $tcp_state again to check whether it really is the optimization setting.
 
OP
Gamecreature


Code:
# uptime
 2:31PM  up 9 days,  5:18, 1 user, load averages: 1.08, 1.26, 1.16

Still running, so that looks promising. I will now put $webserver_sto and $tcp_state back on the webproxy rule (without optimization aggressive).
 
OP
Gamecreature


_martin, no, I am still running with 'optimization normal'. All other options are back to normal.

Code:
# uptime
 4:14PM  up 17 days,  8:01, 1 user, load averages: 0.85, 0.96, 0.92

If you're interested, I can enable aggressive again to test whether it causes the crash.
(at the moment I'm very happy it is still running)
 
OP
Gamecreature


Well, I just got a new crash. Optimization aggressive really seems to be causing this.

Code:
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x0
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff81065ba6
stack pointer           = 0x28:0xfffffe00841e90c0
frame pointer           = 0x28:0xfffffe00841e90d0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (swi4: clock (0))
trap number             = 12
panic: page fault
cpuid = 0
time = 1636153153
KDB: stack backtrace:
#0 0xffffffff80c574c5 at kdb_backtrace+0x65
#1 0xffffffff80c09ea1 at vpanic+0x181
#2 0xffffffff80c09d13 at panic+0x43
#3 0xffffffff8108b1b7 at trap_fatal+0x387
#4 0xffffffff8108b20f at trap_pfault+0x4f
#5 0xffffffff8108a86d at trap+0x27d
#6 0xffffffff81061958 at calltrap+0x8
#7 0xffffffff81065ab7 at in_cksum_skip+0x77
#8 0xffffffff82956329 at in4_cksum+0x59
#9 0xffffffff829373d0 at pf_return+0x270
#10 0xffffffff82931351 at pf_test_rule+0x1d71
#11 0xffffffff8292cd11 at pf_test+0x17c1
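
For anyone who wants to compare panics like this one without eyeballing them, here is a minimal Python sketch (not from the thread; the names parse_frames, frames_below_trap and panic_time are made up for illustration). It assumes the usual "#N address at symbol+offset" layout of a KDB backtrace and a "time = <epoch seconds>" line, as in the output above. The frames listed after calltrap are the code that was running when the trap fired.

```python
import re
from datetime import datetime, timezone

# Copy of the relevant lines from the panic output posted above.
PANIC_TEXT = """\
panic: page fault
cpuid = 0
time = 1636153153
KDB: stack backtrace:
#0 0xffffffff80c574c5 at kdb_backtrace+0x65
#1 0xffffffff80c09ea1 at vpanic+0x181
#2 0xffffffff80c09d13 at panic+0x43
#3 0xffffffff8108b1b7 at trap_fatal+0x387
#4 0xffffffff8108b20f at trap_pfault+0x4f
#5 0xffffffff8108a86d at trap+0x27d
#6 0xffffffff81061958 at calltrap+0x8
#7 0xffffffff81065ab7 at in_cksum_skip+0x77
#8 0xffffffff82956329 at in4_cksum+0x59
#9 0xffffffff829373d0 at pf_return+0x270
#10 0xffffffff82931351 at pf_test_rule+0x1d71
#11 0xffffffff8292cd11 at pf_test+0x17c1
"""

# One backtrace line: "#7 0xffffffff81065ab7 at in_cksum_skip+0x77"
FRAME_RE = re.compile(
    r"^#(\d+)\s+(0x[0-9a-fA-F]+)\s+at\s+([A-Za-z_]\w*)\+(0x[0-9a-fA-F]+)$"
)
# The "time = <epoch seconds>" line printed with the panic message.
TIME_RE = re.compile(r"^time\s*=\s*(\d+)$", re.MULTILINE)


def parse_frames(text):
    """Return a list of (index, address, symbol, offset) tuples."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append(
                (int(m.group(1)), int(m.group(2), 16), m.group(3), int(m.group(4), 16))
            )
    return frames


def frames_below_trap(frames):
    """Drop the panic/trap handling frames, keep what ran before the trap."""
    symbols = [f[2] for f in frames]
    if "calltrap" not in symbols:
        return frames
    return frames[symbols.index("calltrap") + 1:]


def panic_time(text):
    """Return the panic timestamp as an aware UTC datetime, or None."""
    m = TIME_RE.search(text)
    if m is None:
        return None
    return datetime.fromtimestamp(int(m.group(1)), tz=timezone.utc)


if __name__ == "__main__":
    frames = frames_below_trap(parse_frames(PANIC_TEXT))
    # Innermost frame first, so the faulting function is on the left.
    print(" <- ".join(f"{sym}+{off:#x}" for _, _, sym, off in frames))
    print(panic_time(PANIC_TEXT))
```

Running it on two saved panics and comparing the symbol+offset chains is a quick way to check whether a new crash took the same path through pf_test_rule and pf_return.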
 

_martin

Daemon

Reaction score: 346
Messages: 1,166

The crash seems to be the same, which is good. The fault happened at the same address. Would you be willing to run this test with only one virtual CPU?
Now is really a good time to open a PR.
I started the VM I created when I saw your thread, and I'm downloading some random torrents in the jail inside that VM. I was not able to trigger the crash before, though. I've increased the number of CPUs from 2 to 4.
 
OP
Gamecreature


_martin, unfortunately this VPS runs at a hosting company (TransIP), so I don't have control over the number of CPUs.
Another strange fact is that another VPS runs a similar firewall configuration with optimization aggressive, and it doesn't happen there.
It only happens on this production VPS... (Did your VM crash?)
 

_martin


Didn't you mention that the other VM is on a slightly older version? Since this test needs vtnet devices (or at least that was the initial assumption), I'm using VirtualBox as the hypervisor. My VM didn't crash; all torrents were downloaded without a problem several times. My uptime is around 9 hours. Tests are still running.
Just to be sure: can you confirm you're still on 13.0-RELEASE-p4?
 
OP
Gamecreature


That's true, the previous VM had an older version. But I just upgraded it (and re-enabled optimization aggressive).
Let's see if it crashes...

Yes, the machine still runs 13.0-RELEASE-p4.
Code:
# freebsd-version -kur
13.0-RELEASE-p4
13.0-RELEASE-p4
13.0-RELEASE-p4
 

_martin


Gamecreature and I exchanged a few PMs; he was able to trigger the crash in a VM and simplify the PF config. We ran some tests and found that a few things need to be set to trigger the bug. Once those are set, you can crash the system within a second or two. This is not related to the hypervisor (i.e. the network driver) or to the number of CPUs.

The behavior is very similar, if not identical, to what is described in the fairly recent PR 254419. There's a link to PR 259645, where the issue is being solved.
 