Random Crash

Those two servers were (freebsd-)updated from 12.2
Mine has been running for some time longer. I think this one started as 11.0 and has been progressively upgraded since. I'm honestly not sure which version it started with, but it's been upgraded a number of times over the years.

You may want to log a call with TransIP; unlike a lot of other providers, they actually have pretty good knowledge of FreeBSD.
 
I contacted them (I agree they do know a lot), but they didn't know the cause of this specific problem.
 
Well, I didn't have to wait 7 days. The crash just happened again.
The `ifconfig vtnet0 -rxcsum` option unfortunately didn't help.


Code:
Mon Oct 11 20:30:27 CEST 2021

FreeBSD myserver.com 13.0-RELEASE-p4 FreeBSD 13.0-RELEASE-p4 #0: Tue Aug 24 07:33:27 UTC 2021     root@amd64-builder.daemonology.net:/usr/obj/usr/src/amd64.am>

panic: page fault

GNU gdb (GDB) 11.1 [GDB v11.1 for FreeBSD]
Copyright (C) 2021 Free Software Foundation, Inc.
...
Type "apropos word" to search for commands related to "word"...
Reading symbols from /boot/kernel/kernel...
Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...

Unread portion of the kernel message buffer:

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address   = 0x138
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff81065b8d
stack pointer           = 0x28:0xfffffe00841e90c0
frame pointer           = 0x28:0xfffffe00841e90d0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (swi4: clock (0))
trap number             = 12
panic: page fault
cpuid = 2
time = 1633976879
KDB: stack backtrace:
#0 0xffffffff80c574c5 at kdb_backtrace+0x65
#1 0xffffffff80c09ea1 at vpanic+0x181
#2 0xffffffff80c09d13 at panic+0x43
#3 0xffffffff8108b1b7 at trap_fatal+0x387
#4 0xffffffff8108b20f at trap_pfault+0x4f
#5 0xffffffff8108a86d at trap+0x27d
#6 0xffffffff81061958 at calltrap+0x8
#7 0xffffffff81065ab7 at in_cksum_skip+0x77
#8 0xffffffff82956329 at in4_cksum+0x59
#9 0xffffffff829373d0 at pf_return+0x270
#10 0xffffffff82931351 at pf_test_rule+0x1d71
#11 0xffffffff8292cd11 at pf_test+0x17c1
#12 0xffffffff82945bff at pf_check_out+0x1f
#13 0xffffffff80d42137 at pfil_run_hooks+0x97
#14 0xffffffff80db2f21 at ip_output+0xb61
#15 0xffffffff80dc9664 at tcp_output+0x1b04
#16 0xffffffff80dd80df at tcp_timer_rexmt+0x59f
#17 0xffffffff80c25b0d at softclock_call_cc+0x13d
Uptime: 1d8h39m11s
Dumping 1224 out of 8152 MB:..2%..11%..21%..31%..41%..51%..61%..71%..82%..91%

Code:
#9  0xffffffff81065b8d in in_cksumdata (buf=<optimized out>,
    len=len@entry=299) at /usr/src/sys/amd64/amd64/in_cksum.c:111
#10 0xffffffff81065ab7 in in_cksum_skip (m=0xfffff801bc4fbb00, len=299,
    skip=<optimized out>) at /usr/src/sys/amd64/amd64/in_cksum.c:224
#11 0xffffffff82956329 in in4_cksum (m=0x138, nxt=<optimized out>,
    nxt@entry=6 '\006', off=3, len=<optimized out>)
    at /usr/src/sys/netpfil/pf/in4_cksum.c:117
#12 0xffffffff829373d0 in pf_check_proto_cksum (m=0xfffff80023867c00,
    off=<optimized out>, len=1, p=6 '\006', af=2 '\002')
    at /usr/src/sys/netpfil/pf/pf.c:5844
#13 pf_return (r=r@entry=0xfffff801d6149000, nr=<optimized out>,
    nr@entry=0xfffff8019a094000, pd=pd@entry=0xfffffe00841e96d0,
    sk=<optimized out>, off=<optimized out>, off@entry=20, m=<optimized out>,
    m@entry=0xfffff80023867c00, th=0xfffffe00841e97a0,
    kif=0xfffff8000afa5a00, bproto_sum=35433, bip_sum=0, hdrlen=20,
    reason=0xfffffe00841e955e) at /usr/src/sys/netpfil/pf/pf.c:2654
#14 0xffffffff82931351 in pf_test_rule (rm=rm@entry=0xfffffe00841e9770,
    sm=sm@entry=0xfffffe00841e9788, direction=direction@entry=2,
    kif=kif@entry=0xfffff8000afa5a00, m=m@entry=0xfffff80023867c00, off=20,
    pd=0xfffffe00841e96d0, am=0xfffffe00841e9760, rsm=0xfffffe00841e9750,
    inp=0xfffff800398f43d0) at /usr/src/sys/netpfil/pf/pf.c:3641
#15 0xffffffff8292cd11 in pf_test (dir=<optimized out>, dir@entry=2,
    pflags=<optimized out>, ifp=<optimized out>, m0=<optimized out>,
    m0@entry=0xfffffe00841e9948, inp=0xfffff800398f43d0)
    at /usr/src/sys/netpfil/pf/pf.c:6005
#16 0xffffffff82945bff in pf_check_out (m=0xfffffe00841e9948, ifp=0x3,
    flags=299, ruleset=<optimized out>, inp=0xffffff00)
    at /usr/src/sys/netpfil/pf/pf_ioctl.c:4516
#17 0xffffffff80d42137 in pfil_run_hooks (head=<optimized out>, p=...,
    ifp=0xfffff8000389c800, flags=flags@entry=131072,
    inp=inp@entry=0xfffff800398f43d0) at /usr/src/sys/net/pfil.c:187
#18 0xffffffff80db2f21 in ip_output_pfil (mp=0xfffffe00841e9948,
    ifp=0xfffff8000389c800, flags=0, inp=0xfffff800398f43d0,
    dst=0xfffff800398f4578, fibnum=<optimized out>, error=<optimized out>)
    at /usr/src/sys/netinet/ip_output.c:130
#19 ip_output (m=m@entry=0xfffff80023867c00, opt=<optimized out>,
    ro=<optimized out>, flags=0, imo=imo@entry=0x0, inp=<optimized out>)
    at /usr/src/sys/netinet/ip_output.c:705
#20 0xffffffff80dc9664 in tcp_output (tp=0xfffffe00ecc28418)
    at /usr/src/sys/netinet/tcp_output.c:1492
#21 0xffffffff80dd80df in tcp_timer_rexmt (xtp=0xfffffe00ecc28418)
    at /usr/src/sys/netinet/tcp_timer.c:879
#22 0xffffffff80c25b0d in softclock_call_cc (c=0xfffffe00ecc286a0,
    cc=cc@entry=0xffffffff81ca8200 <cc_cpu>, direct=direct@entry=0)
    at /usr/src/sys/kern/kern_timeout.c:696
#23 0xffffffff80c25f99 in softclock (arg=0xffffffff81ca8200 <cc_cpu>)
    at /usr/src/sys/kern/kern_timeout.c:816
#24 0xffffffff80bcafdd in intr_event_execute_handlers (p=<optimized out>,
    ie=0xfffff800035d4700) at /usr/src/sys/kern/kern_intr.c:1168
#25 ithread_execute_handlers (p=<optimized out>, ie=0xfffff800035d4700)
    at /usr/src/sys/kern/kern_intr.c:1181
#26 ithread_loop (arg=arg@entry=0xfffff800035dadc0)
    at /usr/src/sys/kern/kern_intr.c:1269
#27 0xffffffff80bc7dde in fork_exit (
    callout=0xffffffff80bcad90 <ithread_loop>, arg=0xfffff800035dadc0,
    frame=0xfffffe00841e9d40) at /usr/src/sys/kern/kern_fork.c:1069
#28 <signal handler called>

This time it's a different location in the C file (/usr/src/sys/amd64/amd64/in_cksum.c:111).
Code:
static u_int64_t
in_cksumdata(const void *buf, int len)
{
   const u_int32_t *lw = (const u_int32_t *) buf;
   /* ... */
   lw = (u_int32_t *) (((long) lw) - offset);  /* <= line 111 */
   /* ... */

It happens when accessing the contents of the *buf pointer, which is either invalid or being accessed out of bounds.
 
Well, the #PF is on virtual address 0x138, so it's definitely invalid. That's only slightly different from the previous crashes you shared, where it failed on 0.
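To put that address in perspective, here is a small sketch that converts it to decimal and checks whether it falls inside the first 4 KiB page, which is what you'd expect from a NULL or near-NULL pointer plus a field offset. The function name `fault_offset` is made up for illustration, and the 4096-byte page size is the amd64 base page size. It only does arithmetic on the number; it can't tell which structure was being dereferenced.

```sh
#!/bin/sh
# Hypothetical helper: classify a kernel fault address by its distance from 0.
# Assumes 4096-byte pages and an address printed without leading zeros,
# as in the "fault virtual address" line of the trap message.
fault_offset() {
    case "$1" in
        # One to three hex digits means the value is below 0x1000 (4096).
        0x[0-9a-fA-F]|0x[0-9a-fA-F][0-9a-fA-F]|0x[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F])
            echo "$(( $1 )) bytes from NULL (inside page 0)" ;;
        *)
            echo "$1 is outside page 0" ;;
    esac
}

fault_offset 0x138    # prints: 312 bytes from NULL (inside page 0)
```

Note that the same value, 0x138, also shows up as the `m` argument of in4_cksum in frame #11 of the gdb backtrace, so the bogus pointer may already have been in play one frame up.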

Would you be willing to share the coredump, if that's possible? Note that the dump could have some private data in it, depending on what's running on the server.
If you are able to reproduce the crash, it's worth opening a PR. Slightly off-topic, but if you are on ZFS, boot environments are a really neat feature: it's easy to boot back after an upgrade in case of an issue.
 
I'm also wondering if you're doing anything special with PF. I have a fairly basic ruleset and it has never crashed on me. Maybe it's something related to your ruleset.
 
at #12, len=1 looks suspicious because such a short len should cause the function to return immediately, so #11 would never be reached
Code:
static int
pf_check_proto_cksum(struct mbuf *m, int off, int len, u_int8_t p, sa_family_t af)
{
        u_int16_t sum = 0;
        int hw_assist = 0;
        struct ip *ip;

        if (off < sizeof(struct ip) || len < sizeof(struct udphdr))
                return (1);
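To make that concrete, here is the arithmetic of the guard as a shell sketch. The sizes are assumptions taken from the standard headers (`struct ip` is 20 bytes without options, `struct udphdr` is 8 bytes), and `guard_returns_early` is an illustrative name, not a kernel function. The values passed in are the ones gdb printed (`off=20` in frame #13, `len=1` in frame #12), which may themselves be unreliable in an optimized build.

```sh
#!/bin/sh
# Sketch of the early-return test in pf_check_proto_cksum():
#   if (off < sizeof(struct ip) || len < sizeof(struct udphdr)) return (1);
SIZEOF_IP=20       # assumed: IPv4 header without options
SIZEOF_UDPHDR=8    # assumed: fixed UDP header size

guard_returns_early() {
    off=$1
    len=$2
    # "||" means either condition alone is enough to return early.
    if [ "$off" -lt "$SIZEOF_IP" ] || [ "$len" -lt "$SIZEOF_UDPHDR" ]; then
        echo yes
    else
        echo no
    fi
}

guard_returns_early 20 1      # values shown by gdb; prints: yes
guard_returns_early 20 299    # len as seen in in_cksum_skip(); prints: no
```

With len=1 the guard should fire, so either the printed len is not the value the function actually saw, or something overwrote it afterwards.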
 
I'm also wondering if you're doing anything special with PF. I have a fairly basic ruleset and it has never crashed on me. Maybe it's something related to your ruleset.
I was wondering the same thing. Does it help if I post the ruleset? (It's a pretty big ruleset).

covacat, that's indeed very suspicious. Do you know what can cause this?
I've just checked the md5 of /boot/kernel/pf.ko and it's the same as on other 13.0-RELEASE-p4 installs. (I thought maybe pf.ko hadn't been updated.)
Code:
root@myserver:/boot/kernel # md5 pf.ko
MD5 (pf.ko) = 366b88ca4e3a517e0cbcefe90c47e987

grahamperrin, I was also thinking a certain jail was causing it. But don't really know where to start looking. (At the moment I'm slowly migrating stuff from this server to another server).

_martin, I could post a coredump, though I see it does contain some sensitive data, mainly server names. Is a core.txt.0 file enough? I can filter the sensitive data out of that file; doing this in a vmcore.0 seems a bit tricky.
 
at #12, len=1 looks suspicious because such a short len should cause the function to return immediately, so #11 would never be reached
Code:
static int
pf_check_proto_cksum(struct mbuf *m, int off, int len, u_int8_t p, sa_family_t af)
{
        u_int16_t sum = 0;
        int hw_assist = 0;
        struct ip *ip;

        if (off < sizeof(struct ip) || len < sizeof(struct udphdr))
                return (1);
He set rxcsum to off, so that would be expected.

My guess is mbuf struct is bogus and we're wandering off into lala land and then boom.
 
He set rxcsum to off, so that would be expected.
I think it looks suspicious that the code still continues and goes deeper in the stack trace.
How is it possible that the call stack goes deeper when 'len < sizeof(struct udphdr)' is true? It should return immediately and not call in4_cksum.

Is it possible my GDB or debugging symbols are incorrect? It is a binary kernel. I don't really know where this info is coming from
 
_martin, I could post a coredump, though I see it does contain some sensitive data, mainly server names. Is a core.txt.0 file enough? I can filter the sensitive data out of that file; doing this in a vmcore.0 seems a bit tricky.

You're best to raise a PR with this information. The kernel.debug and vmcore will be required.
Posting cores and stuff here won't get it fixed, only a PR will.
I think it's best to raise a PR and push the work onto development.
 
I think it looks suspicious that the code still continues and goes deeper in the stack trace.
How is it possible that the call stack goes deeper when 'len < sizeof(struct udphdr)' is true? It should return immediately and not call in4_cksum.
That's an || (or) before that. Either/or. As this is not a UDP header (and they don't have checksums anyway), this is not relevant.
I'm only looking at what code you've posted. I've not looked at the full /usr/src/sys/amd64/amd64/in_cksum.c.
Again, my guess is the pf.c is giving dud buffer data. Why? Who knows, a combination of rules that trigger it. A variation in the jail and host causing something?

Is it possible my GDB or debugging symbols are incorrect? It is a binary kernel. I don't really know where this info is coming from
Nope. It's provided by the symbols inside the kernel.. or it's magic. ;)
 
i assume the dump values are extracted from the stack at the fault moment so it might be some stack corruption (or gdb bug)
more weirdness is that len becomes 299 after and the code does not seem to touch it
 
As I mentioned earlier, if you are able to reproduce this, it's worth opening a PR.
In my opinion, tracing a few lines of code here is useless. You need the full view and the context of what happened. If you are using the GENERIC kernel (the p4 update you mentioned), only the vmcore is needed; everything else is the same and hence not needed. You can't obfuscate private data in a kernel dump (at least not easily), since it's a memory dump of a running kernel. If you do share that image, it's better done through some public cloud storage, etc.

Even for the purpose of the PR, it's worth creating a vanilla VM with the same settings but no private data in it. Record the steps that lead to the crash and share them in the PR.

Posting cores and stuff here won't get it fixed, only a PR will.
Yes, but that doesn't mean somebody on the forum can't have a look and fix it.
 
Is it possible my GDB or debugging symbols are incorrect? It is a binary kernel. I don't really know where this info is coming from
And still, I suggest building your own kernel and trying again with it. Remove all devices you will never use from the custom kernel. The memory layout of your own kernel would be different.
I've started to think that this might be a hardware error, in memory or the network interface. The checksum error may be real, but why is there a checksum error?
 
Code:
vtnet0: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=4c07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,LINKSTATE,TXCSUM_IPV6>
Compare that with mine:
Code:
vtnet0: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=4c04bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,LRO,VLAN_HWTSO,LINKSTATE,TXCSUM_IPV6>
I do have net.inet.tcp.tso=0 in /etc/sysctl.conf too. But I've also explicitly disabled it on the interface:
Code:
ifconfig_vtnet0="inet X.X.X.X netmask 255.255.255.0 -tso"
I don't know if that's going to make a difference, but in the output from my ifconfig(8) there is no TSO4 or TSO6, while yours does have them. Maybe setting the sysctl while the interface still has the option enabled triggers an error condition.

You can turn this off "on-the-fly" with ifconfig vtnet0 -tso.
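To check quickly whether an interface still advertises TSO, something like the sketch below can help. `tso_flags` is an illustrative helper name; it only parses the `options=...<...>` line that ifconfig(8) prints and lists the capabilities that start with TSO. On the live system you'd feed it `ifconfig vtnet0`; here it is fed the options line from the crashing server.

```sh
#!/bin/sh
# Illustrative helper: list the TSO capabilities found in the
# "options=...<...>" line of ifconfig(8) output read from stdin.
tso_flags() {
    sed -n 's/.*options=[0-9a-f]*<\(.*\)>.*/\1/p' |   # keep the flag list only
        tr ',' '\n' |                                  # one flag per line
        grep '^TSO' |                                  # TSO4/TSO6, not VLAN_HWTSO
        paste -sd ' ' -                                # join back into one line
}

# Live usage would be: ifconfig vtnet0 | tso_flags
echo 'options=4c07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,LINKSTATE,TXCSUM_IPV6>' | tso_flags
# prints: TSO4 TSO6
```

An empty result after `ifconfig vtnet0 -tso` would mean the flags are gone from the interface, matching the second output above.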
 
I was wondering the same thing. Does it help if I post the ruleset? (It's a pretty big ruleset).
Yes please.
I've set up two VMs, one in VirtualBox and one in qemu, running the same version as you have. I'm trying to stress test them both to see the crash. I have PF loaded, but only with scrub and pass in/out rules.
Are you using vnet jails by any chance?

Edit: I've been running the VirtualBox VM for 20 hours now. I've set up Samba in the jail, accessible via NAT behind my egress interface:
/etc/pf.conf
Code:
ext_if="vtnet0"
scrub in all

nat pass on $ext_if from 192.168.252.0/24 to any -> 172.16.1.80
rdr pass on $ext_if proto tcp to 172.16.1.80 port 80 -> 192.168.252.2
rdr pass on $ext_if proto tcp to 172.16.1.80 port 2222 -> 192.168.252.2 port 22
rdr pass on $ext_if proto {tcp,udp} to 172.16.1.80 port 445 -> 192.168.252.2
rdr pass on $ext_if proto {tcp,udp} to 172.16.1.80 port 135:139 -> 192.168.252.2

pass in all
pass out all
/etc/rc.conf
Code:
hostname="tbsd"
ifconfig_vtnet0="inet 172.16.1.80 netmask 255.255.255.0"
defaultrouter="172.16.1.1"

dumpdev="AUTO"
sshd_enable="YES"
pf_enable="YES"
jail_enable="YES"

cloned_interfaces="lo252"
ifconfig_lo252_alias1="inet 192.168.252.2 netmask 255.255.255.255"

devfs_load_rulesets=YES

Code:
# uname -a
FreeBSD tbsd 13.0-RELEASE-p4 FreeBSD 13.0-RELEASE-p4 #0: Tue Aug 24 07:33:27 UTC 2021     root@amd64-builder.daemonology.net:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64

I've been constantly generating traffic to and from the jail; I was not able to trigger any issue.
 
Here's my (too big) pf.conf. I've inlined the external include pf.macros.conf and obfuscated all public IPs (and changed the DNS servers to Google DNS). The ruleset is very explicit, requiring in and out rules per jail.

/etc/pf.conf
Code:
# 1. Macros
#==========
#-- begin include '/etc/pf.macros.conf'
ext_if="vtnet0"
jail_if="lo1"
jail_net     = "10.0.2.0/24"
jail_net6    = "fd::/64"

# obfuscated
wan_ip4 = "1.1.1.1"
wan_ip6 = "1:1:1:1::1"

ip_core      = "10.0.2.100"
ip_db        = "10.0.2.101"
ip_webproxy  = "10.0.2.103"
ip_mail      = "10.0.2.104"
ip_git       = "10.0.2.106"
ip_w0        = "10.0.2.107"
ip_dns       = "10.0.2.108"
ip6_core      = "fd::2:100"
ip6_db        = "fd::2:101"
ip6_webproxy  = "fd::2:103"
ip6_mail      = "fd::2:104"
ip6_git       = "fd::2:106"
ip6_w0        = "fd::2:107"
ip6_dns       = "fd::2:108"
dns_ips = "{ 2001:4860:4860::8888 2001:4860:4860::8844 8.8.8.8 8.8.6.6 }"
jails_default_allowed_out_ports = "{ http, https, pop3 imap pop3s imaps 1022 }"
## -- end /etc/pf.macros.con


# Quiet block log
quiet_block_tcp = "{135 137 138 139 445}"
quiet_block_udp = "{137 138}"


# overload checks
webserver_sto = "(max 8192, source-track rule, max-src-conn 256, max-src-nodes 1024, max-src-conn-rate 500/5, overload <overloadlist> flush global)"
ssh_sto = "(max 8192, source-track rule, max-src-conn 256, max-src-nodes 1024, max-src-conn-rate 500/5, overload <overloadlist> flush global)"
# extra items
tcp_state ="flags S/SAFR modulate state"
udp_state ="keep state"
tcp6_state ="keep state"
udp6_state ="keep state"
www_ports = "{ 80, 443 }"
monit_ports = "{ 2812 }"
mail_ports = "{ 25, 2525, 456, 587, 456, 2526 }"
redis_ports = "6300:6399"


# 2. Tables
#==========
table <whitelist> persist file "/etc/pf.whitelist"
table <blacklist> persist file "/etc/pf.blacklist"
table <overloadlist> persist


# 3. Options
#===========
set block-policy return
set loginterface $ext_if
set skip on lo0
set optimization aggressive


# 4. Packet Normalization
#========================
scrub in all


# 5. Bandwidth management
#========================


# 6. Translation
#===============
## External NAT
nat on $ext_if inet from $jail_net to any port != 25 -> $wan_ip4
nat on $ext_if inet6 from $jail_net6 to any port != 25 -> $wan_ip6


## only translate allowed out ports
nat on $ext_if inet from $jail_net to any port $jails_default_allowed_out_ports -> $wan_ip4
nat on $ext_if inet6 from $jail_net6 to any port $jails_default_allowed_out_ports -> $wan_ip6
nat on $ext_if inet proto icmp from $jail_net to any -> $wan_ip4
nat on $ext_if inet6 proto icmp from $jail_net6 to any -> $wan_ip6


## only the MAIL jail can send mail
nat on $ext_if inet from $ip_mail to any port $mail_ports -> $wan_ip4
nat on $ext_if inet6 from $ip6_mail to any port $mail_ports -> $wan_ip6


# 7. Redirection
#===============
rdr log on $ext_if inet proto tcp from any to $wan_ip4 port { 22 } -> $ip_git
rdr log on $ext_if inet6 proto tcp from any to $wan_ip6 port { 22 } -> $ip6_git
rdr log on $ext_if inet proto tcp from any to $wan_ip4 port { 1022 } -> $ip_w0
rdr log on $ext_if inet6 proto tcp from any to $wan_ip6 port { 1022 } -> $ip6_w0
rdr log on $ext_if inet proto tcp from any to $wan_ip4 port $www_ports -> $ip_webproxy
rdr log on $ext_if inet6 proto tcp from any to $wan_ip6 port $www_ports -> $ip6_webproxy
rdr log on $ext_if inet proto { tcp, udp } from any to $wan_ip4 port { 53 } -> $ip_dns
rdr log on $ext_if inet6 proto { tcp, udp } from any to $wan_ip6 port { 53 } -> $ip6_dns
rdr log on $ext_if inet proto tcp from <whitelist> to $wan_ip4 port { 4275 } -> $ip_dns
rdr log on $ext_if inet6 proto tcp from <whitelist> to $wan_ip6 port { 4275 } -> $ip6_dns
no rdr


# 8. Packet Filtering
#====================
anchor "blacklistd/*" in on $ext_if
block in log all
block out log all
antispoof for $ext_if
antispoof for $jail_if
icmp_ping ="icmp-type 8 code 0"
pass in quick on $ext_if inet proto icmp to ($ext_if) $icmp_ping keep state label "core|in|icmp"
pass in quick proto icmp6 all label "core|in6|icmp"  # icmp6 allow all (ipv6)


## Whitelist / blacklist
pass in quick on $ext_if inet from <whitelist>
pass in quick on $ext_if inet6 from <whitelist>
block drop in quick on $ext_if inet from <blacklist>
block drop in quick on $ext_if inet6 from <blacklist>


#10.0.0.0/8, 192.168.0.0/16
martians = "{ 127.0.0.0/8,  172.16.0.0/12, 169.254.0.0/16, 192.0.2.0/24,  0.0.0.0/8, 240.0.0.0/4 }"
block drop in quick on $ext_if from $martians to any label "!core|in|martian-$dstport"
block drop out quick on $ext_if from any to $martians label "!core|out|martian-$dstport"

## quiet block drop
block drop in quick on $ext_if inet proto tcp from any to any port $quiet_block_tcp label "drop quiet: $dstport"
block drop in quick on $ext_if inet proto udp from any to any port $quiet_block_udp label "drop quiet: $dstport"
block drop in quick on $ext_if inet6 proto tcp from any to any port $quiet_block_tcp label "drop quiet: $dstport"
block drop in quick on $ext_if inet6 proto udp from any to any port $quiet_block_udp label "drop quiet: $dstport"
block drop in quick on $ext_if proto igmp all label "drop igmp"
block drop in quick on $ext_if proto 112 label "drop protocol 112"


# ALLOW: [out] * for root. Give root "GOD" privileges
pass out quick proto { tcp, udp } all user { root } $tcp_state label "core|out|root-$proto"
pass out quick inet6 proto { tcp, udp } all user { root } $tcp6_state label "core|out|root-$proto"


## > core
pass in on $jail_if inet proto tcp from $jail_net to $ip_core port { 53 } $tcp_state label "dns: jail -> [core]"
pass in on $jail_if inet6 proto tcp from $jail_net6 to $ip6_core port { 53 } $tcp6_state label "dns: jail -> [core]"
pass in on $jail_if inet proto udp from $jail_net to $ip_core port { 53 } $udp_state label "dns: jail -> [core]"
pass in on $jail_if inet6 proto udp from $jail_net6 to $ip6_core port { 53 } $udp6_state label "dns: jail -> [core]"
pass in log on $jail_if inet proto tcp from $ip_webproxy to $ip_core port $monit_ports $tcp_state label "monit: webproxy -> [core]"
pass in log on $jail_if inet6 proto tcp from $ip6_webproxy to $ip6_core port $monit_ports $tcp6_state label "monit: webproxy -> [core]"
pass out log proto tcp from $wan_ip4 to $dns_ips port { 53 } $tcp_state label "[core] -> external_dns"
pass out log inet6 proto tcp from $wan_ip6 to $dns_ips port { 53 } $tcp_state label "[core] -> external_dns"
pass out log proto udp from $wan_ip4 to $dns_ips port { 53 } $udp_state label "[core] -> external_dns"
pass out log inet6 proto udp from $wan_ip6 to $dns_ips port { 53 } $udp_state label "[core] -> external_dns"

# (allow full access to jails from core)
pass out proto tcp from $ip_core to $jail_net $tcp_state label "[core] -> jail_net"
pass out inet6 proto tcp from $ip6_core to $jail_net6 $tcp_state label "[core] -> jail_net"
pass out proto udp from $ip_core to $jail_net $udp_state label "[core] -> jail_net"
pass out inet6 proto udp from $ip6_core to $jail_net6 $udp_state label "[core] -> jail_net"


# (jail out requirements should also be enabled on the core, because of nat)
pass out log proto tcp from $wan_ip4 to any port $mail_ports $tcp_state label "mail: [core] -> any"
pass out log inet6 proto tcp from $wan_ip6 to any port $mail_ports $tcp_state label "mail: [core] -> any"
pass out proto tcp from $wan_ip4 to any port $jails_default_allowed_out_ports $tcp_state label "jail_allowed_out: [core] -> any"
pass out inet6 proto tcp from $wan_ip6 to any port $jails_default_allowed_out_ports $tcp_state label "jail_allowed_out: [core] -> any"


## > jail_net
pass out log proto tcp from $jail_net to $ip_core port { 53 } $tcp_state label "dns: [jail] -> core"
pass out log inet6 proto tcp from $jail_net6 to $ip6_core port { 53 } $tcp_state label "dns: [jail] -> core"
pass out log proto udp from $jail_net to $ip_core port { 53 } $udp_state label "dns: [jail] -> core"
pass out log inet6 proto udp from $jail_net6 to $ip6_core port { 53 } $udp_state label "dns: [jai] -> core"
pass out log proto tcp from $jail_net to any port $jails_default_allowed_out_ports $tcp_state label "jail_allowed_out: [jail] -> any"
pass out log inet6 proto tcp from $jail_net6 to any port $jails_default_allowed_out_ports $tcp_state label "jail_allowed_out: [jail] -> any"
## > db.local
pass in on $jail_if inet proto tcp from $ip_w0 to $ip_db port 3306 $tcp_state label "mysql: w0.local -> [db.local]"
pass in on $jail_if inet6 proto tcp from $ip6_w0 to $ip6_db port 3306 $tcp_state label "mysql: w0.local -> [db.local]"
pass in on $jail_if inet proto tcp from $ip_w0 to $ip_db port $redis_ports $tcp_state label "redis: w0.local -> [db.local]"
pass in on $jail_if inet6 proto tcp from $ip6_w0 to $ip6_db port $redis_ports $tcp_state label "redis: w0.local -> [db.local]"
pass in on $jail_if inet proto tcp from $ip_dns to $ip_db port 3306 $tcp_state label "mysql: dns.local -> [db.local]"
pass in on $jail_if inet6 proto tcp from $ip6_dns to $ip6_db port 3306 $tcp_state label "mysql: dns.local -> [db.local]"


## > webproxy.local
pass in log inet proto tcp from any to $ip_webproxy port $www_ports $tcp_state $webserver_sto label "www: any -> [webproxy.local]"
pass in log inet6 proto tcp from any to $ip6_webproxy port $www_ports $tcp6_state $webserver_sto label "www: any -> [webproxy.local]"
pass out log inet proto tcp from $ip_webproxy to $jail_net port $www_ports $tcp_state label "www: [webproxy.local] -> jail"
pass out log inet6 proto tcp from $ip6_webproxy to $jail_net6 port $www_ports $tcp_state label "www: [webproxy.local] -> jail"
pass out log inet proto tcp from $ip_webproxy to $jail_net port 3000:3999 $tcp_state label "3000:3999: [webproxy.local] -> jail"
pass out log inet6 proto tcp from $ip6_webproxy to $jail_net6 port 3000:3999 $tcp_state label "3000:3999: [webproxy.local] -> jail"
pass out log inet proto tcp from $ip_webproxy to $ip_core port $monit_ports $tcp_state label "monit: [webproxy.local] -> core"
pass out log inet6 proto tcp from $ip6_webproxy to $ip6_core port $monit_ports $tcp_state label "monit: [webproxy.local] -> core"


## > git.local
pass in log inet proto tcp from any to $ip_git port { 22 } $tcp_state $ssh_sto label "22: any -> [git.local]"
pass in log inet6 proto tcp from any to $ip6_git port { 22 } $tcp6_state $ssh_sto label "22: any -> [git.local]"


## > w0.local
pass in on $jail_if inet proto tcp from $ip_webproxy to $ip_w0 port $www_ports $tcp_state label "www: webproxy.local -> [w0.local]"
pass in on $jail_if inet6 proto tcp from $ip6_webproxy to $ip6_w0 port $www_ports $tcp_state  label "www: webproxy.local -> [w0.local]"
pass in on $jail_if inet proto tcp from $ip_webproxy to $ip_w0 port 3000:3999 $tcp_state label "3000:3999: webproxy.local -> [w0.local]"
pass in on $jail_if inet6 proto tcp from $ip6_webproxy to $ip6_w0 port 3000:3999 $tcp_state  label "3000:3999: webproxy.local -> [w0.local]"
pass in log inet proto tcp from any to $ip_w0 port { 1022 } $tcp_state $ssh_sto label "1022: any -> [w0.local]"
pass in log inet6 proto tcp from any to $ip6_w0 port { 1022 } $tcp6_state $ssh_sto label "1022: any -> [w0.local]"
pass out log inet proto tcp from $ip_w0 to $ip_db port 3306 $tcp_state label "mysql: [w0.local] -> db.local"
pass out log inet6 proto tcp from $ip6_w0 to $ip6_db port 3306 $tcp_state label "mysql: [w0.local] -> db.local"
pass out log inet proto tcp from $ip_w0 to $ip_db port $redis_ports $tcp_state label "redis: [w0.local] -> db.local"
pass out log inet6 proto tcp from $ip6_w0 to $ip6_db port $redis_ports $tcp_state label "redis: [w0.local] -> db.local"
pass out log inet proto tcp from $ip_w0 to $ip_mail port 25 $tcp_state label "mail: [w0.local] -> mail.local"
pass out log inet6 proto tcp from $ip6_w0 to $ip6_mail port 25 $tcp_state label "mail: [w0.local] -> mail.local"


## > mail.local
pass in log proto tcp from $ip_w0 to $ip_mail port 25 $tcp_state label "mail: w0.local -> [mail.local]"
pass in log inet6 proto tcp from $ip6_w0 to $ip6_mail port 25 $tcp_state label "mail: w0.local -> [mail.local]"
pass in log proto tcp from $ip_core to $ip_mail port 25 $tcp_state label "mail: core.local -> [mail.local]"
pass in log inet6 proto tcp from $ip6_core to $ip6_mail port 25 $tcp_state label "mail: core.local -> [mail.local]"


## > dns.local
pass in log inet proto tcp from any to $ip_dns port 53 $tcp_state label "53: any -> [dns.local]"
pass in log inet6 proto tcp from any to $ip6_dns port 53 $tcp6_state  label "53: any -> [dns.local]"
pass in log inet proto udp from any to $ip_dns port 53 $udp_state label "53: any -> [dns.local]"
pass in log inet6 proto udp from any to $ip6_dns port 53 $udp6_state  label "53: any -> [dns.local]"
pass in log inet proto tcp from <whitelist> to $ip_dns port 4275 $tcp_state label "4275: <whitelist> -> [dns.local]"
pass in log inet6 proto tcp from <whitelist> to $ip6_dns port 4275 $tcp6_state  label "4275: <whitelist> -> [dns.local]"
pass out log inet proto tcp from $ip_dns to $ip_db port 3306 $tcp_state label "mysql: [dns.local] -> db.local"
pass out log inet6 proto tcp from $ip6_dns to $ip6_db port 3306 $tcp_state label "mysql: [dns.local] -> db.local"


## > MISC Ping/PONG Ipv6 router stuff
pass inet proto icmp from $ext_if to any keep state
pass inet proto icmp from $jail_if to any keep state
pass out quick on { $ext_if, $jail_if } inet6 proto icmp6 all icmp6-type echoreq keep state label "core|out6|icmp6-echo"
pass out quick on { $ext_if, $jail_if }  inet6 proto icmp6 all icmp6-type {neighbradv, neighbrsol} label "core|out6|icmp6-bradv"
pass in quick on { $ext_if, $jail_if } inet6 proto icmp6 all icmp6-type {neighbradv, neighbrsol} label "core|in6|icmp6-bradv"
pass out quick on { $ext_if, $jail_if } inet6 proto icmp6 all icmp6-type routeradv label "core|out6|icmp6-router"
pass in quick on { $ext_if, $jail_if } inet6 proto icmp6 all icmp6-type routersol label "core|in6|icmp6-router"
pass in quick on { $ext_if, $jail_if } inet6 proto icmp6 all icmp6-type echoreq label "core|in6|icmp6-echo"
/etc/rc.conf
Code:
hostname="myserver.com"
ifconfig_vtnet0="DHCP"
ifconfig_vtnet0_ipv6="inet6 1:1:1:1::1/48"
ipv6_defaultrouter="2:2:2::2"

cloned_interfaces="lo1"
ipv4_addrs_lo1="10.0.2.100/24"
ifconfig_lo1_ipv6="fd::2:100 prefixlen 64"
local_unbound_enable="YES"
pf_enable="YES"
pflog_enable="YES"
pf_rules="/etc/pf.conf"
blacklist_enable="YES"
blacklistd_flags="-r -P /etc/blacklistd-sockets"
zfs_enable="YES"
sshd_enable="YES"
dumpdev="AUTO"
iocage_enable="YES"
blacklistd_enable="YES"
 
I've been running the test VM for a few days and didn't hit any issues. I did adjust my /etc/pf.conf a bit to reflect more of what you're doing, just for the sake of the test.

Are you still getting those crashes? Would you be willing to test this on a fresh VM with the same setup (with no data) so you could share the coredump?

If you can't do that for whatever reason, there's an option to recompile the kernel. Not for the sake of "let's see if compilation fixes something", but rather to decrease cc optimizations and get more verbose stack traces.
 
Thank you. I will try to build a custom kernel. Could you tell me which options to use for compilation? (Which make command should I use?)
 
In /etc/make.conf, put COPTFLAGS=-O0 (the letter O followed by zero) and compile the kernel. Remove it or comment it out once you are done. If you are using the GENERIC kernel, you don't need to change anything in the kernel config. If you have userspace and sources synced to the same version (freebsd-update does this), you don't need to compile world either.

While not needed, I'd still do cp -rp /boot/kernel /boot/kernel.orig just to keep the original kernel at hand. You do get kernel.old when installing a new kernel, but if you are focused on the issue and recompile the kernel again, you may lose the old directory. Pay attention to custom kernel modules (if you have any); you may need to recompile those too.

Note that the goal of this is only to see the variables that are otherwise optimized out, to shed more light on the issue. It doesn't help too much, as we already know you had a #PF on 0 and then on some small address. It makes debugging easier and may reveal something hidden (maybe the issue was already occurring in lower frames), but it's not the ideal way to go.

I'd rather focus on reproducing the issue, either in a new VM or by setting up a boot environment and updating the other VM you have (you can instantly reboot back if you experience problems).
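The rebuild described above can be sketched as the sequence below. The helper `kernel_debug_build_plan` is a made-up name, and it only prints the commands so they can be reviewed before being run as root. It assumes the sources in /usr/src match the running release and that the stock GENERIC config is used; pass another config name as the first argument otherwise.

```sh
#!/bin/sh
# Prints (does not run) the steps for a kernel build with optimizations
# turned down. Assumes /usr/src is populated and matches the installed release.
kernel_debug_build_plan() {
    kernconf=${1:-GENERIC}
    cat <<EOF
cp -rp /boot/kernel /boot/kernel.orig
echo 'COPTFLAGS=-O0' >> /etc/make.conf
cd /usr/src
make buildkernel KERNCONF=$kernconf
make installkernel KERNCONF=$kernconf
# reboot, then remove or comment out COPTFLAGS in /etc/make.conf when done
EOF
}

kernel_debug_build_plan
```

Printing the plan rather than executing it is a deliberate choice here: a kernel install on a production box is worth reading through once before committing to it.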
 
_martin I just installed a new kernel with the option you gave me.

Code:
# uname -a
FreeBSD myserver.com 13.0-RELEASE-p4 FreeBSD 13.0-RELEASE-p4 #0: Tue Aug 24 07:33:27 UTC 2021     root@amd64-builder.daemonology.net:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64

# After build: uname -a
FreeBSD myserver.cmo 13.0-RELEASE-p4 FreeBSD 13.0-RELEASE-p4 #0 releng/13.0-940681634: Mon Oct 18 06:47:07 CEST 2021     root@myserver.com:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64

It's still running, so that's great.
And now we wait ;-)

By the way, I was wondering: is it possible that my DNS rules in pf.conf are causing my issue?
I'm mixing IPv6 and IPv4 addresses in one list.

Code:
dns_ips = "{ 2001:4860:4860::8888 2001:4860:4860::8844 8.8.8.8 8.8.6.6 }"
pass out log proto tcp from $wan_ip4 to $dns_ips port { 53 } $tcp_state label "[core] -> external_dns"
pass out log inet6 proto tcp from $wan_ip6 to $dns_ips port { 53 } $tcp_state label "[core] -> external_dns"
 
the crash causing packet IS TCP but DNS is seldom used over TCP (mostly UDP)
can you look at the counters and see how much DNS traffic you get over TCP ?
also if you have logs you may be able to deduce which rule has/is causing the crash
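One way to read those counters is sketched below. It assumes the two-line layout that `pfctl -vsr` prints for each rule (the rule text starting in column 0, followed by an indented `[ Evaluations: ... Packets: ... ]` line) and that port 53 is shown either numerically or as `domain`. If the output on your system looks different, the patterns need adjusting. `dns_packets_by_proto` is an illustrative name.

```sh
#!/bin/sh
# Sum the per-rule packet counters of DNS rules, split by protocol.
# Reads `pfctl -vsr` output on stdin; the layout is an assumption (see above).
dns_packets_by_proto() {
    awk '
        /^[a-z]/ { rule = $0; next }     # rule text starts in column 0
        /Packets:/ && rule ~ /port = (53|domain)( |$)/ {
            if (rule ~ /proto tcp/)      proto = "tcp"
            else if (rule ~ /proto udp/) proto = "udp"
            else                         proto = "other"
            # the counter value is the field right after "Packets:"
            for (i = 1; i < NF; i++)
                if ($i == "Packets:") total[proto] += $(i + 1)
        }
        END { printf "tcp=%d udp=%d\n", total["tcp"], total["udp"] }
    '
}

# Usage (as root): pfctl -vsr | dns_packets_by_proto
```

A tcp count near zero would suggest the DNS rules are rarely hit by TCP traffic and are less likely to be the trigger.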
 
_martin, I just had to revert the kernel. It seemed to work at first, but then I noticed that every few requests the webproxy couldn't reach the jails with the websites (constantly and randomly giving web proxy 502 Bad Gateway errors). I tried restarting the firewall, nginx, etc., and double-checked the running apps. (I didn't get any core dump.)
I don't have a clue what's causing this... (With the old kernel back it seems 'stable' again.)
 
Side note: I first experienced this when I upgraded from 9.x to 10.x. My VPN clients got disconnected after a while (a client connected, was actively working, and suddenly the connection got killed). I found out it was PF, with rdr pass or nat pass behaving as if the connection state got flushed. I had to add pass in filtering too to make this work. I think I did open a PR for this, but it never got fixed. To this day my rules look like this:
Code:
rdr on $ext_if proto tcp to $IP_EGRESS port $PORT_WWW -> $IP_JAIL_WEBSERVER               # \ pass in filtering
..
..
pass in quick proto tcp from any to {$IP_JAIL_WEBSERVER,$IP6_JAIL_WEBSERVER} port $PORT_WWW
This fixed my issue.

As covacat mentioned, your crash seems to be related to the handling of a TCP connection.

The PF config seems normal. The only thing that caught my attention was set optimization aggressive. Could the aggressive timers flush the connection when they should not? This is just speculation on my side, but it may be worth disabling this for the sake of a test, especially if you keep getting crashes.
 