PF FreeBSD/pf and SYN ACK flooding

Hi.

First post here.

We are seeing strange behaviour on FreeBSD/PF that puzzles us. We can't get the pfSense developers in on this, so we are escalating it to the FreeBSD forums.

This is tested on current and all earlier versions of pfSense, m0n0wall and the current forked version of pfSense (OPNsense).

We have tested on bare metal and on hypervisors (ESXi 4.1, 5.0, 5.5 and 6.0). All show the same behaviour.

After the initial flood of packets, overall traffic drops and the firewall begins to route packets again. The flood itself does not change, but until the traffic drops one core sits at 100%; then its load suddenly falls and packets begin to flow again.

The thread on the pfSense forums is here: https://forum.pfsense.org/index.php?topic=91856.0, but it's long and, for some, quite boring.

Pictures are attached.

Load on the hypervisor

The drop in load coincides with the drop in traffic, which is when the firewall begins to route packets again.

Any suggestions as to why this is happening?

We have tested igb, em, vmxnet2 and vmxnet3 adapters. The em driver is the best by far.
 
The images appear to be broken links. I tried to have a look at the related thread but it is very long. Are you attacking the firewall box directly, or a server behind it?

It would be interesting to see how you are performing this DoS and what the firewall logs report during that time. I have never used pfSense before, but I can simulate this on FreeBSD 10.1-RELEASE running PF.
 
Thanks for your kind reply.

I have attached some YouTube links as well as the images.

pfSense traffic drop

OPNsense traffic drop

Reboot server while being attacked

SYN flood pfSense stateless

SYN flood pfSense SYN proxy

vmxnet3 test
https://www.youtube.com/watch?v=AUcgbt2lT9Y

There is a lot more, but every time we see 100% use on one core, packet loss occurs. As soon as that core lets go and the load is distributed, the firewall begins to route packets again.

 

Attachments

  • top -HSP before reload of -promisc.PNG
  • top -HSP after reload of -promisc.PNG
  • traffic_drop.PNG
  • traffic_drop2.PNG
  • traffic_drop3.PNG

Having all the NIC interrupts on the same core seems to be the biggest factor. It sounds like it would help once something like r283959 lands for multiqueue em(4). In the meantime, look for NICs whose driver supports multiple queues. The BSDRP tuning page suggests igb(4) is one of the NICs that can use multiple queues. There's also some more info there on the commands you can run to look more closely at the issue.
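
One rough way to see whether the interrupts really do pile up on a single source is to total the counters that vmstat -i prints. The sketch below is only an illustration: the function name irq_share is made up here, and it assumes the usual three-column vmstat -i layout (interrupt name, total, rate) with a trailing "Total" line.

```shell
#!/bin/sh
# irq_share (hypothetical helper): read `vmstat -i` output on stdin and
# print each interrupt source with its share of all counted interrupts.
# Assumes the last two columns of every data line are "total" and "rate".
irq_share() {
    awk '
        # Skip the column header and the summary line.
        $1 == "interrupt" || $1 == "Total" { next }
        NF >= 3 {
            count = $(NF - 1)
            name = $0
            # Strip the two trailing numeric columns to recover the name,
            # which may itself contain spaces (e.g. "irq256: igb0:que 0").
            sub(/ +[0-9]+ +[0-9]+ *$/, "", name)
            names[++n] = name
            counts[n] = count
            sum += count
        }
        END {
            for (i = 1; i <= n; i++)
                printf "%5.1f%%  %s\n", (sum ? 100 * counts[i] / sum : 0), names[i]
        }'
}

# Usage on the firewall while the flood is running:
#   vmstat -i | irq_share
```

If one em0 line carries nearly all of the share while the other sources stay idle, that would be consistent with the single-queue explanation above; with a multiqueue driver you would expect the load to be spread over several "que" lines.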
 
This is what I run in loader.conf:
Code:
autoboot_delay="3"
vm.kmem_size="435544320"
vm.kmem_size_max="535544320"
if_igb_load="YES"
kern.hz="1000"
kern.ipc.nmbclusters="53168"
boot_serial="YES"
comconsole_speed="115200"
hw.usb.no_pf="1"
Hardware is IBM X3650 running Intel E1G42ET adapters and ESXi 4.1 U3.

Changing the hypervisor doesn't make a difference, and the same issue occurs when running on bare metal. So I stick to 4.1 since it's so nimble when running vCenter as well (home setup).
 
Also it seems that it runs the em drivers despite having set the if_igb_load="YES" switch.

This is what top -HSP looks like during the event.

How do I change to the igb(4) driver and test again?

EDIT: VMware already runs igb and offers E1000, vmxnet2 and vmxnet3 as options for the vNICs.

The best option by far is the E1000 setting, since vmxnet3 just goes down instantly while the E1000 tries to handle the traffic and "burps".
 

Attachments

  • kernel_EM0_taskq.PNG
  • igb.PNG

More testing was done on vmxnet3.

It goes down instantly until something begins to flow outbound. Ping starts responding the second the black line moves upwards.

Any suggestions??

I changed my loader.conf to this:

Code:
autoboot_delay="3"
vm.kmem_size="435544320"
vm.kmem_size_max="535544320"
if_igb_load="YES"
kern.hz="1000"
boot_serial="YES"
comconsole_speed="115200"
hw.usb.no_pf="1"
aio_load="YES"
cc_htcp_load="YES"
net.inet.tcp.hostcache.cachelimit="0"
hw.igb.txd="2048"
hw.igb.rxd="2048"
hw.igb.rx_process_limit="-1"
hw.igb.enable_aim="1"
hw.igb.num_queues="0"
hw.igb.enable_msix="1"
kern.ipc.nmbclusters="492680"
net.inet.tcp.syncache.hashsize="1024"
net.inet.tcp.tcbhashsize="65536"
net.isr.bindthreads="0"
net.isr.dispatch="direct"
net.isr.maxthreads="1"

Now back to testing these settings in E1000
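
One thing worth checking before re-testing: as far as I know, the hw.igb.* tunables are only read by igb(4), so they should have no effect while the guest attaches its NICs as em(4). The sketch below lists such mismatches in a loader.conf. The function name mismatched_tunables and the choice of the em, igb and vmx prefixes are assumptions made for this thread, not something from the FreeBSD tools.

```shell
#!/bin/sh
# mismatched_tunables DRIVER (hypothetical helper)
# Reads loader.conf-style text on stdin and reports every hw.em.*,
# hw.igb.* or hw.vmx.* setting whose driver prefix differs from DRIVER,
# i.e. from the driver the system actually attached.
mismatched_tunables() {
    awk -F= -v drv="$1" '
        $1 ~ /^hw\.(em|igb|vmx)\./ {
            split($1, part, ".")
            if (part[2] != drv)
                print $1 " is read by " part[2] "(4), not " drv "(4)"
        }'
}

# Usage on the firewall, for a guest whose NICs show up as em0, em1, ...:
#   mismatched_tunables em < /boot/loader.conf
```

Run against the file above with em as the argument, this would flag all of the hw.igb.* lines, which suggests the E1000 test is effectively running with driver defaults.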
 

Attachments

  • vmxnet3_reply.png

You can't just change the driver. The driver is for the NIC that is in use. If VMware doesn't present an igb(4)-compatible NIC to the guest, then you won't have the option to use that driver in FreeBSD. In that case, igb(4) would only be useful on physical hardware.

How are you creating the issue? A bunch of videos is pretty useless without context on how to replicate the issue. Are you using a specific tool or command?
 
It's a script you can run on any hardware.

I don't have the option of sending you what I have, despite the will to do so. Having this out there will make any FreeBSD server vulnerable, and we don't want that.

The OPNsense guys have agreed to get to the bottom of this later this week, so I will report the findings and push them upstream.

 
I run igb(4) on the host, but it's presented as em(4) to the guest OS.
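
To confirm which drivers the guest actually attached, the interface names are enough, since FreeBSD names each interface after its driver plus a unit number. A minimal sketch, with the helper name nic_drivers made up for this example:

```shell
#!/bin/sh
# nic_drivers (hypothetical helper): read interface names on stdin, as
# printed by `ifconfig -l`, and print the distinct driver names, i.e.
# each name with its trailing unit number removed.
nic_drivers() {
    tr -s ' \t' '\n\n' | sed -e 's/[0-9]*$//' -e '/^$/d' | sort -u
}

# Usage inside the guest:
#   ifconfig -l | nic_drivers
```

If the list shows em but not igb, the guest is using the emulated E1000 regardless of what the host hardware is.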

 

Have you contacted the FreeBSD Security Team regarding this? If you are serious about this then you need to, and you need to send them the script that makes every FreeBSD server vulnerable.
 
It's not a security issue by default, but it can become one later on.

It's an interruption of services running on FreeBSD, since it more or less dies on impact.

Until I have something, or somebody who is willing to help out, it's hard to take it upstream. It should be resolved later this week, I hope.
 
This makes it a security issue. Slowing to a crawl is understandable, but it should not die, i.e. as soon as the attack stops the service should continue to work.
 
It does continue to work... sometimes. It depends more or less on the length of the attack and the hardware dealing with it.

In some cases it stays offline until a hard reset of the hardware, but most of the time it begins to route as normal once the attack is over.

Mostly we saw small Netgate appliances dying and needing a reboot. VMs running on server-grade hardware were more resilient to this. There was no change in the response to the attack, though.

Services still go offline.

 

This is my last reply here because honestly it looks like a waste of time.
You were asked to provide details regarding your testing method but we have not received anything yet.
You were asked to provide the script that causes all this but you replied that you are afraid that this might affect all FreeBSD systems.
You were told to contact the security team if this is such a concern, and now you say that it is not a security issue.

Upstream is where you will get help and attention from developers.
 