PF FreeBSD/pf and SYN ACK flooding

Supermule · Jun 8, 2015

Hi.

First post here.

We are seeing strange behaviour on FreeBSD/PF that puzzles us. We can't get the pfSense developers in on this, so we are escalating it to the FreeBSD forums.

This has been tested on the current and all earlier versions of pfSense, on m0n0wall, and on the current fork of pfSense (OPNsense).

We have tested on bare metal and on hypervisors (ESXi 4.1, 5.0, 5.5 and 6.0). All show the same behaviour.

After the initial flood of packets, overall traffic drops and the firewall begins to route packets again. The flood itself does not change, but until the traffic drops, one core sits at 100%; its load then suddenly falls and packets begin to flow again.

The thread on the pfSense forums is here: https://forum.pfsense.org/index.php?topic=91856.0. It is long and, for some, quite boring.

Pictures are attached.

Load on the hypervisor

The drop in load is when the drop in traffic occurs and it begins to route packets.

Any suggestions as to why this is happening?

We have tested igb, em, vmxnet2 and vmxnet3 adapters. The em driver is the best by far.

gkontos · Jun 8, 2015

The images appear to be broken links. I tried to have a look at the related thread, but it is very long. Are you attacking the firewall box directly, or attacking a server behind it?

It would be interesting to see how you are performing this DoS and what the firewall logs report during that time. I have never used pfSense before, but I can simulate this on FreeBSD 10.1-RELEASE running PF.

Supermule · Jun 8, 2015

Thanks for your kind reply.

I have attached some YouTube links as well as the images.

pfSense traffic drop

OPNsense traffic drop

Reboot server while being attacked

SYN flood pfSense stateless

SYN flood pfSense SYN proxy

vmxnet3 test
https://www.youtube.com/watch?v=AUcgbt2lT9Y

There is a lot more, but every time we see 100% use on one core, packet loss occurs. As soon as that core lets go and the load is distributed, it begins to route packets again.

junovitch@ · Jun 9, 2015

Having all the NIC interrupts on the same core seems to be the biggest factor. Once changes like r283959 land for multiqueue em(4), that should help. In the meantime, look for NICs where the driver supports multiple queues. The BSDRP tuning page suggests igb(4) is one of the NICs that can use multiple queues. There is also more information there on which commands you can run to look more closely at the issue.
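One way to check whether a single interrupt source dominates is to summarize the per-source rates that `vmstat -i` prints on FreeBSD. The following is a minimal sketch, not a tool from this thread: the sample output, device names and counts are made up for illustration, and the helper names `interrupt_rates` and `busiest` are hypothetical.

```python
import re

# Hypothetical sample of `vmstat -i` output. The device names and numbers
# are invented for illustration and are not measurements from this thread.
SAMPLE = """\
interrupt                          total       rate
irq1: atkbd0                          18          0
irq18: em0                      91234567      48211
irq19: em1                        120934         63
cpu0:timer                       2104411       1112
cpu1:timer                       2098765       1109
Total                           95558695      50495
"""

# A data row is a source name followed by two integers: total and rate.
ROW = re.compile(r"^(.*\S)\s+(\d+)\s+(\d+)\s*$")


def interrupt_rates(text):
    """Return {source: rate} for each data row, skipping header and Total."""
    rates = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if not m or m.group(1) == "Total":
            continue
        rates[m.group(1)] = int(m.group(3))
    return rates


def busiest(rates):
    """Return (source, share) for the source with the highest rate."""
    name = max(rates, key=rates.get)
    total = sum(rates.values())
    share = rates[name] / total if total else 0.0
    return name, share


if __name__ == "__main__":
    name, share = busiest(interrupt_rates(SAMPLE))
    print(f"{name} handles {share:.0%} of the interrupt rate")
```

If one `em` or `igb` line accounts for nearly all of the rate, that fits the single-queue, single-core picture described above. With a multiqueue driver you would expect to see several queue lines sharing the load instead.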

Supermule · Jun 9, 2015

This is what I run in loader.conf:

Code:

autoboot_delay="3"
vm.kmem_size="435544320"
vm.kmem_size_max="535544320"
if_igb_load="YES"
kern.hz="1000"
kern.ipc.nmbclusters="53168"
boot_serial="YES"
comconsole_speed="115200"
hw.usb.no_pf="1"

Hardware is IBM X3650 running Intel E1G42ET adapters and ESXi 4.1 U3.

Changing the hypervisor doesn't make a difference, and the same issue occurs on bare metal. So I stick with 4.1, since it is so nimble when running vCenter as well (home setup).

Supermule · Jun 9, 2015

Also, it seems that it runs the em driver despite if_igb_load="YES" being set.

This is how top -HSP looks during the event.

How do I change to the igb(4) driver and test again?

EDIT: VMware already runs igb on the host and offers E1000, vmxnet2 and vmxnet3 as options for the vNICs.

The best option by far is the E1000 setting, since vmxnet3 goes down instantly while the E1000 tries to handle the traffic and "burps".

Supermule · Jun 9, 2015

More testing has been done on vmxnet3.

It goes down instantly until something begins to flow outbound. Ping responds the second the black line moves upwards.

Any suggestions?

I changed my loader.conf to this:

Code:

autoboot_delay="3"
vm.kmem_size="435544320"
vm.kmem_size_max="535544320"
if_igb_load="YES"
kern.hz="1000"
boot_serial="YES"
comconsole_speed="115200"
hw.usb.no_pf="1"
aio_load="YES"
cc_htcp_load="YES"
net.inet.tcp.hostcache.cachelimit="0"
hw.igb.txd="2048"
hw.igb.rxd="2048"
hw.igb.rx_process_limit="-1"
hw.igb.enable_aim="1"
hw.igb.num_queues="0"
hw.igb.enable_msix="1"
kern.ipc.nmbclusters="492680"
net.inet.tcp.syncache.hashsize="1024"
net.inet.tcp.tcbhashsize="65536"
net.isr.bindthreads="0"
net.isr.dispatch="direct"
net.isr.maxthreads="1"

Now back to testing these settings with E1000.
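With two loader.conf revisions in play, it helps to see exactly which tunables were added or changed between test runs. Below is a minimal sketch, assuming simple `name="value"` lines: `parse_loader_conf` and `diff_settings` are hypothetical helper names, and `BEFORE` and `AFTER` hold only a few lines excerpted from the two listings in this thread.

```python
import re

# One assignment per line: name="value", or a bare value without quotes.
ASSIGN = re.compile(r'^([A-Za-z0-9_.]+)=(?:"([^"]*)"|(\S+))$')


def parse_loader_conf(text):
    """Return {name: value} for simple assignments in loader.conf-style text."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        m = ASSIGN.match(line)
        if not m:
            continue  # ignore anything that is not a simple assignment
        name, quoted, bare = m.groups()
        settings[name] = quoted if quoted is not None else bare
    return settings


def diff_settings(old, new):
    """Return (added, removed, changed) between two settings dicts."""
    added = {k: new[k] for k in new if k not in old}
    removed = {k: old[k] for k in old if k not in new}
    changed = {k: (old[k], new[k])
               for k in new if k in old and old[k] != new[k]}
    return added, removed, changed


# Excerpts only, copied from the two listings above.
BEFORE = '''\
if_igb_load="YES"
kern.hz="1000"
kern.ipc.nmbclusters="53168"
'''

AFTER = '''\
if_igb_load="YES"
kern.hz="1000"
kern.ipc.nmbclusters="492680"
hw.igb.num_queues="0"
net.isr.maxthreads="1"
'''

if __name__ == "__main__":
    added, removed, changed = diff_settings(parse_loader_conf(BEFORE),
                                            parse_loader_conf(AFTER))
    for name, (old, new) in sorted(changed.items()):
        print(f"changed {name}: {old} -> {new}")
    for name, value in sorted(added.items()):
        print(f"added   {name}={value}")
```

Recording this diff alongside each test makes it easier to tell which tunable, if any, moved the result. Note that the `hw.igb.*` entries only take effect when the igb(4) driver is actually attached.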

junovitch@ · Jun 10, 2015

You can't just change the driver. The driver is for the NIC that is in use. If VMware doesn't present an igb(4)-compatible device, then you won't have the option to use that driver in FreeBSD. In that case, igb(4) would only be useful on physical hardware.

How are you creating the issue? A bunch of videos are pretty useless without the context of how to replicate the issue. Are you using a specific tool or command?

gkontos · Jun 10, 2015

Exactly, that's what I am asking here. I think that having the same topic in two different threads does not help much.

Supermule · Jun 10, 2015

It's a script you can run on any hardware.

I don't have the option of sending you what I have, despite being willing to. Having this out there would make any FreeBSD server vulnerable, and we don't want that.

The OPNsense guys have agreed to get to the bottom of this later this week, so I will report the findings and take them upstream.

Supermule · Jun 10, 2015

I run igb(4) on the host, but it's presented as em(4) to the guest OS.

gkontos · Jun 10, 2015

Supermule said:
It's a script you can run on any hardware.

I don't have the option of sending you what I have, despite being willing to. Having this out there would make any FreeBSD server vulnerable, and we don't want that.

Have you contacted the FreeBSD Security Team regarding this? If you are serious about this, then you need to, and you need to send them the script that makes every FreeBSD server vulnerable.

Supermule · Jun 10, 2015

It's not a security issue by default, but it can become one later on.

It's an interruption of services running on FreeBSD, since it more or less dies on impact.

And until I have something, or somebody who is willing to help out, it's hard to take it upstream. It should be resolved later this week, I hope.

SirDice · Jun 10, 2015

Supermule said:
It's an interruption of services running on FreeBSD, since it more or less dies on impact.

This makes it a security issue. Slowing to a crawl is understandable, but it should not die; i.e., as soon as the attack stops, the service should continue to work.

Supermule · Jun 10, 2015

It does continue to work... sometimes. More or less, depending on the length of the attack and the hardware dealing with it.

In some cases it stays offline until a hard reset of the hardware, but most of the time it begins to route as normal when the attack is done.

Mostly we saw small Netgate appliances dying and needing a reboot. VMs running on server-grade hardware were more resilient. There was no change in the response to the attack, though.

Services still go offline.

gkontos · Jun 10, 2015

Supermule said:
It's not a security issue by default, but it can become one later on.

It's an interruption of services running on FreeBSD, since it more or less dies on impact.

And until I have something, or somebody who is willing to help out, it's hard to take it upstream. It should be resolved later this week, I hope.

This is my last reply here because, honestly, it looks like a waste of time.
You were asked to provide details regarding your testing method, but we have not received anything yet.
You were asked to provide the script that causes all this, but you replied that you are afraid that this might affect all FreeBSD systems.
You were told to contact the security team if this is such a concern, and now you say that it is not a security issue.

Upstream is where you will get help and attention from developers.

Supermule · Jun 10, 2015
