[Solved] natd high CPU usage

Hello,
I'm using FreeBSD 8.0 + ipfw + natd as a gateway for a small LAN (a few hundred users). During normal operation natd's CPU usage varies from a few percent up to 20-30% during peak hours. However, from time to time something strange happens: natd's CPU usage keeps climbing up to 80-90%, and then FreeBSD starts to drop packets. This situation lasts for a few seconds, natd's CPU usage drops back to normal values, everything works again, and after some time the whole cycle repeats.

natd.conf:
Code:
redirect_port tcp 172.17.242.255:4365-4366 4365-4366
redirect_port udp 172.17.242.255:4365-4366 4365-4366
redirect_port tcp 10.10.10.3:22 7728
redirect_port tcp 10.10.10.3:443 7729
unregistered_only
interface em0

ipfw rules:
Code:
ipfw -q -f flush

# Set rules command prefix
cmd="ipfw -q add"
skip="skipto 800"
pif="em0"     # public interface name of NIC
              # facing the public Internet
iif="em2"
#
$cmd 001 deny all from 192.168.0.0/16 to any in via $pif
$cmd 002 deny all from 172.16.0.0/12 to any in via $pif
$cmd 003 deny all from 10.0.0.0/8 to any in via $pif
$cmd 004 deny all from 127.0.0.0/8 to any in via $pif
$cmd 005 deny all from 0.0.0.0/8 to any in via $pif
$cmd 006 deny all from 169.254.0.0/16 to any in via $pif
$cmd 007 deny all from 192.0.2.0/24 to any in via $pif
$cmd 008 deny all from 204.152.64.0/23 to any in via $pif
$cmd 009 deny all from 224.0.0.0/3 to any in via $pif
#Ident
$cmd 010 deny tcp from any to any 113 in via $pif
#Netbios
$cmd 020 deny all from any to any 137 in via $pif
$cmd 021 deny all from any to any 138 in via $pif
$cmd 022 deny all from any to any 139 in via $pif
$cmd 023 deny all from any to any 81 in via $pif
#NAT
$cmd 030 divert natd all from any to any via $pif



I tried to investigate the issue and observed the following (example figures from netstat):
- natd at normal CPU usage: about 10k packets/s
- natd at increased CPU usage: a sudden jump to about 90k packets/s

So it probably has something to do with the packet rate. What bothers me is that I managed to replicate this issue by doing nothing harmful: when I log in to the server via ssh from the outside world and start mc, natd starts consuming more and more CPU, and the same happens during even small file transfers via scp. Just starting mc from outside the network raises the packet rate from, say, 12k up to 90k packets/s. So my question is this: does anyone have experience with a similar problem and know where the cause may be? Any suggestions would be really appreciated.
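
For anyone who wants to reproduce the measurement, something along these lines is enough (a minimal sketch, assuming the public interface is em0 as in the config above):
Code:
# Print input/output packet counts on em0 once per second
netstat -w 1 -I em0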
 
More info

After some tests to investigate the issue, I observed the following:

I connected another computer via a switch, with an address from the same network as the WAN IP of the FreeBSD gateway; both are publicly routable addresses. When I connect to the FreeBSD box via ssh and run mc, or connect via scp and download a file, natd's CPU usage starts to increase rapidly. If I do the same operations from the LAN, everything works perfectly. It seems that traffic from the WAN side destined directly for the WAN IP of the gateway makes natd enter some kind of loop and its CPU usage keeps climbing. One thing I can't understand is why packets originating from a public IP address and destined for the public IP of the WAN interface are processed by natd at all (even though I have the unregistered_only statement) and result in higher CPU usage.

Any ideas what may be wrong? Maybe I should change the line:
Code:
$cmd 030 divert natd all from any to any via $pif
to:
Code:
$cmd 030 divert natd all from 172.16.0.0/16 to any via $pif
Will this work?
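
One way to narrow this down (just a sketch of how I would approach it, not something tested on this exact setup) is to watch the per-rule counters and, if needed, run natd in the foreground with verbose logging to see which packets it is actually translating:
Code:
# Show packet/byte counters for every rule; if the counter on rule 030
# climbs while mc runs over ssh from the WAN side, the divert rule is
# matching that traffic
ipfw -a list

# Stop the running natd first, then start it in the foreground so every
# aliased packet is logged to stdout (adjust the config path if needed)
natd -v -f /etc/natd.conf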
 
Solution found

It seems lots of people are running into performance issues with natd on FreeBSD and OS X.

I ran into this issue using ipfw + natd on the head node of a small HPC cluster. Network throughput was fine, but remote interactive use of the head node was terrible. For example, using a text editor on the head node over ssh from outside the cluster would often result in response times of over 1 second to individual keystrokes. Heavy output to the terminal was very choppy. At the same time, natd would consume a significant amount of CPU.

I was able to resolve it by simply improving my ipfw rule set. The key is to exempt as much traffic as possible from NAT. The default rule based on the instructions in the FreeBSD Handbook:

Code:
    divert 8668 all from any to any via $pif
runs all traffic in and out of the head node through NAT, when in reality only traffic from or to other local hosts needs to be NATed. Below is the portion of the rule set that fixed the problem:
Code:
#!/bin/sh

cmd="ipfw -q add"
pif=bce0
lif=bce1

ipfw -q -f flush

#############################################################################
# Exempt local traffic from NAT

$cmd 00010 allow all from any to any via lo0
$cmd 00020 allow all from any to any via $lif

#############################################################################
# Bad: This would prevent return packets from being diverted through NAT
# to local machines.  Many of them in reality don't need to be altered,
# but we need NATD to determine which ones do.
# $cmd 00046 allow all from not me to me via $pif

#############################################################################
# Exempt traffic originating on this host from NAT
$cmd 00047 allow all from me to not me via $pif

#############################################################################
# Divert outgoing packets from local machines and all incoming packets
# from the public interface through NAT

# This would be good enough, assuming all the rules above exist
# $cmd 00050 divert 8668 ip4 from any to any via $pif

# These rules limit the use of natd regardless of preceding rules
$cmd 00050 divert 8668 all from 192.168.0.0/24 to any via $pif
$cmd 00050 divert 8668 all from any to me via $pif
Since setting these rules, natd CPU use tops out around 27% of one core during a long file transfer.

CPU: AMD Opteron(tm) Processor 4170 HE (2100.04-MHz K8-class CPU)
The network connections are all gigabit.
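
In case it helps anyone reproduce the comparison, this is how I keep an eye on natd's CPU share (plain process monitoring, nothing natd-specific):
Code:
# Interactive view, including system processes
top -S

# One-shot view of natd's CPU usage (assumes a single natd instance)
ps -o pcpu,time,command -p $(pgrep natd)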

This should probably be mentioned in the FreeBSD Handbook section on natd. Maybe I'll submit a patch one of these days...

Regards,

Jason
 
This is an old thread, but it's also the first result from Google, so I thought it might help others to comment further.

The above poster definitely has one workable solution, but a slight modification of the approach would be to first minimize the "divert" lines to match as little as possible, and then use "allow" to exempt whatever is left, which is primarily services handled by the server itself.

As noted above, the default
Code:
divert 8668 all from any to any via $pif

line (which shows up as line 50 on my system) is the main culprit behind this problem (though it is a reasonable FreeBSD default to make nat work without any other knowledge of the network(s) the server is on).

I used the following script to fix the problem:

Code:
#!/bin/sh

pif=em0
tcp_serv="22,25,53,80,443"

cmd="ipfw -q add"

ipfw -q delete 50

$cmd 00001 allow tcp from any to me $tcp_serv in via $pif
$cmd 00050 divert natd ip from any to me in via $pif
$cmd 00058 divert 8668 all from 10.8.0.0/24 to not me out via $pif
$cmd 00059 divert 8668 all from 10.9.0.0/24 to not me out via $pif

Of course, rules 58 and 59 should be modified to match the LAN subnets you want to nat.

Rule 1 whitelists all tcp services handled by the server itself (natd doesn't need to see these) inbound via $pif, such as ssh, smtp, etc.

The remaining inbound traffic via $pif destined for me is then diverted to natd by rule 50 (this is necessary to NAT return traffic). This doesn't touch any routed traffic that is only transiting the server.

Rule 58 diverts outbound traffic to natd. It only diverts traffic from the appropriate source subnet that is leaving via $pif, so it will not touch internal-to-internal traffic, nor external-to-external traffic. Rule 59 simply NATs a second subnet.

You could further augment this with some UDP services, for example if you have an OpenVPN server, etc.
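
For example (purely illustrative; it assumes an OpenVPN server listening on its default port, 1194/udp), a rule in the same style as rule 1 would keep that traffic away from natd as well:
Code:
# Hypothetical: exempt an OpenVPN server on the default 1194/udp
$cmd 00002 allow udp from any to me 1194 in via $pif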
 
Re: Natd high CPU usage

Thanks @ggong, your rules fixed my high CPU on natd.

What's strange though is that I first switched from ipfilter/ipnat to ipfw/natd on my home server (which is also my router). I had high CPU on natd. I rebooted and turned all my LAN devices off, but the CPU was still quite high. Then I simplified my rules to those suggested in the natd man page, and since then it almost never goes above 0% CPU, a dramatic difference.

However, on my office server, I copied the exact same rules but still get high CPU on one core. It sits at around 80-90%, even at night when there is nobody on the LAN and there is low activity on the mail/web servers, etc.

I followed your rules to reduce the traffic going through natd, and bingo, it dropped down to 0% CPU. I wonder if it's the removal of unwanted ICMP traffic from natd. Perhaps it's because I'm running DHCP, even though that runs on the LAN interface.

Here are my new rules, which resolve the high CPU issue:
Code:
#!/bin/sh
# Flush out the list before we begin.
ipfw -q -f flush

# Set rules command prefix
cmd="ipfw -q add"

wan="igb0"     # interface name of NIC attached to WAN (Internet)
lan="igb1"     # interface names of NIC attached to LAN
looopback="lo0"   # local loopback device

$cmd pass all from any to any via $loopback
$cmd pass all from any to any via $lan
$cmd divert natd ip from any to me in via $wan
$cmd divert 8668 all from 192.168.10.0/24 to not me out via $wan
$cmd pass all from any to any
 