Gigabit network performance on 9-STABLE

Hi

I have a home lab router based on an Intel DN2800MT dual-core Atom board with 9-STABLE installed. It has two em NICs (Intel(R) PRO/1000 Network Connection 7.3.8; one integrated, one optional).

Now I'm trying to tune network performance. GENERIC and custom kernels give me no more than 620 Mbit/s to the FreeBSD box and no more than 820 Mbit/s from it on netperf TCP_STREAM tests. UDP_STREAM gives ~820 Mbit/s in both directions. I reach these maximums without polling and with pf and other network services disabled (BPF is on for DHCPd). The same tests against a Windows box (Intel 1000EB) give a stable ~950 Mbit/s.
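For reference, the tests are roughly of this form (the hostnames here are just placeholders, and the exact option set varied between runs):
Code:
# towards the FreeBSD box (run on a LAN client), then from it (run on the router)
netperf -H router.lab -t TCP_STREAM -l 30
netperf -H client.lab -t TCP_STREAM -l 30
# same directions with UDP
netperf -H router.lab -t UDP_STREAM -l 30
netperf -H client.lab -t UDP_STREAM -l 30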

I've tried a lot of solutions from the FreeBSD forums, the wiki and other sources (http://serverfault.com/questions/64356/freebsd-performance-tuning-sysctls-loader-conf-kernel). Once I reached a stable 780 Mbit/s on TCP tests, but it happened late at night and I lost that configuration (didn't save sysctl.conf before rebooting).

Could somebody please help me figure out the bottleneck?

Also I noticed that the server has a 100% interrupt load on rxd during tests. vmstat -i shows continuously growing interrupt rates on the RX and TX threads.
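This is how I watch the load while a test runs (just the commands; interface names are from my box):
Code:
# per-device interrupt counters and rates
vmstat -i | egrep 'em0|em1'
# kernel interrupt/taskqueue threads and their CPU usage
top -SH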
 
petruxa said:
I'm trying to tune network performance. GENERIC and custom kernels give me no more than 620 Mbit/s to the FreeBSD box and no more than 820 Mbit/s from it on netperf TCP_STREAM tests. UDP_STREAM gives ~820 Mbit/s in both directions. I reach these maximums without polling and with pf and other network services disabled (BPF is on for DHCPd). The same tests against a Windows box (Intel 1000EB) give a stable ~950 Mbit/s.

...

Could somebody please help me figure out the bottleneck?
On a 10 Gbit/s Ethernet link (Intel X540-T1 into a Dell PowerConnect 8024 switch), I can get essentially wire-speed performance on FreeBSD 8.4 (I'd expect 9-STABLE to be at least the same, given identical hardware):
Code:
(0:31) host1:/tmp# iperf -c host2
------------------------------------------------------------
Client connecting to host2, TCP port 5001
TCP window size: 32.0 KByte (default)
------------------------------------------------------------
[  3] local 204.141.35.226 port 55859 connected with 204.141.35.40 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  11.5 GBytes  9.89 Gbits/sec
That's with very slight tuning in /boot/loader.conf and /etc/sysctl.conf. Without any tuning, I "only" get around 8 Gbit/s. I don't know if that scales down proportionately on 1 Gbit/sec hardware or not. Here are the values I am using. Note that these values may not be appropriate for your (or any other) system, test on an unused system, results using something other than benchmarks/iperf may vary, etc.
Code:
(0:32) host1:/tmp# cat /boot/loader.conf 
kern.ipc.nmbclusters=262144
kern.ipc.nmbjumbop=262144
(0:33) host1:/tmp# cat /etc/sysctl.conf 
kern.ipc.maxsockbuf=16777216
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216

Also I noticed that the server has a 100% interrupt load on rxd during tests. vmstat -i shows continuously growing interrupt rates on the RX and TX threads.
I'm not sure what rxd is. On a different FreeBSD 8.4 system with Intel PRO/1000 hardware (specifically an 82546EB chip) I only see a single interrupt counter for my em0 device. I'd expect to only see one interrupt for each packet sent or received. What sort of interrupts vs. packets are you seeing?
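Something along these lines, run on your box during a test, should show the ratio; adjust the interface name, of course:
Code:
# packets in/out per second on the interface, one line per second
netstat -I em0 -w 1
# interrupt totals and rates per device
vmstat -i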
 
Terry, SirDice, thank you for the replies, and please excuse the long delay in answering.

I ran more tests and can confirm that it was primarily an iperf/netperf "problem". With c:\iperf\iperf.exe -c <host> -w 4M -l 4M from a Windows box to FreeBSD or to another Windows box behind the router (same interface, other VLAN) I got the expected 950 Mbit/s with PF off, and even the same rates with PF on and a quick ... fastroute pf.conf rule as the first action. With a more complex pf.conf I'm getting ~750 Mbit/s:

Code:
#options
set skip on lo0
#no scrub
#no ALTQ
#Nat
no nat inet from <no_nat> to <no_nat>
nat pass on $if_rinet inet from any to any tagged FOR_NAT -> ($if_rinet)

#Filtering
block in quick on ! $if_local inet from <shared_lans> to <private_lans>
pass in quick fastroute inet from <all_local> to <all_local>
pass out quick fastroute inet from <all_local> to <all_local>


pass out quick inet from self to any user >= 0

pass in quick on $if_rinet inet proto udp from <ipsec_peers> port 500 to ($if_rinet) port 500
pass in quick on $if_rinet inet proto { esp, ah, icmp } from <ipsec_peers> to ($if_rinet)
pass out quick on $if_rinet inet proto udp from ($if_rinet) port 500 to <ipsec_peers> port 500
pass out quick on $if_rinet inet proto { esp, ah, icmp } from ($if_rinet) to <ipsec_peers>

pass in quick on $if_rinet inet proto tcp from any to ($if_rinet) port 1723
pass in quick on $if_rinet inet proto gre from any to ($if_rinet)

block drop all

pass in quick inet from $lan_pptp to { <shared_local>, $lan_pptp }
pass in quick on ! $if_rinet inet from { <all_local>, $lan_pptp } to ! <no_nat> tag FOR_NAT

pass in quick on $if_v120 inet from $lan_v120 to { <ipsec_kz_lans>, <ipsec_peers> }
pass out quick on $if_v120 inet from { <ipsec_kz_lans>, <ipsec_peers> } to $lan_v120

Is it expected to "lose" 200 Mbit/s on a gigabit link (same interface, different VLANs) with PF on a low-end (Atom) system? It matters because the lab networks are used for working with huge media files, and an extra 22 Mbytes/s would save noticeable time.
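To see where PF spends its time I've been looking at the per-rule and state counters, roughly like this (just the commands I run, output omitted):
Code:
# per-rule evaluation and packet/byte counters
pfctl -vvsr
# state table size and global counters
pfctl -si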

I'm also confused by the inconsistent TCP performance and by the different amounts of "back" traffic on Windows and FreeBSD boxes during iperf tests. For example:

- The first test runs only reach 600-700 Mbit/s; after the first two or three runs, later tests reach 950 Mbit/s (without PF). Where should I look to get stable maximum transfer rates?

- A test stream to the FreeBSD box gives 950 Mbit/s outgoing and 20 Mbit/s of incoming traffic, while the same tests to Windows boxes give up to 6 Mbit/s of incoming "back" stream. Is that expected?

Thank you,
Peter.
 
SirDice said:
Have a look in tuning(7) and polling(4) for tips.

Yep, I've been reading those manuals several times a day for the past week :) The first version of my kernel had the DEVICE_POLLING option enabled (just a "leftover" from past configurations). I noticed it when I tried to turn on ISR. With polling on, tests show no more than 720 Mbit/s.
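For completeness, this is roughly how I toggled polling on and off between runs (it only works with options DEVICE_POLLING compiled into the kernel):
Code:
# enable / disable polling on a single interface
ifconfig em0 polling
ifconfig em0 -polling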

Even poorer results were seen with
Code:
net.isr.bindthreads=1
Here are my current configurations:

sysctl.conf:
Code:
# $FreeBSD: stable/9/etc/sysctl.conf 112200 2003-03-13 18:43:50Z mux $
#
#  This file is read when going to multi-user and its contents piped thru
#  ``sysctl'' to adjust kernel values.  ``man 5 sysctl.conf'' for details.
#

# Uncomment this to prevent users from seeing information about processes that
# are being run under another UID.
#security.bsd.see_other_uids=0
# set to at least 16MB for 10GE hosts
kern.ipc.somaxconn=4096
kern.ipc.maxsockbuf=16777216
kern.ipc.shmmax=2147483648
kern.ipc.maxsockets=204800
kern.ipc.nmbclusters=262144
kern.sched.slice=1
kern.maxfiles=204800
kern.maxfilesperproc=200000
kern.maxvnodes=200000
kern.ipc.shm_use_phys=1

net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216
net.inet.tcp.sendbuf_auto=1
net.inet.tcp.recvbuf_auto=1
net.inet.tcp.sendbuf_inc=16384
net.inet.tcp.recvbuf_inc=524288
#net.inet.tcp.hostcache.expire=1
net.inet.tcp.cc.algorithm=htcp
net.inet.ip.fastforwarding=0
kern.timecounter.hardware=TSC
kern.random.sys.harvest.ethernet=0
kern.random.sys.harvest.interrupt=0
net.inet.tcp.tso=0
#net.inet.tcp.delayed_ack=0
net.inet.ip.intr_queue_maxlen=4096
net.inet.tcp.ecn.enable=0
dev.em.0.fc=0
dev.em.1.fc=0

loader.conf:
Code:
cc_htcp_load="YES"
zfs_load="YES"
if_vlan_load="YES"

kern.hz=1000
kern.timecounter.smp_tsc_adjust=1

vfs.root.mountfrom="zfs:zroot"
vfs.zfs.arc_max="2048M"
vfs.zfs.arc_min="2048M"
#vfs.zfs.vdev.cache.size="32M"
vfs.zfs.prefetch_disable="0"
vfs.zfs.txg.timeout="5"
kern.maxvnodes=250000
vfs.zfs.write_limit_override="1024M"
#vfs.zfs.write_limit_min="64M"
#vfs.zfs.nopwrite_enabled="0"
vfs.zfs.vdev.min_pending="1"
vfs.zfs.vdev.max_pending="1"
#vfs.zfs.txg.synctime_ms="2000"


hw.em.rxd=4096
hw.em.txd=4096
hw.em.eee_setting=0
hw.em.rx_process_limit="-1"
hw.em.rx_abs_int_delay=1000
hw.em.tx_abs_int_delay=1000
hw.em.rx_int_delay=100
hw.em.tx_int_delay=100
#hw.em.enable_msix=1

net.isr.maxthreads=4
#net.isr.bindthreads=1
net.isr.defaultqlimit=1024
net.isr.maxqlimit=10240

#net.inet.tcp.tcbhashsize=524288
net.link.ifqmaxlen=1024
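After a reboot I check that the hw.em.* tunables really reached the driver, e.g. like this (device numbers are from my box; not every tunable necessarily shows up under dev.em):
Code:
# per-device values picked up from loader.conf
sysctl dev.em.0.rx_int_delay dev.em.0.rx_abs_int_delay
# all hw.em.* tunables as currently set
sysctl hw.em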

Kernel:
Code:
ident           GW
maxusers        256
#options         GEOM_BSD                # BSD disklabels
options         GEOM_LABEL              # Providers labelization.
#options         GEOM_MBR                # DOS/MBR partitioning
options         GEOM_PART_BSD           # BSD disklabel
options         GEOM_PART_GPT           # GPT partitioning
options         GEOM_PART_MBR           # MBR partitioning

options        SCHED_ULE

options         MROUTING

options         SMP                     # Symmetric MultiProcessor Kernel
options         MAXCPU=32
options         PREEMPTION
options         COMPAT_43
options         COMPAT_43TTY
options         COMPAT_FREEBSD4

# Enable FreeBSD5 compatibility syscalls
options         COMPAT_FREEBSD5

# Enable FreeBSD6 compatibility syscalls
options         COMPAT_FREEBSD6

# Enable FreeBSD7 compatibility syscalls
options         COMPAT_FREEBSD7
options         SYSVSHM
options         SYSVSEM
options         SYSVMSG
options         STACK
device          hwpmc                   # Driver (also a loadable module)
options         HWPMC_HOOKS             # Other necessary kernel hooks
options         INET                    #Internet communications protocols
options         ROUTETABLES=2           # max 16. 1 is back compatible.
options         IPSEC                   #IP security (requires device crypto)
options         IPSEC_NAT_T             #NAT-T support, UDP encap of ESP
#options         NETSMB                  #SMB/CIFS requester

# mchain library. It can be either loaded as KLD or compiled into kernel
options         LIBMCHAIN

# libalias library, performing NAT
#options         LIBALIAS

# flowtable cache
#options         FLOWTABLE
options         LIBICONV
options         ALTQ
options         ALTQ_CBQ        # Class Based Queueing
options         ALTQ_RED        # Random Early Detection
options         ALTQ_RIO        # RED In/Out
options         ALTQ_HFSC       # Hierarchical Packet Scheduler
options         ALTQ_CDNR       # Traffic conditioner
options         ALTQ_PRIQ       # Priority Queueing
options         NETGRAPH                # netgraph(4) system
device          loop
device          ether
device          wlan
device          vlan
options         IEEE80211_AMPDU_AGE     #age frames in AMPDU reorder q's
options         IEEE80211_SUPPORT_MESH  #enable 802.11s D3.0 support
options         IEEE80211_SUPPORT_TDMA  #enable TDMA support

device          bpf
device          netmap
options         TCP_SIGNATURE           #include support for RFC 2385

options         ZERO_COPY_SOCKETS
options         FFS                     #Fast filesystem
options         SOFTUPDATES

# Extended attributes allow additional data to be associated with files,
# and is used for ACLs, Capabilities, and MAC labels.
# See src/sys/ufs/ufs/README.extattr for more information.
options         UFS_EXTATTR
#options         UFS_EXTATTR_AUTOSTART

# Access Control List support for UFS filesystems.  The current ACL
# implementation requires extended attribute support, UFS_EXTATTR,
# for the underlying filesystem.
# See src/sys/ufs/ufs/README.acls for more information.
options         UFS_ACL

# Directory hashing improves the speed of operations on very large
# directories at the expense of some memory.
options         UFS_DIRHASH
options         QUOTA                   #enable disk quotas
options         SUIDDIR
options         VFS_AIO
device          random
device          mem

# The kernel symbol table device; /dev/ksyms
device          ksyms
options         _KPOSIX_PRIORITY_SCHEDULING
# p1003_1b_semaphores are very experimental,
# user should be ready to assist in debugging if problems arise.
options         P1003_1B_SEMAPHORES

# POSIX message queue
options         P1003_1B_MQUEUE
options         AUDIT
options         PROCDESC
#options         CAPABILITIES    # fine-grained rights on file descriptors
#options         CAPABILITY_MODE # sandboxes with no global namespace access
options         HZ=1000

# Enable support for the kernel PLL to use an external PPS signal,
# under supervision of [x]ntpd(8)
# More info in ntpd documentation: http://www.eecis.udel.edu/~ntp

options         PPS_SYNC

device          scbus           #base SCSI code
device          ch              #SCSI media changers
device          da              #SCSI direct access devices (aka disks)
device          sa              #SCSI tapes
device          cd              #SCSI CD-ROMs
device          ses             #SCSI Environmental Services (and SAF-TE)
device          pt              #SCSI processor
device          pass            #CAM passthrough driver
device          sg              #Linux SCSI passthrough
#device          ctl             #CAM Target Layer
#options         SES_ENABLE_PASSTHROUGH
device          pty             #BSD-style compatibility pseudo ttys
device          firmware        #firmware(9) support
device          sc
options         SC_TWOBUTTON_MOUSE
options         TEKEN_CONS25            # cons25-style terminal emulation
options         TEKEN_UTF8              # UTF-8 output handling
device          ahci
device          uart
device          em              # Intel Pro/1000 Gigabit Ethernet
options         LIBMBPOOL               #needed by patm, iatm
device          sound
device          smbus
device          iicbus
device          iicbb
device          iicsmb
device          intpm
device          ichsmb
device          smb
device          ppc
device          ppbus
device          lpt
device          pps
device          lpbb
device          pcfclock
#options         DEADLKRES
device          uhci
device          ehci
device          xhci
device          usb
device          uhid
device          ukbd
device          umass
device          ulpt
device          ums
device          crypto          # core crypto support
device          cryptodev       # /dev/crypto for access to h/w
options         DIRECTIO
#options         IPI_PREEMPTION
device          atpic                   # Optional legacy pic support
device          mptable                 # Optional MPSPEC mptable support
#options         MP_WATCHDOG
cpu             HAMMER                  # aka K8, aka Opteron & Athlon64
#options         DEVICE_POLLING
options         BPF_JITTER
#options         SDP
device          nvram           # Access to rtc cmos via /dev/nvram
device          speaker         #Play IBM BASIC-style noises out your speaker
device          isa
#options         AUTO_EOI_1
device          pci
device          agp
options         VESA
device          dpms            # DPMS suspend & resume via VESA BIOS
options         X86BIOS
device          psm
options         PSM_HOOKRESUME          #hook the system resume event, useful
options         PSM_RESETAFTERSUSPEND   #reset the device at the resume event
device          atkbdc
device          atkbd
device          vga
options         VGA_WIDTH90             # support 90 column modes
device          s3pci
device          acpi
device          drm             # DRM core module required by DRM drivers
device          i915drm         # Intel i830 through i915
device          ipmi
device          pbio
device          smbios
device          tpm
device          ichwd
device          coretemp
device          cpuctl
options         ENABLE_ALART            # Control alarm on Intel intpm driver
options         COMPAT_FREEBSD32
options         COMPAT_LINUX32
#options         LINPROCFS
#options         LINSYSFS
#options         PV_STATS
device          kbdmux
device          cpufreq
device          snd_hda
device          snd_ich
options         TCP_OFFLOAD             # TCP offload
#options         KSTACK_PAGES=4
options         DFLTPHYS=(128*1024)
options         MAXPHYS=(1024*1024)
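For the record, the kernel is built in the usual way; the config name matches the ident line above:
Code:
cd /usr/src
make buildkernel KERNCONF=GW
make installkernel KERNCONF=GW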

I know my configuration is chaotic - this is a test period. Could someone help me configure this system, or at least point me in the right direction (a man page? a link? anything?)? :)

Thank you,

Peter.
 