How to get more throughput from/to FreeBSD servers?

Hi all,

I've been a ZFS user for more than four years and I'm glad to take advantage of this excellent filesystem on FreeBSD. In that time I've built several servers, from small ones up to 60 TB machines, and all of them have been rock solid. But there is one issue I've never been able to solve: how do I get more performance when writing data to and reading data from these servers? I'd really like to hear from someone who runs a FreeBSD server on something faster than gigabit and gets genuinely good transfer throughput. I know there are options like link aggregation, but that only gives me load balancing and fault tolerance, not the extra throughput these big servers deserve.

I've tested 10 Gb cards several times over the SMB protocol in a point-to-point setup, and the maximum speed I could reach was about 90 MB/s, probably due to the protocol. I'm the IT manager at several facilities, and the need for more performance from these servers is pressing. Transferring data sets as large as 8 TB from the FreeBSD/ZFS servers to other volumes over gigabit is really painful.

Any comments and suggestions here are welcome.

Thanks in advance,

Danilo
 
I have just one FreeBSD 9.1 system with a 10 Gb/s NIC, and it saturated the link with 11 Mac Pros as AFP clients without further tuning. All I did was start a copy of a different, non-overlapping directory with tar cf /dev/null on each Mac Pro. All directories contained mostly files from a few dozen MiB to a few GiB.

The pool consists of four RAID-Z2 vdevs with 8 disks per vdev, a mirrored SSD ZIL, and two L2ARC SSDs, and it reaches about 1.6 GiB/s with 32 parallel dd bs=1m if=/dev/zero processes (with compression and dedup disabled).
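
In case it helps, here is a rough sketch of how such a pool is laid out and how the parallel dd test can be run. The device names, pool name, and mount point below are placeholders, not my actual ones:
Code:
# four 8-disk RAID-Z2 vdevs
zpool create tank \
    raidz2 da0 da1 da2 da3 da4 da5 da6 da7 \
    raidz2 da8 da9 da10 da11 da12 da13 da14 da15 \
    raidz2 da16 da17 da18 da19 da20 da21 da22 da23 \
    raidz2 da24 da25 da26 da27 da28 da29 da30 da31
# mirrored SLOG and two L2ARC devices
zpool add tank log mirror ada0 ada1
zpool add tank cache ada2 ada3
# 32 parallel sequential writers, 4 GiB each
for i in $(jot 32); do dd if=/dev/zero of=/tank/zero.$i bs=1m count=4096 & done; wait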
 
Crest said:
I have just one FreeBSD 9.1 system with a 10 Gb/s NIC, and it saturated the link with 11 Mac Pros as AFP clients without further tuning. All I did was start a copy of a different, non-overlapping directory with tar cf /dev/null on each Mac Pro. All directories contained mostly files from a few dozen MiB to a few GiB. The pool consists of four RAID-Z2 vdevs with 8 disks per vdev, a mirrored SSD ZIL, and two L2ARC SSDs, and it reaches about 1.6 GiB/s with 32 parallel dd bs=1m if=/dev/zero processes (with compression and dedup disabled).

Hi @Crest,

This is really fast! This was over SMB?

On my previous tests with 10 Gb cards, I could reach something like 300 MB/s with four simultaneous copies from four different clients. But my question is how to get higher throughput from a single client, for instance to transfer a huge amount of data from server 1 to server 2.

Thanks.
 
Some resources to check for ideas:

New pluggable congestion algorithms: http://www.freebsd.org/releases/9.0R/relnotes.html (search for congestion)

A post here in the forum, maybe you have already seen this: ZFS / Samba: performance issue
An older thread, but mentions some sysctl variables, too: [Solved] FreeBSD 8.0: Samba33 read slow and write fast

You should read up on the individual parameters mentioned in those posts before applying them to production systems. You could also set up NFS to see whether the problem is specific to CIFS/Samba or to the network.
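
For example, one of the pluggable congestion control modules can be tried at runtime without rebuilding the kernel. This is only a sketch; H-TCP is just one of the available choices, so test it before touching a production box:
Code:
kldload cc_htcp                         # load the H-TCP congestion control module
sysctl net.inet.tcp.cc.available        # list the algorithms currently available
sysctl net.inet.tcp.cc.algorithm=htcp   # make H-TCP the default for new connections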

Maybe this helps you to get started.
 
gqgunhed said:
Some resources to check for ideas:

New pluggable congestion algorithms: http://www.freebsd.org/releases/9.0R/relnotes.html (search for congestion)
A post here in the forum, maybe you have already seen this: ZFS / Samba: performance issue
An older thread, but mentions some sysctl variables, too: [Solved] FreeBSD 8.0: Samba33 read slow and write fast

You should read up on the individual parameters mentioned in those posts before applying them to production systems.
You could also set up NFS to see whether the problem is specific to CIFS/Samba or to the network.

Maybe this helps you to get started.

Hmm, this sounds interesting. I'll check out these topics as soon as possible. What about 10 Gb? Has anyone here gotten high speeds over a point-to-point connection?

Thanks
 
If you need more speed between two points, you only have two options:
  1. widen the path between them (aka move to 10 Gbps Ethernet)
  2. add more paths (aka link aggregation)

The first option will be the "simplest" in that it's a plug-n-play replacement of NICs (or just adding a new NIC), no special tuning required to increase throughput. You only need to tune it if you want to eke out the last few percent. ;)

The second option most likely won't work in your situation. There are some esoteric ways to make link aggregation work between two stations, but it may not be worth the effort.
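
For reference, a plain two-port LACP lagg looks roughly like this in /etc/rc.conf (interface names and the address are made up); as noted above, a single TCP flow will still be hashed onto one physical link:
Code:
# /etc/rc.conf
ifconfig_em0="up"
ifconfig_em1="up"
cloned_interfaces="lagg0"
ifconfig_lagg0="laggproto lacp laggport em0 laggport em1 192.168.0.10 netmask 255.255.255.0"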
 
Link aggregation protocols aren't designed to perfectly balance a small number of flows. In the best case they distribute flows over the member links with a decent hash of all relevant header fields. There is no dynamic load balancing between the interfaces in a group; each flow stays on a single link. So try again with 10 to 100 connections.
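
For example, something like this (address made up) generates enough parallel flows for the hash to spread them across the member links:
Code:
iperf -c 192.168.0.10 -P 32 -t 30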
 
phoenix said:
If you need more speed between two points, you only have two options:
  1. widen the path between them (aka move to 10 Gbps Ethernet)
  2. add more paths (aka link aggregation)

The first option will be the "simplest" in that it's a plug-n-play replacement of NICs (or just adding a new NIC), no special tuning required to increase throughput. You only need to tune it if you want to eke out the last few percent. ;)

The second option most likely won't work in your situation. There are some esoteric ways to make link aggregation work between two stations, but it may not be worth the effort.

Hi @phoenix,

Thanks for the reply. This is exactly the kind of answer I was expecting; I want to be sure about the alternatives. But I did some tests with 10 Gb cards in these servers before, and peaks of 100 MB/s were the maximum I could reach copying files over SMB and rsync over SSH. That's why I opened this topic: if 10 Gb cards can't reach reasonable speeds, what else can I do? Is this purely a protocol issue?

Thanks.
 
Another solution would be to try MPTCP (MultiPath TCP): http://caia.swin.edu.au/urp/newtcp/mptcp/

It has the advantage of being cheaper than 10 Gbps while providing good throughput by aggregating TCP paths. You also don't need a special switch (no need for LACP).

The disadvantage is that you need one IP address per physical port you aggregate. Also, MPTCP support in FreeBSD is still at the alpha stage, but it seems really promising.
 
Mussolini said:
Hi @phoenix,

Thanks for the reply. This is exactly the kind of answer I was expecting; I want to be sure about the alternatives. But I did some tests with 10 Gb cards in these servers before, and peaks of 100 MB/s were the maximum I could reach copying files over SMB and rsync over SSH. That's why I opened this topic: if 10 Gb cards can't reach reasonable speeds, what else can I do? Is this purely a protocol issue?

Enable the HPN patches for SSH, and consider using the None cipher (if you own and trust the network between the servers). SSH is CPU-limited if you have encryption enabled. Using the None cipher removes that bottleneck, and makes it disk I/O and/or network-limited.

For example, ZFS send via SSH across a gigabit link with default SSH settings wouldn't go above 300-odd Mbps. Disabling compression pushed it to around 400-odd Mbps. Enabling HPN with HPNBufferSize=16384 pushed it above 500 Mbps. But it wasn't until we enabled the None cipher that we were able to saturate the link (920+ Mbps).
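
On FreeBSD that means building security/openssh-portable from ports with the HPN option enabled on both machines. The directives below come from the HPN patch set rather than stock OpenSSH, so treat this as a sketch and double-check them against the patched sshd_config(5):
Code:
# /etc/ssh/sshd_config on the receiving server (HPN-patched sshd)
HPNDisabled no
HPNBufferSize 16384
NoneEnabled yes
On the client side the None cipher is requested per connection, e.g. ssh -oNoneEnabled=yes -oNoneSwitch=yes backupserver (host name made up); if the None cipher isn't negotiated, the session silently falls back to a regular cipher.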

As for Samba, there's a lot of tuning that needs to be done to make it play nicely with ZFS. I haven't looked into it, other than seeing lots of threads about it on the various FreeBSD mailing lists.
 
FlorianMillet said:
Another solution would be to try MPTCP (MultiPath TCP): http://caia.swin.edu.au/urp/newtcp/mptcp/

It has the advantage of being cheaper than 10 Gbps while providing good throughput by aggregating TCP paths. You also don't need a special switch (no need for LACP).

The disadvantage is that you need one IP address per physical port you aggregate. Also, MPTCP support in FreeBSD is still at the alpha stage, but it seems really promising.

Hi Florian,

I have read a little about that, and this sounds really interesting. As soon as I get the environment for tests, I'll do it.

Thanks for the tip!
 
phoenix said:
Enable the HPN patches for SSH, and consider using the None cipher (if you own and trust the network between the servers). SSH is CPU-limited if you have encryption enabled. Using the None cipher removes that bottleneck, and makes it disk I/O and/or network-limited.

For example, ZFS send via SSH across a gigabit link with default SSH settings wouldn't go above 300-odd Mbps. Disabling compression pushed it to around 400-odd Mbps. Enabling HPN with HPNBufferSize=16384 pushed it above 500 Mbps. But it wasn't until we enabled the None cipher that we were able to saturate the link (920+ Mbps).

As for Samba, there's a lot of tuning that needs to be done to make it play nicely with ZFS. I haven't looked into it, other than seeing lots of threads about it on the various FreeBSD mailing lists.

I had a different experience. A few days ago, I used some spare parts to build a backup system. I was able to transfer around 600 GB of data using zfs send and zfs recv over SSH in under 3 hours. Using sysutils/conky, I observed transfer rates as high as 80 MB/s, i.e., around 640 Mbps.
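
For anyone who hasn't tried it, the whole transfer is just a pipeline like this (pool, dataset, snapshot, and host names are made up):
Code:
zfs snapshot storage/data@backup1
zfs send storage/data@backup1 | ssh backuphost zfs recv -v backup/data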
 
Hi all,

FYI, I did some performance tests today with my 10 Gb cards using iperf, running the iperf server on the FreeBSD server and the iperf client on a Mac Pro. I reached around 620 MB/s over a point-to-point connection. I'd really love to get this speed when copying files...
 
Mussolini said:
I did some performance tests today with my 10 Gb cards using iperf, running the iperf server on the FreeBSD server and the iperf client on a Mac Pro. I reached around 620 MB/s over a point-to-point connection. I'd really love to get this speed when copying files...
With an inexpensive ($800) Netgear XS708E 8-port 10 GbE switch and some $300 Intel X540-T1 cards, I was able to use benchmarks/iperf to get 9.89 Gbits/sec with only three threads:
Code:
# iperf -c rz1 -P 3
------------------------------------------------------------
Client connecting to rz1, TCP port 5001
TCP window size: 32.0 KByte (default)
------------------------------------------------------------
[  5] local 10.20.30.42 port 59687 connected with 10.20.30.40 port 5001
[  3] local 10.20.30.42 port 19218 connected with 10.20.30.40 port 5001
[  4] local 10.20.30.42 port 44371 connected with 10.20.30.40 port 5001
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-10.2 sec  4.10 GBytes  3.46 Gbits/sec
[  3]  0.0-10.2 sec  3.89 GBytes  3.28 Gbits/sec
[  4]  0.0-10.2 sec  3.75 GBytes  3.16 Gbits/sec
[SUM]  0.0-10.2 sec  11.7 GBytes  9.89 Gbits/sec

Two threads provided an acceptable 9.02 Gbits/sec, but a single thread fell to 4.68 Gbits/sec.

It is quite instructive to play with window size, IPv4 vs. IPv6, TCP vs. UDP, etc., as there's a striking drop in performance with some of the non-default options.
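
For instance, those knobs map onto iperf options like the following (the values here are arbitrary examples, not recommendations):
Code:
iperf -c rz1 -P 3 -w 256k    # larger TCP window
iperf -c rz1 -P 3 -V         # IPv6 (iperf 2.x selects the IPv6 domain with -V)
iperf -c rz1 -u -b 1000M     # UDP at a 1000 Mbit/s target rate (server must also run with -u)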
 
This is awesome! And it's really good to know about that switch. Thanks for the post.

So, what's the purpose of this 10 Gb network? File sharing? If so, what protocol are you going to use?
 
Terry_Kennedy said:
inexpensive ($800) Netgear XS708E 8-port 10 GbE switch

What's your take on the stated requirement to configure this switch from a Windows machine using a proprietary configuration program? I'm assuming that if Netgear doesn't fix this then somebody else will now that the low end of the 10 GbE market is starting to open up.
 
Mussolini said:
So, what's the purpose of this 10 Gb network? File sharing? If so, what protocol are you going to use?
Mostly it's just for testing at this point. My RAIDzilla II servers can do > 500 MByte/sec locally, but are limited to about 110 MByte/sec over Gigabit Ethernet. So I figured I'd do some testing with 10 Gigabit Ethernet to see what sort of performance I can get via zfs send, NFS, and Samba.
 
Uniballer said:
What's your take on the stated requirement to configure this switch from a Windows machine using a proprietary configuration program?
It is rather annoying and stunningly insecure. I told Netgear about this when I tested prototypes of the first switch that was going to use this interface. For some reason they continue to ship outdated versions of Adobe AIR and WinPcap bundled into their installer. I strongly encouraged them to have their installer fetch the latest AIR directly from Adobe, since it gets updated due to security problems practically every month.

If you don't need to "manage" the switch, you can use the switch without the utility at all, or perform a throwaway install of the utility on a scratch system and set the IP address and password to something which will prevent anyone from accessing it remotely.

I'm assuming that if Netgear doesn't fix this then somebody else will now that the low end of the 10 GbE market is starting to open up.
Hopefully. However, this type of switch (semi-managed) seems to be doing really well in the marketplace. I would have been willing to pay twice as much for a switch with SNMP, but it seems that all of the switches with more advanced features have many more ports (24 seems to be common). I think this is because the manufacturers are paying a per-switch royalty for operating software like FASTPATH (the most common one used on switches with "Cisco-like" command-line interfaces), which pushes them toward switches with lots of ports to justify the expense.
 
To those who were able to get great throughput from 10 Gb cards: can you post the card model and specs?

Has anyone used the card below and been able to make it work? If so, how did you do it? I'd appreciate some pointers on kernel settings, drivers, etc. We don't have a 10 Gb switch and are connecting the machines directly to each other.

Code:
ix0@pci0:8:0:0: class=0x020000 card=0xa01f8086 chip=0x10ec8086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82598EB 10-Gigabit AT CX4 Network Connection'
    class      = network
    subclass   = ethernet
ix1@pci0:8:0:1: class=0x020000 card=0xa01f8086 chip=0x10ec8086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82598EB 10-Gigabit AT CX4 Network Connection'
    class      = network
    subclass   = ethernet

Code:
ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.3.11> port 0x6000-0x601f mem 0xd9480000-0xd949ffff,0xd9400000-0xd943ffff,0xd94c0000-0xd94c3fff irq 17 at device 0.0 on pci8
ix0: Using MSIX interrupts with 9 vectors
ix0: RX Descriptors exceed system mbuf max, using default instead!
ix0: Ethernet address: 00:1b:21:53:e0:f1
ix0: PCI Express Bus: Speed 2.5Gb/s Width x8
ix1: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.3.11> port 0x6020-0x603f mem 0xd94a0000-0xd94bffff,0xd9440000-0xd947ffff,0xd94c4000-0xd94c7fff irq 16 at device 0.1 on pci8
ix1: Using MSIX interrupts with 9 vectors
ix1: RX Descriptors exceed system mbuf max, using default instead!
ix1: Ethernet address: 00:1b:21:53:e0:f0
ix1: PCI Express Bus: Speed 2.5Gb/s Width x8

Code:
FreeBSD 9.0-RELEASE #3: Tue Dec 27 14:14:29 PST 2011     root@build9x64.pcbsd.org:/usr/obj/builds/amd64/pcbsd-build90/fbsd-source/9.0/sys/GENERIC
 
t1066 said:
I had a different experience. A few days ago, I used some spare parts to build a backup system. I was able to transfer around 600 GB of data using zfs send and zfs recv over SSH in under 3 hours. Using sysutils/conky, I observed that the transfer rate went as far as 80 MBps, i.e., around 640 Mbps.

Using nc and setting kern.ipc.nmbclusters=256000 (default is 25600) on a gigabit NIC:
Code:
$ date;time nc -l -I 2097152 -n -4 192.168.100.145 6969 |zfs recv -v iscsi/targets;date
Wed May 22 11:09:00 PDT 2013
receiving full stream of iscsi/targets@keep127 into iscsi/targets@keep127
received 4.24TB stream in 46656 seconds (95.2MB/sec)

real    780m10.274s
user    12m0.797s
sys     396m23.783s
Thu May 23 00:09:10 PDT 2013

More zfs recv output with nc:
Code:
received 107GB stream in 1151 seconds (95.5MB/sec)

real    19m18.820s
user    0m17.944s
sys     9m46.697s


received 2.73GB stream in 27 seconds (103MB/sec)

real    0m41.448s
user    0m0.382s
sys     0m15.806s


received 112GB stream in 1251 seconds (91.7MB/sec)

real    21m8.634s
user    0m18.582s
sys     10m29.906s
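
For completeness, the matching send side of those transfers looks something like this (same listener address and port as above; -O sets the TCP send buffer the way -I sets the receive buffer, so drop it if your nc doesn't support it):
Code:
zfs send iscsi/targets@keep127 | nc -n -4 -O 2097152 192.168.100.145 6969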
 
yayix said:
To those who were able to get great throughput from 10 Gb cards, can you post the card model and specs?
Intel X540-T1:
Code:
ix0@pci0:3:0:0: class=0x020000 card=0x00028086 chip=0x15288086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    class      = network
    subclass   = ethernet
Code:
(0:2) rz1:/sysprog/terry# uname -a
FreeBSD rz1.tmk.com 8.4-PRERELEASE FreeBSD 8.4-PRERELEASE #0 r250803M: Sun May 19 00:10:14 EDT 2013     terry@rz1.tmk.com:/usr/obj/usr/src/sys/RAIDZILLA2  amd64
(0:3) rz1:/sysprog/terry# dmesg | grep ix0
ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.0 - 8> mem 0xf8e00000-0xf8ffffff,0xf8dfc000-0xf8dfffff irq 24 at device 0.0 on pci3
ix0: Using MSIX interrupts with 9 vectors
ix0: [ITHREAD]
ix0: [ITHREAD]
ix0: [ITHREAD]
ix0: [ITHREAD]
ix0: [ITHREAD]
ix0: [ITHREAD]
ix0: [ITHREAD]
ix0: [ITHREAD]
ix0: [ITHREAD]
ix0: Ethernet address: a0:36:9f:xx:yy:zz
ix0: PCI Express Bus: Speed 5.0Gb/s Width x8
(0:4) rz1:/sysprog/terry# cat /boot/loader.conf
ixgbe_load="yes"
kern.ipc.nmbclusters=262144
kern.ipc.nmbjumbop=262144

Has anyone used the card below and been able to make it work? If so, how did you do it? I'd appreciate some pointers on kernel settings, drivers, etc. We don't have a 10 Gb switch and are connecting the machines directly to each other.
It looks like you have a prior-generation card (2007 vs. 2012 for mine). I don't know whether that is limiting your performance. You also seem to have an older version of the driver: 8-STABLE has 2.5.0 - 8, while you're running 2.3.11. For highest performance, try the loader.conf tunables I show above (they're from /sys/dev/ixgbe/README) and also use jumbo frames.
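
Jumbo frames are just an MTU setting on the ix interfaces. With the tunables above in place, something like this in /etc/rc.conf (address made up) enables them, as long as everything else on that segment is configured for the same MTU:
Code:
# /etc/rc.conf
ifconfig_ix0="inet 10.20.30.40 netmask 255.255.255.0 mtu 9000"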
 
Terry_Kennedy said:
Intel X540-T1:
Code:
ix0@pci0:3:0:0: class=0x020000 card=0x00028086 chip=0x15288086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    class      = network
    subclass   = ethernet
Code:
(0:2) rz1:/sysprog/terry# uname -a
FreeBSD rz1.tmk.com 8.4-PRERELEASE FreeBSD 8.4-PRERELEASE #0 r250803M: Sun May 19 00:10:14 EDT 2013     terry@rz1.tmk.com:/usr/obj/usr/src/sys/RAIDZILLA2  amd64
(0:3) rz1:/sysprog/terry# dmesg | grep ix0
ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.0 - 8> mem 0xf8e00000-0xf8ffffff,0xf8dfc000-0xf8dfffff irq 24 at device 0.0 on pci3
ix0: Using MSIX interrupts with 9 vectors
ix0: [ITHREAD]
ix0: [ITHREAD]
ix0: [ITHREAD]
ix0: [ITHREAD]
ix0: [ITHREAD]
ix0: [ITHREAD]
ix0: [ITHREAD]
ix0: [ITHREAD]
ix0: [ITHREAD]
ix0: Ethernet address: a0:36:9f:xx:yy:zz
ix0: PCI Express Bus: Speed 5.0Gb/s Width x8
(0:4) rz1:/sysprog/terry# cat /boot/loader.conf
ixgbe_load="yes"
kern.ipc.nmbclusters=262144
kern.ipc.nmbjumbop=262144


It looks like you have a prior-generation card (2007 vs. 2012 for mine). I don't know if that is limiting your performance. You also seem to have an older version of the driver - 8-STABLE has 2.5.0 - 8 while you're running 2.3.11. For highest performance, try the loader.conf tunables I show above - they're from /sys/dev/ixgbe/README and also use jumbo frames.

Thanks for the feedback. This is the one we have: Intel 10 Gigabit CX4 Dual Port Server Adapter EXPX9502CX4 (895897). See side-by-side comparison here.

We'll test again, hopefully next week. I guess we may have overlooked some needed sysctl and loader tunable settings. Also, we did try using the ixgbe driver.

We encountered the following:
  1. horrible ping times whenever setting MTU > 1500
  2. console errors:
     Code:
     kernel: ix0: Could not setup receive structures
  3. sending maxed out at 75 MB/s
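
The "Could not setup receive structures" message together with the trouble at MTU > 1500 makes me suspect we're simply running out of mbuf clusters, so the plan is to retest with the tunables from Terry's loader.conf (values copied straight from his post, not tuned for our hardware):
Code:
# /boot/loader.conf
ixgbe_load="yes"
kern.ipc.nmbclusters=262144
kern.ipc.nmbjumbop=262144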
 