Lagg issue between FreeBSD box and Windows Seven (working well with Ubuntu)

Hi,

I've recently acquired an HP Proliant Microserver and I've installed FreeBSD 9 for its ZFS support.
The server comes with a Broadcom Corporation NetXtreme BCM5723 Gigabit Ethernet PCIe network card (detected as bge0), and I've added an Intel 1000CT card (82574L, detected as em0). I've loaded the box with a bunch of disks, and ZFS works like a charm!

To maximize throughput, I've bonded the two network interfaces via the lagg(4) module, and it works well:
  • I have failover when I disconnect one of my cables
  • I reach 900 Mbit/s to 1200 Mbit/s in iperf tests (these values will be explained later; the iperf invocation is sketched below)
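For anyone wanting to reproduce the iperf numbers, a minimal invocation looks like this (iperf 2.x syntax; the -t duration is just an example):

Code:
# on the FreeBSD box (server side)
iperf -s
# on a client (Ubuntu or Windows), plain TCP test towards lagg0's address
iperf -c 192.168.0.3 -t 30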

I also have an Ubuntu box with an onboard Realtek gigabit card (RTL8111/8168B), and all the iperf bandwidth tests I've done with it give me 900 Mbit/s. (I plan to bond this box with another interface too, as soon as I receive another Intel 1000CT card, in order to exceed 1 Gbit/s: it's my second NAS box and I'll back up my data to it.)

However, I have a third PC, running Windows 7 (nobody's perfect ...). It has the same network card on board as the Ubuntu box, I believe: an RTL8111E chip.

The bandwidth with this one is clearly abnormal: I get 150 Mbit/s upload from the FreeBSD box to Windows, and 350 Mbit/s from Windows to FreeBSD.

Of course, I've run several tests, such as swapping cables, to make sure it's not a physical issue. I've done all my tests with iperf, which is not hard-disk dependent (and anyway, Windows 7 runs on an SSD). I've also done some tweaking on the Windows side, like enabling jumbo frames.

But it did not help.

My original FreeBSD setup was:
Code:
cloned_interfaces="lagg0"
ifconfig_bge0="up"
ifconfig_em0="up"
ifconfig_lagg0="laggproto roundrobin laggport bge0 laggport em0"
ipv4_addrs_lagg0="192.168.0.3/24"
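(For what it's worth, the same lagg can also be brought up live, without a reboot, with something like this; interface names and address as above:)

Code:
ifconfig lagg0 create
ifconfig lagg0 up laggproto roundrobin laggport bge0 laggport em0
ifconfig lagg0 inet 192.168.0.3/24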

So I've disabled roundrobin and set the IP address on only one card, as follows (I've swapped em0 and bge0 during my tests, to make sure the flaw wasn't coming from one of the cards):

Code:
cloned_interfaces="lagg0"
#ifconfig_bge0="up"  # superseded by the inet line below
ifconfig_em0="up"
#ifconfig_lagg0="laggproto roundrobin laggport em0 laggport bge0"
#ipv4_addrs_lagg0="192.168.0.3/24"
ifconfig_bge0="inet 192.168.0.3 netmask 255.255.255.0"
#ifconfig_em0="inet 192.168.0.3 netmask 255.255.255.0"

With roundrobin disabled, I get very generous bandwidth between the FreeBSD box and Windows (up to 900 Mbit/s with iperf, in both directions, with either bge0 or em0). Accordingly, as a real-life example, my FTP transfers reach 100 MB/s.

But when I re-enable roundrobin (the original setup), the bandwidth is once more crappy (150 Mbit/s upload and 350 Mbit/s download).

I've run iperf as a TCP server on the FreeBSD box, and with the Ubuntu and Windows boxes as simultaneous TCP clients I reach 1.2 Gbit/s aggregate (900 + 300) on the FreeBSD box, so I know roundrobin is really working.
My goal is to have a "comfortable" bandwidth between the FreeBSD and Ubuntu servers, because I'll have 6 TB backups going through the wires (I'll receive the missing card soon), so I need roundrobin on the FreeBSD side too.

But since the Windows box is my leisure PC, there's no way I can keep things as they are (30-40 MB/s will be too slow for some uses, such as reading big ISO files). ZFS is soooo good, I don't want to switch the Proliant server from FreeBSD to a Linux with RAID =/ And besides, I'm a sysadmin, and I'm stubborn; I'll keep searching till I find a solution ...

For now, all the tests I've run lead me to believe that lagg(4)'s roundrobin is somehow "incompatible" with Microsoft's IP stack. But I don't know what to do.
  • I have a proper physical network
  • I've searched for dropped packets, but iperf reports none (when you switch the utility to UDP, it reports how many packets were lost, and there were none; see the sketch after this list)
  • The network cards in the Ubuntu and Windows boxes are the same
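For anyone reproducing this, the loss check is just iperf in UDP mode, and the reordering hypothesis can be eyeballed from the FreeBSD TCP counters (a quick sketch; the -b rate is just an example):

Code:
# UDP test: the summary line reports lost datagrams
iperf -u -c 192.168.0.3 -b 900M
# reordering as seen by the FreeBSD TCP stack
netstat -s -p tcp | grep -i 'out-of-order'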

If somebody has a solution (tweaking the FreeBSD box or Windows 7), let me know.
It really matters for me.

Thanks!
 
My ifconfig output, if it helps!

Code:
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC>
        ether 68:05:ca:0a:0c:26
        inet6 fe80::6a05:caff:fe0a:c26%em0 prefixlen 64 scopeid 0x1
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
bge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=c019b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,VLAN_HWTSO,LINKSTATE>
        ether 68:05:ca:0a:0c:26
        inet6 fe80::ea39:35ff:fe2d:f1cd%bge0 prefixlen 64 scopeid 0x2
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=3<RXCSUM,TXCSUM>
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x9
        inet 127.0.0.1 netmask 0xff000000
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=19b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4>
        ether 68:05:ca:0a:0c:26
        inet 192.168.0.3 netmask 0xffffff00 broadcast 192.168.0.255
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        laggproto roundrobin
        laggport: bge0 flags=4<ACTIVE>
        laggport: em0 flags=4<ACTIVE>
 
I've run some additional tests this morning: when switching the FreeBSD box from "roundrobin" to "failover", I once more get a solid 900 Mbit/s iperf throughput (110 MB/s with real-life FTP transfers). So it's only the roundrobin method that causes issues. Too bad it's the one I need =/ (The failover variant is the one-line rc.conf change shown below.)
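(Concretely, that's just the laggproto changed in rc.conf, the rest of the setup staying the same:)

Code:
ifconfig_lagg0="laggproto failover laggport bge0 laggport em0"
ipv4_addrs_lagg0="192.168.0.3/24"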

Any thoughts? (The roundrobin method is, hmmm, only lightly documented in the lagg(4) manual.)
 
Can you try setting up an FTP server on the Windows 7 box, and have the FreeBSD server fetch a file from it to test performance in that direction?

I believe this may help shed some light on where the problem is.
 
Do you have an Intel card to try in the Windows system?
Not yet, but I'll have one within a week, as soon as it's shipped (Chinese PC hardware shops only sell Realtek, in France).
 
Hello Savagedlight, I've run the tests you asked for; I must admit I wasn't expecting these results.
FTP connections were initiated from the BSD side; the reads/writes are made on the ZFS pool.
I've installed the latest FileZilla Server on Windows, and configured the user's home directory on my SSD, to make sure disk I/O wasn't the bottleneck. Firewall and antivirus disabled.

Sorry, for now I can't run this kind of test on the Ubuntu box; its HDD is really old and would certainly be the bottleneck (I have an SSD in stock, but I'll have to reinstall the OS on it).

Failover:

Put:
Code:
229 Entering Extended Passive Mode (|||50546|)
150 Connection accepted
100% |************************************************************************************************************************************************************************************************|  8324 MiB   48.90 MiB/s    --:-- ETA
226 Transfer OK
8728466265 bytes sent in 02:50 (48.90 MiB/s)

Get:
Code:
229 Entering Extended Passive Mode (|||50595|)
150 Connection accepted
100% |************************************************************************************************************************************************************************************************|  8324 MiB   50.89 MiB/s    00:00 ETA
226 Transfer OK
8728466265 bytes received in 02:43 (50.89 MiB/s)

Roundrobin:

Get:
Code:
local: Assassins Creed II - Revelations.iso remote: Assassins Creed II - Revelations.iso
229 Entering Extended Passive Mode (|||52793|)
150 Connection accepted
100% |************************************************************************************************************************************************************************************************|  8323 MiB   40.55 MiB/s    00:00 ETA
226 Transfer OK
8728033449 bytes received in 03:25 (40.55 MiB/s)


Put:

Code:
229 Entering Extended Passive Mode (|||52832|)
150 Connection accepted
100% |************************************************************************************************************************************************************************************************|  8323 MiB   19.57 MiB/s    --:-- ETA
226 Transfer OK
8727791721 bytes sent in 07:05 (19.57 MiB/s)

One interface only mode:

Get:
Code:
229 Entering Extended Passive Mode (|||53590|)
150 Connection accepted
100% |************************************************************************************************************************************************************************************************|  8323 MiB   51.46 MiB/s    00:00 ETA
226 Transfer OK
8728033453 bytes received in 02:41 (51.46 MiB/s)

Put:
Code:
229 Entering Extended Passive Mode (|||54535|)
150 Connection accepted
100% |************************************************************************************************************************************************************************************************|  8323 MiB   54.04 MiB/s    --:-- ETA
226 Transfer OK
8727791725 bytes sent in 02:34 (54.04 MiB/s)
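To summarize the three runs (Get = Windows → FreeBSD, Put = FreeBSD → Windows):

Code:
Mode            Get             Put
Failover        50.89 MiB/s     48.90 MiB/s
Roundrobin      40.55 MiB/s     19.57 MiB/s
Single iface    51.46 MiB/s     54.04 MiB/s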

What could it mean? I get transfers reaching an excellent ~100 MB/s when failover or only one interface is enabled on the BSD box and Windows initiates the connections (FTP, NFS, pure TCP via iperf), but poor transfer rates both ways when the BSD initiates connections to the Windows box, no matter how the BSD interfaces are configured (and that doesn't even hold for iperf, which reaches 900 Mbit/s when roundrobin is not set).
I'm quite lost ... More Windows optimizations?
I really do want to run the same tests against the Ubuntu box for these results to really make sense, but that will have to wait a couple of days I think =/
 
One last word for tonight: I've installed the SSD in the Ubuntu box, and I'm using ProFTPD on it. I've run some FTP transfers from the BSD (still on ZFS, in roundrobin mode) to the Ubuntu FTP server (on the SSD). As I was hoping, the transfer rates did improve (my previous HDD was REALLY too old).
It gives me nice rates:
Get:
Code:
ftp> get sr-acii.iso
local: sr-acii.iso remote: sr-acii.iso
229 Entering Extended Passive Mode (|||23155|)
150 Opening BINARY mode data connection for sr-acii.iso (6810935296 bytes)
100% |************************************************************************************************************************************************************************************************|  6495 MiB   90.24 MiB/s    00:00 ETA
226 Transfer complete
6810935296 bytes received in 01:11 (90.24 MiB/s)

Put:
Code:
ftp> put sr-acii.iso
local: sr-acii.iso remote: sr-acii.iso
229 Entering Extended Passive Mode (|||51774|)
150 Opening BINARY mode data connection for sr-acii.iso
100% |************************************************************************************************************************************************************************************************|  6495 MiB   97.33 MiB/s    00:00 ETA
226 Transfer complete
6810935296 bytes sent in 01:08 (95.43 MiB/s)

So the issue is Windows-centric (whether it comes from Windows or FreeBSD).
 
Please forgive my last sentence, I'm tired ;-)
I'll run more tests tomorrow, between the Ubuntu and Windows boxes, before drawing conclusions lol
 
I booted an Ubuntu Live session on my Windows box, and it seems the transfer rates are as bad as on Windows, so I guess something is broken with my integrated NIC. Though I can't figure out why they vary so much (from excellent to bad) with different lagg settings on the FreeBSD box =/
I'll receive a new Intel 1000CT NIC and a bunch of Cat 6 cables tomorrow or Friday.
I guess I'll figure out what's going on then ... or not lol

I'll keep you updated !
 
Hello,

To update my case: I haven't been able to solve the issue (I suspect it comes from the Windows IP stack and how it copes with the packet reordering that the roundrobin method causes?!).
As I recently acquired a managed switch with EtherChannel/LACP for the ESX server I have, I've bonded all my computers with LACP, including the FreeBSD, Linux and even Windows boxes, and it works fine.
Solid throughput on all computers, including the Windows workstation (first with the RTL8111E chip alone, then with two 82574L cards I managed to spare from another computer). On the FreeBSD side the change is again a one-liner (sketched below).
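(Roughly, that's just the laggproto changed once more in rc.conf; the switch ports facing the box also have to be configured as an LACP group:)

Code:
cloned_interfaces="lagg0"
ifconfig_bge0="up"
ifconfig_em0="up"
ifconfig_lagg0="laggproto lacp laggport bge0 laggport em0"
ipv4_addrs_lagg0="192.168.0.3/24"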

Case closed, thanks!
 
PS: In fact, I haven't retried transfers on the RTL8111E; I moved directly to the Intel cards on the Windows box.
 