Maximum file transfer speed

I was just wondering: what do you guys achieve as a maximum file transfer speed over gigabit ethernet?
I've seen numbers close to 40MB/s, but mostly it's around 30MB/s.
I tried to double (or at least increase) this by using 'lagg' on two similar servers that both have two gigabit NICs. However, the speed halved when using the roundrobin protocol and stayed almost the same when using the loadbalance protocol. I had the best results when uploading from one server to my desktop with its single PCI gigabit NIC: close to 40MB/s.
I use an unmanaged gigabit ethernet switch, so the lacp protocol could not be tested.
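For reference, the lagg setup on each server looks roughly like this (a sketch, the exact commands may differ slightly from what I typed):

Code:
ifconfig bge0 up
ifconfig bge1 up
ifconfig lagg0 create
ifconfig lagg0 up laggproto loadbalance laggport bge0 laggport bge1 192.168.0.101/24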
Any thoughts?
 
On my LAN (over an unmanaged gigE switch) I get about 35-45 MByte/sec over FTP (between gigE NICs). I have not investigated whether disk I/O or something else is the bottleneck. I have not tweaked network settings (sysctls, jumbo frames, etc.) because the occasional file transfer is all the bulk data I move internally.
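If you do want to experiment, these are the kind of tweaks I mean (interface name and values are only examples; I have not benchmarked them myself):

Code:
# jumbo frames (NIC and switch must both support them)
ifconfig em0 mtu 9000

# larger socket/TCP buffers
sysctl kern.ipc.maxsockbuf=2097152
sysctl net.inet.tcp.sendbuf_max=2097152
sysctl net.inet.tcp.recvbuf_max=2097152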
 
I have tested this with sftp, transferring a single 6GB *.tar file.
The servers have Ultra3 SCSI hard disks (they really should have Ultra320 disks, but that's how they were sold second-hand).
My desktop uses SATA (second generation).
 
Are your numbers based on what sftp reports or on a network gauge of some sort? In other words: is the 30-40 Mbyte/sec the payload or the actual traffic (encryption overhead included)? Or is ssh's cpu load a possible bottleneck?
 
Those numbers were what 'sftp' reported.
I also checked the traffic while copying this big file from a server disk to the desktop disk. One of the server disks is exported over NFS. While the copy was running, I checked with:

Code:
$ systat -ifstat

and saw very similar results (in the range of 30-40MB/s)
 
Well, all I can say is that your numbers are not far from mine (based on local ftp on gigE), and that the difference may be attributed to the encryption overhead of ssh. You could try running an iperf test, with an iperf server on one side and an iperf client on the other. It's very easy to set up.

benchmarks/iperf
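Something like this should do (the server's IP is just an example):

Code:
# on the receiving side
iperf -s

# on the sending side
iperf -c 192.168.0.100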
 
Try this rather.

Create a listener on one machine:
Code:
nc -l 0 2000 >/dev/null

On the other machine:
Code:
dd if=/dev/zero bs=1440 count=100000 |nc <ip> 2000

Replacing <ip> with the listening machine's IP address.

When dd finishes it should report the average throughput.
 
30-40 MB/s looks like an HDD bottleneck. Try watching `systat -vmstat 1` while downloading. I see numbers like 32MB/s reading from disk on my system (SATA disk).
You can also try creating a ramdisk with mdconfig and putting a test file there (on both sides); see the sketch below.
I have also read an article (don't have the link) claiming a plain PCI bus can't actually sustain full gigabit network bandwidth (but I haven't tested any of this myself).
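Roughly like this (size, unit number and mount point are just examples):

Code:
# create and mount a 2GB swap-backed ramdisk
mdconfig -a -t swap -s 2g -u 1
newfs /dev/md1
mount /dev/md1 /mnt

# write a 1GB test file onto it
dd if=/dev/zero of=/mnt/testfile bs=1m count=1024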
 
aragon said:
Try this rather.

Create a listener on one machine:
Code:
nc -l 0 2000 >/dev/null

On the other machine:
Code:
dd if=/dev/zero bs=1440 count=100000 |nc <ip> 2000

Replacing <ip> with the listening machine's IP address.

When dd finishes it should report the average throughput.


Here are the results:

Code:
%dd if=/dev/zero bs=1440 count=100000 | nc 192.168.0.100 2000
100000+0 records in
100000+0 records out
144000000 bytes transferred in 1.827984 secs (78775302 bytes/sec)

So that's around 75MB/s

Some hardware info:

The test was done on two identical HP DL360 G4 servers, each with a single 3.4GHz CPU (1MB L2 cache). All gigabit NICs are connected to an unmanaged gigabit ethernet switch. The NICs are configured in 'lagg' with the loadbalance protocol.
 
DutchDaemon said:
Well, all I can say is that your numbers are not far from mine (based on local ftp on gigE), and that the difference may be attributed to the encryption overhead of ssh. You could try running an iperf test, with an iperf server on one side and an iperf client on the other. It's very easy to set up.

benchmarks/iperf

Here are the results for iperf:

Code:
%iperf -c 192.168.0.100
------------------------------------------------------------
Client connecting to 192.168.0.100, TCP port 5001
TCP window size: 32.5 KByte (default)
------------------------------------------------------------
[  3] local 192.168.0.101 port 57663 connected with 192.168.0.100 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.1 sec    842 MBytes    698 Mbits/sec

So that's 87.25 MB/s
 
My latest experiences with lagg are:

- roundrobin: poor performance, don't know why, maybe a buggy implementation
- loadbalance: if you have only one connection (one client and one server on one socket), you are running on one physical interface only -> no gain, but works as designed, see man lagg(4)

But as you see, it's not your network. Since you are using an HP server, do you have your disks attached to an HP SmartArray controller? If so, don't hesitate to google that; you will find plenty of discussions about poor performance (mostly about Linux, HP RAID controllers are sh**).
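To get a feel for the raw disk throughput, independent of the network, something like this may help (the device name is just an example, check which one is yours first):

Code:
# naive sequential read benchmark on the raw device
diskinfo -t /dev/da0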

cheers,
honk
 
I also noticed during these tests that 'lagg' with loadbalance doesn't seem to increase the bandwidth:

Code:
%systat -ifstat
                    /0   /1   /2   /3   /4   /5   /6   /7   /8   /9   /10
     Load Average

      Interface           Traffic               Peak                Total
          lagg0  in      0.077 KB/s         86.925 MB/s            5.627 GB
                 out     0.187 KB/s          2.409 MB/s            6.129 GB

           bge1  in      0.077 KB/s          2.750 KB/s          288.343 MB
                 out     0.187 KB/s          1.906 KB/s           11.300 MB

           bge0  in      0.000 KB/s         86.922 MB/s            5.346 GB
                 out     0.000 KB/s          2.409 MB/s            6.118 GB

So these results appear to be the speed of a single (Broadcom) interface.
The PCI NIC in my desktop reached speeds of up to 65MB/s, so it's somewhat slower than the onboard interfaces.
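Another way to see which ports are attached to the lagg and marked active is to check the laggport lines in:

Code:
ifconfig lagg0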
 
honk said:
My latest experiences with lagg are:

- roundrobin: poor performance, don't know why, maybe a buggy implementation
- loadbalance: if you have only one connection, you are running on one physical interface only -> works as designed, see man lagg(4)

But as you see, it's not your network. Since you are using an HP server, do you have your disks attached to an HP SmartArray controller? If so, don't hesitate to google that; you will find plenty of discussions about poor performance (mostly about Linux, HP RAID controllers are sh**).

cheers,
honk

Well it is my network... :)
At the beginning of this year I bought two second-hand HP DL360 G4 servers for 200 euro apiece. Currently the disks are configured in RAID0; what I don't like is that you cannot simply disable this RAID stuff. Otherwise they are excellent machines and use very little power when idle: with 'powerd' the CPU clocks down to 350MHz, which takes the CPU power dissipation from 103W down to about 10W. I bought these machines after getting annoyed by occasional swapping on my desktop while doing some GNU Octave work.
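For completeness, enabling that is just the usual powerd setup (a sketch, nothing exotic):

Code:
# /etc/rc.conf
powerd_enable="YES"

# check the current clock
sysctl dev.cpu.0.freq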

There is something I don't get about 'loadbalance', though. It seems to use only one interface, while in fact all interfaces (2x2) are connected. Logically the bandwidth should double, or am I missing something?
 
Well it is my network

Why? You got 85MBytes/sec. That is not bad at all. What do you expect? I guess iperf only counts the payload inside the TCP connection. Further, I guess you have an MTU of 1500 bytes, which means only 1460 bytes per packet are available for payload: IP consumes at least 20 bytes of header, and TCP another 20 bytes. The resulting packet on the wire is at least 1514 bytes because of the additional ethernet header (14 bytes).
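A quick back-of-envelope check of the theoretical ceiling (ignoring the ethernet preamble and inter-frame gap):

Code:
# best-case TCP payload on gigabit: 1000 Mbit/s * (1460/1514) / 8 bit per byte
echo "1460 * 1000 / 1514 / 8" | bc -l
# roughly 120 MByte/s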

If you use lagg in loadbalance mode and you test with one client, you are talking to one MAC address and one IP address. The lagg manpage states that it calculates a hash over the MAC and IP addresses as the decision mechanism for which packet is sent over which physical interface. Therefore the communication will only run on >>one<< physical link (that's what you can see in your systat output).

What you want is lagg's roundrobin mode, where packets are distributed over all physical links one by one. But as I already said, my experience is that this leads to poor performance, possibly because of a buggy scheduler implementation?!

You should ask yourself another question: if you get 85MBytes/sec over the network, why is the transfer rate only 30-40MBytes/sec with nfs? My guess is that it is caused by your disks/RAID controller.

You also wrote that you have an unmanaged switch. How can you set up an etherchannel with lagg properly if you can't configure it on the switch side???

cheers,
honk
 
honk said:
Why? You got 85MBytes/sec. That is not bad at all. What do you expect? I guess iperf only counts the payload inside the TCP connection. Further, I guess you have an MTU of 1500 bytes, which means only 1460 bytes per packet are available for payload: IP consumes at least 20 bytes of header, and TCP another 20 bytes. The resulting packet on the wire is at least 1514 bytes because of the additional ethernet header (14 bytes).

If you use lagg in loadbalance mode and you test with one client, you are talking to one MAC address and one IP address. The lagg manpage states that it calculates a hash over the MAC and IP addresses as the decision mechanism for which packet is sent over which physical interface. Therefore the communication will only run on >>one<< physical link (that's what you can see in your systat output).

What you want is lagg's roundrobin mode, where packets are distributed over all physical links one by one. But as I already said, my experience is that this leads to poor performance, possibly because of a buggy scheduler implementation?!

You should ask yourself another question: if you get 85MBytes/sec over the network, why is the transfer rate only 30-40MBytes/sec with nfs? My guess is that it is caused by your disks/RAID controller.

You also wrote that you have an unmanaged switch. How can you set up an etherchannel with lagg properly if you can't configure it on the switch side???

cheers,
honk

At first I was expecting an increase in file transfer speed, since there are two links. As you explained, this won't work. I tested with 'roundrobin' and only achieved half the speed. From one server towards a single port there was no difference between 'loadbalance' and 'roundrobin'.

So yes, the bottleneck is the hard disks. In fact they are just Ultra3 SCSI and not Ultra320.

I was mainly interested in the network speed since I often use X forwarding (see also the thread I started about ssh, telnet, etc.), so basically I want to achieve the highest speed the hardware is capable of.
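For the X forwarding case I experiment with options along these lines (user and host are just examples; compression may or may not pay off on a fast LAN):

Code:
# X11 forwarding with compression enabled
ssh -C -X user@192.168.0.100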

Anyway, thanks all for making things more clear.
 