Slow transfer over Gbps link between ZFS servers

I'm having issues with data transfer over a Gbps link between ZFS servers.
One machine is a set of mirror vdevs, unencrypted, and can do about 130 MB/s writes and about 300 MB/s reads. This is the machine I was copying data from.

The destination machine is a 4-disk stripe (RAID0), for testing purposes, and can achieve write speeds of well over 250 MB/s and reads over 350 MB/s.

The transfer goes over the Gbps link, and the maximum transfer speed is about 55 MB/s. At first I figured it must be low-quality cards, but an iperf test shows over 880 Mbps, so it can't be a cable/card issue.
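For reference, the iperf test was roughly along these lines (the address is just the destination box on my network):

Code:
# on the destination machine
iperf -s
# on the source machine, run for 30 seconds
iperf -c 172.16.0.2 -t 30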

Yet gstat on both machines shows only low utilization percentages during the transfer, so it doesn't look like a slow-disk issue either. This is backed up by the fact that the transfer rate stays the same even when the data is read from memory.
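(I watched it with something like this:)

Code:
# only display providers that are actually busy
gstat -a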

What is also interesting is that copying in either direction gives about the same speed, with maybe a 5-10 MB/s difference.

I tested with scp, rsync, and zfs send.

Why can't I get ~100 MB/s transfer speed?

I should mention I also tried setting these values, but all tests (and iperf) give the same results:

Code:
sysctl net.inet.tcp.sendspace=65536
sysctl net.inet.tcp.sendbuf_max=16777216
sysctl net.inet.tcp.recvbuf_max=16777216
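(Had any of these helped, they'd need to go into /etc/sysctl.conf to survive a reboot, e.g.:)

Code:
# /etc/sysctl.conf
net.inet.tcp.sendspace=65536
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216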
 
Well, I'm stumped. I don't know why it's hovering around 50 MB/s only.
So I tried making memory disks, to copy from memory instead of disk and rule out a connection issue once more:
# mdconfig -a -t swap -s 1g
Then newfs, mount, and copy.
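(Roughly like this, from memory; the source path is just a stand-in for whatever I was copying:)

Code:
# newfs -U /dev/md0
# mount /dev/md0 /mnt
# cp /tank/bigfile /mnt/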
As soon as I try copying from the memory disk, the server freezes and I get a bunch of kernel trap messages scrolling by. Can you not do this, or what's going on?
 
Are all of the transfers using ssh - i.e. with encryption?

I use an HP MicroServer for my backup server, and its anaemic CPU is the bottleneck for any transfer that involves encryption. The zpool itself can do >100 MB/s, but I only get around 35 MB/s encrypted.

Check your CPU utilization; if it's high, you should try unencrypted transfers (assuming you trust your network) or at least a faster cipher like blowfish (scp -c blowfish).
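Something like this, with host and path as placeholders:

Code:
$ scp -c blowfish-cbc /tank/bigfile you@192.168.0.10:/backup/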

Or skip ssh for the data path entirely, e.g. using net/nc:
Code:
$ ssh you@192.168.0.10 'nc -l 1234 | zfs recv backup/foo/bar'
$ zfs send foo/bar@baz | nc 192.168.0.10 1234
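Start the listening side first, then kick off the send. If you want to see the throughput of the pipe itself, sysutils/pv (assuming it's installed) drops straight in:

Code:
$ zfs send foo/bar@baz | pv | nc 192.168.0.10 1234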
 
Yes, I tried with blowfish. These are fast CPUs (Intel i7s), so that shouldn't matter, and the CPU isn't being hogged by the transfer either. It is strange; I don't know what the bottleneck is.

Thanks.
 
Thanks.
I tried iperf (see the first post); it shows 880 Mbps.

But I think I've eliminated the disks as the issue (quite a relief, actually), and now I don't know why the network transfer is slow when iperf shows much more than 50 MB/s. Maybe my test is flawed; here's what I did:

[cmd=]mdconfig -a -t swap -s 1g[/cmd]
[cmd=]zpool create -o altroot=/mnt test /dev/md0[/cmd]

This was done on both sides, so no disks are involved. Then, on the receiving side:
[cmd=]nc -l 8023 | zfs receive -vu test[/cmd]
And send with (after populating the pool with 1 GB of data and snapshotting):
[cmd=]zfs send test@test | nc 172.16.0.2 8023[/cmd]
It shows a constant 50 MB/s. I tried with two different cards/drivers, vge0 and re0.
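(For anyone repeating this test, the scratch pool and memory disk clean up afterwards with:)
[cmd=]zpool destroy test[/cmd]
[cmd=]mdconfig -d -u 0[/cmd]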

So why does iperf show 110 MB/s (880 Mbps) when I can't transfer over 50 MB/s?
 
Oops, sorry for my carelessness. How about testing the transfer speed with FTP or NFS? That would rule out overhead in the other programs you tested.
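Or cut the filesystems out of the picture entirely and push zeroes through nc, something like:

Code:
# on the receiver
nc -l 1234 > /dev/null
# on the sender (1 GB of zeroes)
dd if=/dev/zero bs=1m count=1024 | nc 172.16.0.2 1234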
 
You don't mention which versions of FreeBSD are on the servers, nor which versions of ZFS, nor the complete pool layouts, nor whether compression/dedupe is enabled, nor the command-line used to do the "zfs send", nor how full each pool is, nor the types of drives/controllers involved, nor the types of NICs involved, etc, etc, etc.

It definitely is possible to get line-rate out of a "zfs send" between ZFS servers. I average 750 Mbps between two ZFS servers, with peaks to 950 Mbps (10 second averages) as shown by iftop(8).

The sending server has 4 raidz2 vdevs of 6x 2.0 TB disks, with lzjb compression and dedupe enabled. Using 3.0 Gbps SATA2 drives connected to mps(4) controllers.

The receiving server has 4 raidz2 vdevs of 6x 2.0 TB disks, with lzjb compression and dedupe enabled. Using 6.0 Gbps SATA3 drives connected to the same model mps(4) controllers.

However, I've enabled the HPN patches in OpenSSH (default in FreeBSD 9.0 and 8.3) along with the NONE cipher (no encryption of the data channel). Without the HPN patches, transfers are about 30% slower; without the NONE cipher, slower still.
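For what it's worth, with the HPN patches the NONE cipher is requested per-connection roughly like this (the server's sshd_config also needs NoneEnabled set to yes, and it only kicks in for non-tty sessions; dataset names here are placeholders):

Code:
$ zfs send storage/fs@snap | ssh -oNoneEnabled=yes -oNoneSwitch=yes you@backuphost 'zfs recv backup/fs'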

The two servers above are running 9-STABLE r234622, with less than 40% usage in the pools.

I have another server running 9.0-RELEASE, with a pool that's 70% full, also using lzjb compression and dedupe. "zfs send" from this server rarely breaks 100 Mbps, usually hovering around 50-80 Mbps, with the very occasional peak of 200 Mbps. Mostly due to the fragmented nature of the pool (snapshots going back 2 years).
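(Pool fullness and the dedup ratio are visible at a glance in the CAP and DEDUP columns of:)

Code:
$ zpool list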

So it is possible to get fast transfers out of "zfs send". But it's very dependent on the hardware, software, pool layout, pool fullness, etc.
 
Well, I'm happy that I found the culprit; I'm not happy because it means I need to get new gigabit Ethernet cards.

Let me start from the beginning.

phoenix said:
You don't mention which versions of FreeBSD are on the servers, nor which versions of ZFS, nor the complete pool layouts, nor whether compression/dedupe is enabled, nor the command-line used to do the "zfs send", nor how full each pool is, nor the types of drives/controllers involved, nor the types of NICs involved, etc, etc, etc.

Thanks, Phoenix, for trying to help; it means a lot when someone of your expertise jumps into a conversation.
I didn't mention any of those because they are all at defaults: the FreeBSD version is 9-STABLE (maybe a week old), the pool layout is irrelevant here, there is no compression or dedupe, the pools are not nearly as full of crap as I wish they were, and the NICs are Realtek, two different cards, using the vge and re drivers, as I mentioned.

I mentioned that one pool is mirror vdevs and the other is a RAID0 stripe, and both achieve speeds way faster than I can get over the network.

I also said I did the transfer from md disks, which takes all of that out of the equation. So, having tried two different cards, I tried copying to my old laptop with an Intel em card, and voilà: ~100 MB/s.

So now I have two questions. First, how is it possible that iperf (which I took as the basis for concluding that my Realtek cards are fine) reports 880 Mbps, when I clearly couldn't transfer more than 400 Mbps until I tried the Intel card?

And second, copying from re to re only gave 50 MB/s, while copying from re to em gave full line speed. That tells me the receive side of the re card is not OK. Whether it's a driver or hardware issue, I don't know. Nor do I care, since I've spent 48 hours on this and I'm happy it has nothing to do with my disks.
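(In hindsight, the interface error counters would probably have pointed at the re card sooner; something like:)

Code:
$ netstat -i           # cumulative Ierrs/Oerrs per interface
$ netstat -I re0 -w 1  # live per-second counters during a transfer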

Thanks all.
 