10Gb link between 2 machines capped at 1Gb

Hi Networking. I just dropped a Cat7 cable between two machines:
- FreeBSD 12.3 with a 10Gb Broadcom 57810 (aka QLogic NetXtreme II) PCIe card
- Linux with whatever Dell 10Gb NIC is on board.

On Linux: sudo ifconfig eno1 inet 10.0.1.30 netmask 255.255.255.0 up
On BSD: sudo ifconfig bxe0 inet 10.0.1.20 netmask 255.255.255.0 up

dmesg on Linux confirms:
Code:
[435683.691959] bnx2x 0000:01:00.0 eno1: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
[435683.703784] IPv6: ADDRCONF(NETDEV_CHANGE): eno1: link becomes ready
[435943.951954] bnx2x 0000:01:00.0 eno1: NIC Link is Down
[435974.311960] bnx2x 0000:01:00.0 eno1: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit

dmesg on BSD confirms:
Code:
bxe0: link state changed to DOWN
bxe0: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
bxe0: link state changed to UP
bxe0: link state changed to DOWN
bxe0: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
bxe0: link state changed to UP

Let's speed test:
Code:
pv -rp /dev/zero | ssh user@10.0.1.30 "cat > /dev/null"

172MiB/s on average ...

WTF? It appears to be capped at 1Gb/s. But why? How do I even begin to investigate this?
Could it be a purely FreeBSD issue, given this thread of mine?
 
You are being snarky, aren't you? Do, please, enlighten me. What perf drop should I expect from ssh encrypting a bunch of zeroes? 1Gb? 2Gb? ... 9Gb? Really? Like a 9Gb/s drop, on the button? So very precise.
 
To be fair and frank: my intuition has failed me time and again when it comes to hardware and performance, so you may very well be correct. But a loss of 9/10 of the performance just due to encryption ... the source machine runs dual 8-core Xeons (albeit quite old ones) with >100GB of RAM, the target machine runs dual 18-core Xeons and 256GB of RAM (again, not new), so I would expect them to encrypt this stuff a few times over before we hit a compute bottleneck. But then I know little about how hard encryption actually is ... maybe it is that slow.
 
runs dual 8-core Xeons (albeit quite old ones) with >100GB of RAM, the target machine runs dual 18-core Xeons and 256GB of RAM
The memory size doesn't improve the encryption speed. It's even worse if you are missing hardware-based AES encryption (aesni(4)) and you are running on a single thread.
Not to mention that your SSH is probably using curve25519-sha256.
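A quick way to check whether aesni(4) is actually available and loaded on the FreeBSD box (just a sketch; the exact dmesg wording differs per machine):

Code:
# look for the AESNI CPU feature flag in the boot messages
grep -i aesni /var/run/dmesg.boot
# load the aesni(4) driver if it isn't already in the kernel
kldload aesni
kldstat | grep aesni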

 
Actually your result of 172MiB/s on average is not bad :)

Here it is on an i7-2600K, with an SSH transfer of ~120MiB/s, which is ~1006Mbit/s (~0.98Gbit/s):

Code:
root@x1:/home/versus # iperf3 -c 192.168.100.108 -R
Connecting to host 192.168.100.108, port 5201
Reverse mode, remote host 192.168.100.108 is sending
[  5] local 192.168.100.105 port 17494 connected to 192.168.100.108 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   657 MBytes  5.51 Gbits/sec
[  5]   1.00-2.00   sec   973 MBytes  8.16 Gbits/sec
[  5]   2.00-3.00   sec  1.14 GBytes  9.83 Gbits/sec
[  5]   3.00-4.00   sec  1.13 GBytes  9.72 Gbits/sec
[  5]   4.00-5.00   sec  1.11 GBytes  9.51 Gbits/sec
[  5]   5.00-6.00   sec  1.13 GBytes  9.71 Gbits/sec
[  5]   6.00-7.00   sec  1.20 GBytes  10.3 Gbits/sec
[  5]   7.00-8.00   sec  1.17 GBytes  10.0 Gbits/sec
[  5]   8.00-9.00   sec  1.17 GBytes  10.0 Gbits/sec
[  5]   9.00-10.00  sec   950 MBytes  7.97 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.02  sec  10.6 GBytes  9.06 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  10.6 GBytes  9.07 Gbits/sec                  receiver

iperf Done.
root@x1:/home/versus # pv -rp /dev/zero | ssh versus@192.168.100.108 "cat > /dev/null"
Password for versus@coa:
117MiB/s
 
Holy ... my intuition completely failed me again. My track record here is really poor.
So, first, apologies are in order. Thank you VladiBG for sharing and explaining; it would never have occurred to me.

I did run openssl speed out of curiosity, but I think it's just showing me how many hashes it managed to compute for algos that mostly no one cares much about these days. Its manpage is somewhat quiet on exactly what `speed` shows.
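For what it's worth, pointing it at the EVP ciphers ssh actually negotiates looks more relevant than the default list (assuming my OpenSSL build knows these names):

Code:
# per-cipher throughput; with and without AES-NI this differs a lot
openssl speed -evp aes-128-gcm
openssl speed -evp aes-256-gcm
openssl speed -evp chacha20-poly1305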

I suppose the question now is how to send those 10TB of data over the wire. I rushed things, ran 4 Ethernet cables, each configured with a different private network, and got 4Gb/s of throughput that way. Now that I know encryption slows things down this badly, I'm thinking maybe I should just `nc` things over? The local network is completely under my control, no hostiles. Is there anything else out there that would let me, if not saturate 10Gb, at least approach it?

-- edit --
There has to be an rsync --smth --smth that would let me speed things up, surely? I mean, ssh is just the default, right?
 
For a private network rsync is fine without an ssh tunnel. It's not the fastest, but it provides file integrity via checksums after the transfer, so it's a good choice.
Another way is to use FTP, NFS, Samba, rcp, or to stay with SCP (SSH) and reduce its encryption algorithm so it can saturate the network.
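For example, to see which ciphers your OpenSSH offers and to force a cheaper one for a copy (whether it actually helps depends on AES-NI being available on both ends; the host and path below are just placeholders):

Code:
# list the ciphers this OpenSSH build supports
ssh -Q cipher
# force a cheap AEAD cipher for the transfer
scp -c aes128-gcm@openssh.com bigfile user@10.0.1.30:/some/dest/
# or rerun the pv test with the same cipher
pv -rp /dev/zero | ssh -c aes128-gcm@openssh.com user@10.0.1.30 "cat > /dev/null"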

Did you test your network throughput with iperf3?
 
Code:
$ iperf3 -c 10.0.1.20 -R
Connecting to host 10.0.1.20, port 5201
Reverse mode, remote host 10.0.1.20 is sending
[  5] local 10.0.1.30 port 32960 connected to 10.0.1.20 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  1.09 GBytes  9.37 Gbits/sec                 
[  5]   1.00-2.00   sec  1.09 GBytes  9.40 Gbits/sec                 
[  5]   2.00-3.00   sec  1.09 GBytes  9.40 Gbits/sec                 
[  5]   3.00-4.00   sec  1.09 GBytes  9.40 Gbits/sec                 
[  5]   4.00-5.00   sec  1.09 GBytes  9.40 Gbits/sec                 
[  5]   5.00-6.00   sec  1.09 GBytes  9.40 Gbits/sec                 
[  5]   6.00-7.00   sec  1.09 GBytes  9.40 Gbits/sec                 
[  5]   7.00-8.00   sec  1.09 GBytes  9.40 Gbits/sec                 
[  5]   8.00-9.00   sec  1.09 GBytes  9.40 Gbits/sec                 
[  5]   9.00-10.00  sec  1.09 GBytes  9.40 Gbits/sec                 
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  10.9 GBytes  9.40 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  10.9 GBytes  9.40 Gbits/sec                  receiver

I've not been able to figure out how to make rsync go fast(er), or what it even means to bypass ssh there. My best guess is to supply an appropriate command as --rsh, but I'm hesitant because I don't know exactly how rsync uses that command. As it is, I'm getting maybe 80MB/s out of rsync. nc would do the trick IMO, but suddenly I need to pass the filename ahead of time and I find myself in strictly programming, or probably scripting, territory. I guess I could ssh "nc -l -p > $fileout" or something. How is this not a solved problem? The high-performance computing crowd must have tools to saturate 10Gb/s connections sending, I don't know, some binary data that's not compressible (my case).
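If bypassing ssh means the rsync daemon, then presumably it would look roughly like this (a guess on my part, untested; module name, config path and target address are invented):

Code:
# rsyncd.conf on the receiving box (here: the Linux machine)
[incoming]
    path = /data/incoming
    read only = no
# start the daemon on the receiver
rsync --daemon --config=/etc/rsyncd.conf
# push from the sender over plain TCP, no ssh involved
rsync -a --progress /source/dir/ rsync://10.0.1.30/incoming/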
 
sender: tar cvf - /whatever|nc dest 9999
receiver: nc -l 9999|tar tvf -
replace tvf with xvf for the actual thing
you have no checksums, but neither does ftp

probably with lz4 compression you can still saturate the network if you use 2-3 threads (like splitting the source into 3 parts and sending them in parallel with 3 tar/nc scripts)
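e.g. the lz4 variant of the same pipe, assuming lz4 is installed on both ends (with data that is already incompressible it won't buy much):

Code:
# sender: stream the tree through lz4 into nc
tar cf - /whatever | lz4 -c | nc dest 9999
# receiver: decompress and unpack
nc -l 9999 | lz4 -dc | tar xf -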
 
For a single large file, pv on the sender should give something pretty to look at (and not distort throughput too much): pv < largefile | nc dest 9999
 
I'm not sure exactly about rsync (and whether its tricks, disabling compression or setting the cipher to 'none' in ssh, really work in your case), but

pay attention to the I/O buffer utilization of the NICs (and of any intermediate switches, if you are using them), to QoS and, most importantly, to the RTO value.

Experimenting with and measuring the RTO in FreeBSD's TCP stack for your particular network(s) may give you additional bandwidth.

In some cases tuning the RTO can give you an extra 10-70(!)% of bandwidth and also eliminate so-called "microbursts" (especially at 10G, 20G, 40G...).

But of course, tuning the TCP RTO should not be used as a replacement for good NICs and expensive enterprise switches with large per-port buffers and sophisticated firmware algorithms, especially at 10G+ speeds. Treat it only as a temporary measure until you buy an enterprise-grade switch.
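On FreeBSD these are sysctl knobs; a starting point for inspecting them (a sketch only, names and defaults depend on your FreeBSD version, and the value at the end is purely illustrative):

Code:
# retransmit-timeout related values (milliseconds)
sysctl net.inet.tcp.rexmit_min net.inet.tcp.rexmit_slop
# socket buffer limits also matter at 10G
sysctl kern.ipc.maxsockbuf net.inet.tcp.sendbuf_max net.inet.tcp.recvbuf_max
# example: raise the socket buffer ceiling to 16MB
sysctl kern.ipc.maxsockbuf=16777216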

P.S.
I remember cases like yours where, because of a too-short or too-long TCP RTO, bandwidth dropped from 10G to 1(!)G.

I'm not writing this especially for you, but for any other FreeBSD users here who may run into the same issue.
 