Solved: TCP is running extremely slow. Why? (QLogic, brgphy, bce, BCM5716, BCM5709)

PMc · May 28, 2021

With flood ping I can saturate the bandwidth, but TCP runs ten to a hundred times slower. How can I track down the cause?

I would expect this to saturate the link, which is configured to 10.5 Mbit, but the result looks disgusting:

Code:

Workstation$ ssh operator@RemoteHost dd if=/dev/da1s1b bs=64k count=100 > /dev/null
100+0 records in
100+0 records out
6553600 bytes transferred in 249.737903 secs (26242 bytes/sec)

The Topology:

Code:

Workstation ----- Gateway ----- Router(NAT) ---- /internet/---- RemoteHost
                                                             \--- OtherHost

One hop less makes a difference by a factor of 10. Why?

Code:

Gateway$ ssh operator@RemoteHost dd if=/dev/da1s1b bs=64k count=100 > /dev/null
100+0 records in
100+0 records out
6553600 bytes transferred in 20.207757 secs (324311 bytes/sec)

Consequently, one might assume something is going wrong in the LAN. But then with OtherHost everything looks as it should:

Code:

Workstation$ ssh operator@OtherHost dd if=/dev/da1s1b bs=64k count=100 > /dev/null
100+0 records in
100+0 records out
6553600 bytes transferred in 5.135515 secs (1276133 bytes/sec)
Gateway$ ssh operator@OtherHost dd if=/dev/da1s1b bs=64k count=100 > /dev/null
100+0 records in
100+0 records out
6553600 bytes transferred in 5.070852 secs (1292406 bytes/sec)

RemoteHost and OtherHost are in the same AS. All systems run FreeBSD 12.2. I tried three different machines for RemoteHost, and all show the same behaviour. What is going on?
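
For scale, the dd results above can be set against the link rate. This is only a rough sketch: it assumes "10.5 Mbit" means 10.5 × 10^6 bits per second and ignores ssh and TCP/IP header overhead, neither of which is stated in the post.

```python
# Assumption: the "10.5 Mbit" link rate is 10.5 * 10**6 bits per second.
LINK_BITS_PER_SEC = 10.5e6
LINK_BYTES_PER_SEC = LINK_BITS_PER_SEC / 8

# bytes/sec exactly as reported by dd in the four runs above
MEASURED = {
    "RemoteHost read from Workstation": 26242,
    "RemoteHost read from Gateway": 324311,
    "OtherHost read from Workstation": 1276133,
    "OtherHost read from Gateway": 1292406,
}


def link_utilisation(bytes_per_sec, link_bytes_per_sec=LINK_BYTES_PER_SEC):
    """Fraction of the nominal link rate that a measured dd rate represents."""
    return bytes_per_sec / link_bytes_per_sec


for path, rate in MEASURED.items():
    mbit = rate * 8 / 1e6
    print(f"{path}: {mbit:.2f} Mbit/s, {link_utilisation(rate):.1%} of the link")
```

Under these assumptions the Workstation run against RemoteHost uses roughly 2% of the link, while both OtherHost runs stay above 97%.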

VladiBG · May 28, 2021

~~Limited by the cryptographic overhead of ssh~~. Try another method for the transfer test.

SirDice · May 28, 2021

PMc said:
One hop less makes a difference by a factor of 10. Why?

Cumulative effect of latency on that hop?

PMc · May 28, 2021

VladiBG said:
Limited by the cryptographic overhead of ssh. Use another method for the transfer test.

Doesn't have such a big effect at these low bandwidths.

SirDice said:
Cumulative effect of latency on that hop?

Then why does it work with some remotes?

What actually does solve the issue is --- Linux. Runs even twice as fast as the link allows!

Code:

Workstation$ ssh me@RemoteHost dd if=/boot/vmlinuz-4.19.0-16-amd64 bs=64k > /dev/null
me@Remotehost's password: 
80+1 records in
80+1 records out
5287168 bytes (5.3 MB, 5.0 MiB) copied, 2.6279 s, 2.0 MB/s

Same remote booted with FreeBSD 12.2, and it's a pain:

Code:

Workstation$ ssh operator@RemoteHost dd if=/dev/da1s1b bs=64k count=100 > /dev/null
100+0 records in
100+0 records out
6553600 bytes transferred in 219.926650 secs (29799 bytes/sec)

sko · May 28, 2021

PMc said:
What actually does solve the issue is --- Linux. Runs even twice as fast as the link allows!

because it caches the writes from dd even if it shouldn't and lies about it.
this has always been the case for linux and more than once ruined the transfer of an image to USB for me, because dd stated it was done but in fact data was still trickling from the cache to the disk...

as for your 'bandwidth test': use a proper tool designed for that use case, e.g. iperf3

PMc · May 28, 2021

sko said:
because it caches the writes from dd even if it shouldn't and lies about it.

Probably yes. But then it does the job in 5 seconds, whereas FreeBSD takes 220 seconds, and that makes a difference!

sko said:
as for your 'bandwidth test': use a proper tool designed for that use case, e.g. iperf3

I am not doing a "bandwidth test"! I have a serious problem and am searching for a way to solve it!
This is where it all began, and I cannot live with this:

Code:

bareos-dir[7238]  Elapsed time:           10 hours 53 mins 39 secs
bareos-dir[7238]  Priority:               11
bareos-dir[7238]  FD Files Written:       63,315
bareos-dir[7238]  SD Files Written:       63,315
bareos-dir[7238]  FD Bytes Written:       729,510,490 (729.5 MB)
bareos-dir[7238]  SD Bytes Written:       757,150,268 (757.1 MB)
bareos-dir[7238]  Rate:                   18.6 KB/s

ssh simply exposes the problem, so it is as good as any other tool that would expose it. It is also already there and doesn't need to be installed or set up separately.
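
The reported Rate can be cross-checked from the job summary itself. A rough sketch, assuming the rate is FD bytes divided by elapsed time with 1 KB = 1000 bytes, and reusing the 10.5 Mbit link figure from the opening post (read here as 10.5 × 10^6 bit/s, which is an assumption):

```python
# Figures copied from the bareos-dir job summary above.
elapsed_secs = 10 * 3600 + 53 * 60 + 39   # 10 hours 53 mins 39 secs
fd_bytes = 729_510_490                    # FD Bytes Written

# Assumption: Rate = FD bytes / elapsed seconds, with 1 KB = 1000 bytes.
rate_kb_per_sec = fd_bytes / elapsed_secs / 1000

# Assumption: the link carries 10.5 * 10**6 bits per second, no overhead.
link_bytes_per_sec = 10.5e6 / 8
ideal_minutes = fd_bytes / link_bytes_per_sec / 60

print(f"observed rate: {rate_kb_per_sec:.1f} KB/s")
print(f"time at full link rate: about {ideal_minutes:.1f} minutes")
```

So at the nominal link rate the same job would need on the order of ten minutes rather than almost eleven hours.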

Vull · May 28, 2021

sko said:
because it caches the writes from dd even if it shouldn't and lies about it.
this has always been the case for linux and more than once ruined the transfer of an image to USB for me, because dd stated it was done but in fact data was still trickling from the cache to the disk...

as for your 'bandwidth test': use a proper tool designed for that use case, e.g. iperf3

I've noticed this too. It's worse on Ubuntu and Linux Mint than on Debian. What I do now is umount the drive from a terminal window. The shell will hang and not return to the command-line prompt until it's safe to remove the drive.

PMc · May 28, 2021

Please open your own thread for copying things to USB devices etc. This is about a network issue, and I would like to stay focused.

PMc · May 28, 2021

Looking closer: this is a tcpdump on RemoteHost:

# tcpdump -nibce0 "port 10971"

Code:

14:03:08.892051 IP Myhost.10971 > RemoteHost: Flags [.], ack 162877, win 946, options [nop,nop,TS val 2704708687 ecr 430707449], length 0
14:03:08.892361 IP RemoteHost > Myhost.10971: Flags [.], seq 164305:165733, ack 72, win 1035, options [nop,nop,TS val 430707500 ecr 2704708687], length 1428
14:03:08.892376 IP RemoteHost > Myhost.10971: Flags [.], seq 165733:167161, ack 72, win 1035, options [nop,nop,TS val 430707500 ecr 2704708687], length 1428
14:03:08.928461 IP Myhost.10971 > RemoteHost: Flags [.], ack 164305, win 1035, options [nop,nop,TS val 2704708727 ecr 430707449], length 0
14:03:08.943521 IP Myhost.10971 > RemoteHost: Flags [.], ack 167161, win 1013, options [nop,nop,TS val 2704708737 ecr 430707500], length 0
14:03:08.943822 IP RemoteHost > Myhost.10971: Flags [.], seq 170017:171445, ack 72, win 1035, options [nop,nop,TS val 430707551 ecr 2704708737], length 1428
14:03:08.994250 IP Myhost.10971 > RemoteHost: Flags [.], ack 167161, win 1035, options [nop,nop,TS val 2704708787 ecr 430707500,nop,nop,sack 1 {170017:171445}], length 0
14:03:09.280060 IP RemoteHost > Myhost.10971: Flags [.], seq 167161:168589, ack 72, win 1035, options [nop,nop,TS val 430707888 ecr 2704708787], length 1428
14:03:09.330590 IP Myhost.10971 > RemoteHost: Flags [.], ack 168589, win 1013, options [nop,nop,TS val 2704709127 ecr 430707888,nop,nop,sack 1 {170017:171445}], length 0
14:03:09.331115 IP RemoteHost > Myhost.10971: Flags [.], seq 168589:170017, ack 72, win 1035, options [nop,nop,TS val 430707939 ecr 2704709127], length 1428
14:03:09.331148 IP RemoteHost > Myhost.10971: Flags [.], seq 171445:171457, ack 72, win 1035, options [nop,nop,TS val 430707939 ecr 2704709127], length 12
14:03:09.381048 IP Myhost.10971 > RemoteHost: Flags [.], ack 171445, win 991, options [nop,nop,TS val 2704709177 ecr 430707939], length 0
14:03:09.381326 IP RemoteHost > Myhost.10971: Flags [.], seq 171457:172885, ack 72, win 1035, options [nop,nop,TS val 430707989 ecr 2704709177], length 1428
14:03:09.418764 IP Myhost.10971 > RemoteHost: Flags [.], ack 171457, win 1035, options [nop,nop,TS val 2704709217 ecr 430707939], length 0

Probably somebody understands these figures better than I do, but to me it seems that no TCP window is in effect at all. Furthermore, the longest gap is between 14:03:08.994250 and 14:03:09.280060. At that point the local machine (the dump was taken on RemoteHost!) has received ack 167161, and it then waits about 286 ms before it sends seq 167161:168589. Why? What is it waiting for? (The machine is idle.)
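
The longest pause can also be found mechanically instead of by eye. A minimal sketch: the lines below are the capture above trimmed to timestamp, direction and sequence/ack numbers, and the helper names are made up for this illustration.

```python
from datetime import datetime

# The tcpdump lines above, trimmed (TCP options and window sizes dropped).
CAPTURE = """\
14:03:08.892051 Myhost > RemoteHost ack 162877
14:03:08.892361 RemoteHost > Myhost seq 164305:165733
14:03:08.892376 RemoteHost > Myhost seq 165733:167161
14:03:08.928461 Myhost > RemoteHost ack 164305
14:03:08.943521 Myhost > RemoteHost ack 167161
14:03:08.943822 RemoteHost > Myhost seq 170017:171445
14:03:08.994250 Myhost > RemoteHost ack 167161 sack 170017:171445
14:03:09.280060 RemoteHost > Myhost seq 167161:168589
14:03:09.330590 Myhost > RemoteHost ack 168589 sack 170017:171445
14:03:09.331115 RemoteHost > Myhost seq 168589:170017
14:03:09.331148 RemoteHost > Myhost seq 171445:171457
14:03:09.381048 Myhost > RemoteHost ack 171445
14:03:09.381326 RemoteHost > Myhost seq 171457:172885
14:03:09.418764 Myhost > RemoteHost ack 171457
"""


def parse_time(line):
    """Parse the leading HH:MM:SS.microseconds timestamp of a line."""
    return datetime.strptime(line.split()[0], "%H:%M:%S.%f")


def largest_gap(text):
    """Return (gap in ms, line before the gap, line after the gap)."""
    lines = [l for l in text.splitlines() if l.strip()]
    times = [parse_time(l) for l in lines]
    i = max(range(1, len(times)), key=lambda k: times[k] - times[k - 1])
    gap_ms = (times[i] - times[i - 1]).total_seconds() * 1000
    return gap_ms, lines[i - 1], lines[i]


gap_ms, before, after = largest_gap(CAPTURE)
print(f"longest idle period: {gap_ms:.1f} ms")
print("  before:", before)
print("  after: ", after)
```

This only measures time between captured packets. It says nothing about why the sender was idle.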

VladiBG · May 28, 2021

You are missing packets, which causes retransmissions, duplicate ACKs and SACKs. This may be caused by a poor connection or by a policy-based rate limit that drops excess traffic instead of shaping it with a queue.
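
Whether data left RemoteHost in sequence order can be checked from the data segments in the capture above. A small sketch, with hypothetical helper and label names. Note that the capture was taken on the sender, so a hole here means a range was not seen leaving in order at the capture point; it cannot show whether anything was lost further along the path.

```python
# (start, end) sequence ranges sent by RemoteHost, in capture order.
SEGMENTS = [
    (164305, 165733),
    (165733, 167161),
    (170017, 171445),
    (167161, 168589),
    (168589, 170017),
    (171445, 171457),
    (171457, 172885),
]


def out_of_order_events(segments):
    """Flag ranges skipped ("hole") and ranges sent below the highest
    sequence number already sent ("late")."""
    events = []
    highest = segments[0][0]
    for start, end in segments:
        if start > highest:
            events.append(("hole", highest, start))
        elif start < highest:
            events.append(("late", start, end))
        highest = max(highest, end)
    return events


events = out_of_order_events(SEGMENTS)
for kind, start, end in events:
    print(f"{kind}: {start}:{end} ({end - start} bytes)")
```

The "late" label deliberately does not distinguish a retransmission from a delayed first transmission, because the trimmed capture alone cannot tell the two apart.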

PMc · May 28, 2021

VladiBG said:
You are missing packets, which causes retransmissions, duplicate ACKs and SACKs. This may be caused by a poor connection or by a policy-based rate limit that drops excess traffic instead of shaping it with a queue.

Sorry, wrong answer.

I got curious and ran sshd under truss. This shows that the process just sits in the write() system call, which does not seem to return in a timely fashion. The byte counts of these writes also look odd.

Then I found this: https://forum.opnsense.org/index.php?topic=7068.0
I haven't fully understood it yet, but I gave it a try and dropped

Code:

hw.bce.verbose="1"
hw.bce.msi_enable="0"
hw.bce.tso_enable="0"

into loader.conf.
And the bogus behaviour is gone, the thing runs! Before and after:

Code:

Workstation$ ssh operator@RemoteHost dd if=/dev/da0s1b bs=64k count=100 > /dev/null
100+0 records in
100+0 records out
6553600 bytes transferred in 201.163070 secs (32579 bytes/sec)
Fssh_packet_write_wait: Connection to *.*.*.* port 22: Broken pipe
Workstation$ ssh operator@RemoteHost dd if=/dev/da0s1b bs=64k count=100 > /dev/null
100+0 records in
100+0 records out
6553600 bytes transferred in 4.765690 secs (1375163 bytes/sec)

MSI is likely unrelated, and verbose doesn't reveal anything. So it is TSO: the packets are not lost, they are somehow rearranged by TSO in a way that does not work well with the rest of the path.
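
For readers unfamiliar with TSO (TCP segmentation offload): the network stack hands the NIC one large chunk of data, and the NIC cuts it into MSS-sized TCP segments itself. The following is a toy model of that splitting, not the bce driver's actual logic. The 1428-byte segment size is taken from the capture earlier in the thread; the 5000-byte send size and the function name are arbitrary choices for illustration.

```python
def tso_split(seq, payload_len, mss):
    """Return the (start, end) sequence ranges a NIC would emit when it
    segments one large send of payload_len bytes starting at seq."""
    segments = []
    offset = 0
    while offset < payload_len:
        size = min(mss, payload_len - offset)
        segments.append((seq + offset, seq + offset + size))
        offset += size
    return segments


# Hypothetical example: one 5000-byte send starting at sequence 164305.
segs = tso_split(seq=164305, payload_len=5000, mss=1428)
for start, end in segs:
    print(f"seq {start}:{end}, length {end - start}")
```

With hw.bce.tso_enable="0" this segmentation is done by the TCP stack in software instead of by the adapter.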
