[Solved] FreeBSD 11: unreliable TCP connections

I have a laptop running FreeBSD 11.0 amd64, a server that was running FreeBSD 10.1 i386, and a gateway machine running FreeBSD 10.2 i386. Everything worked fine. I recently "upgraded" the server to FreeBSD 11.0 amd64, and TCP started falling apart between the laptop and the server. The cross-architecture "upgrade" procedure involved installing FreeBSD 11.0 fresh, bringing back the old /usr/local, running "pkg bootstrap -f" and "pkg upgrade -f" for all the ports, bringing back old filesystems (like /home) unchanged, and then recompiling the programs I wrote (and sometimes discovering unjustified int == long assumptions). The same procedure worked without any problems a couple of months ago when I upgraded the laptop from FreeBSD 10.2 i386 to FreeBSD 11.0 amd64.

Reproduction sequence: log in to the laptop. Run "slogin server". cd to a directory containing about 400 files (50 files may actually be enough). Type the command "ls -l". The output whizzes by in bursts separated by 5- to 10-second pauses, then stops. (If I do this on the server console, there are no pauses and no lockup.) The lockup happens about 99.5% of the time. The only way out of that state I have found is to disconnect and log back in to the server. If I leave the connection like that, it eventually (after a few minutes) says "Fssh_packet_write_wait: Connection to 192.168.1.3 port 22: broken pipe".
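One way to take ssh, FTP and MySQL out of the picture is to time a raw TCP burst of similar size. The following is a minimal Python sketch under stated assumptions: `serve_once` and `measure_burst` are hypothetical helpers (not part of any tool mentioned in this thread), and the loopback demo stands in for running the sender on the server and the receiver on the laptop. It reports the longest pause between received chunks, so multi-second stalls like the ones described above should show up directly, and a full lockup shows up as a socket timeout.

```python
import socket
import threading
import time


def serve_once(listener: socket.socket, nbytes: int) -> None:
    """Hypothetical sender: accept one client, send nbytes of filler, close."""
    conn, _ = listener.accept()
    with conn:
        conn.sendall(b"x" * nbytes)


def measure_burst(host: str, port: int, timeout: float = 30.0) -> tuple[int, float]:
    """Receive until EOF; return (total bytes, longest pause between chunks).

    create_connection() applies `timeout` to later recv() calls as well, so a
    stall longer than `timeout` raises socket.timeout instead of hanging.
    """
    total = 0
    longest_gap = 0.0
    with socket.create_connection((host, port), timeout=timeout) as s:
        last = time.monotonic()
        while True:
            chunk = s.recv(65536)
            now = time.monotonic()
            if not chunk:  # sender closed the connection: transfer complete
                break
            longest_gap = max(longest_gap, now - last)
            last = now
            total += len(chunk)
    return total, longest_gap


if __name__ == "__main__":
    # Loopback demo only; on the LAN, run serve_once on the server and
    # point measure_burst at the server's address from the laptop.
    listener = socket.create_server(("127.0.0.1", 0))
    port = listener.getsockname()[1]
    sender = threading.Thread(target=serve_once, args=(listener, 64 * 1024))
    sender.start()
    total, gap = measure_burst("127.0.0.1", port)
    sender.join()
    listener.close()
    print(f"received {total} bytes, longest pause {gap:.3f} s")
```

If a plain burst like this stalls between the two 11.0 machines but not on loopback, the problem is below the application layer.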

I can stay logged in and get work done if I remember to pipe any long output (more than about 40 lines of text) through "more". If I make the xterm window 48 lines tall, a full screen is too much of a data burst and it will often lock up. "make" output causes no problems as long as the compiles are slow; if they are almost instant, it will lock up.

The problem is not limited to ssh connections. FTP gets stuck quickly. So do MySQL queries that return a lot of data (e.g. 4 KB). The worst case is accessing a web server on the server with Firefox on the laptop, which usually gets only a partially loaded page.

The same thing happens if I take the laptop off Wi-Fi and put all 3 machines on the same Ethernet segment (100 Mbps). It also makes no difference whether I use an xterm or a virtual console on the laptop. Note that I am using the exact same hardware that has been working for years; only the OS version is different. It also seems that FreeBSD 11.0 only has trouble talking to another FreeBSD 11.0 machine. It doesn't matter whether I turn the ipfw firewall on or off on the laptop, the server, or both; the firewall allows pretty much everything on the LAN anyway.

The server can get stuff from the Internet via the gateway machine, such as downloading package updates, with no problem.

This smells like an MTU/MSS problem: anything that involves sending a full-size Ethernet frame gets stuck, possibly because something along the path refuses to deal with large packets. But I don't see where that could happen on my LAN. (I have seen this happen with DSL modems that encapsulate the packets, adding a small header without either end knowing about it, combined with someone blocking ICMP; but here the connection to the Internet is the part that *is* working.) I tried my little program that edits the MSS in TCP connection setup (SYN) packets involving the server (even over localhost!), but it didn't make any noticeable difference.
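For reference, the arithmetic behind the MSS theory: on plain Ethernet, the MSS a host advertises in its SYN is the MTU minus the 20-byte IPv4 header and the 20-byte TCP header (assuming no IP or TCP options), so a 1500-byte MTU gives 1460. Any encapsulation a device adds silently, such as the 8-byte PPPoE header used on many DSL links, has to come off as well, which is why clamping the MSS lower helps in that case. A minimal Python sketch; `mss_for` is a hypothetical helper name:

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def mss_for(mtu: int, encap_overhead: int = 0) -> int:
    """Largest TCP payload per segment once headers and encapsulation fit in the MTU."""
    return mtu - encap_overhead - IPV4_HEADER - TCP_HEADER


print(mss_for(1500))     # 1460: plain Ethernet
print(mss_for(1500, 8))  # 1452: e.g. an 8-byte PPPoE header inside a 1500-byte frame
```

On a flat LAN with no encapsulation, lowering the MSS below 1460 should not be needed, which fits the observation that editing it made no difference.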

Are there any TCP options whose defaults changed between FreeBSD 10.1 and FreeBSD 11.0, such as the net.inet.tcp.rfc* sysctl variables? Are there residual parts of HPN in sshd that I need to turn off? Were there changes in the de driver?
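One way to answer the first question empirically is to save the output of `sysctl net.inet.tcp` on a 10.x machine and on the 11.0 server, then compare the two dumps. FreeBSD prints these as `name: value` lines. The sketch below is a minimal Python example; the function names and file names are hypothetical, and lines that don't start with a dotted variable name (such as continuation lines of multi-line values) are simply skipped, which is an assumption of this sketch.

```python
import sys


def parse_sysctl(text: str) -> dict[str, str]:
    """Parse 'name: value' lines from sysctl output into a dict.

    Lines whose part before the first ':' is not a dotted name without
    whitespace (e.g. continuation lines of multi-line values) are skipped.
    """
    values: dict[str, str] = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and "." in name and not any(c.isspace() for c in name):
            values[name] = value.strip()
    return values


def diff_sysctl(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List variables that changed (~), were removed (-), or were added (+)."""
    report = []
    for name in sorted(old.keys() | new.keys()):
        a, b = old.get(name), new.get(name)
        if a == b:
            continue
        if a is None:
            report.append(f"+ {name}: {b}")
        elif b is None:
            report.append(f"- {name}: {a}")
        else:
            report.append(f"~ {name}: {a} -> {b}")
    return report


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical file names): python sysctl_diff.py old.txt new.txt
    # where each file holds `sysctl net.inet.tcp` output from one machine.
    with open(sys.argv[1]) as f_old, open(sys.argv[2]) as f_new:
        for line in diff_sysctl(parse_sysctl(f_old.read()), parse_sysctl(f_new.read())):
            print(line)
```

The same approach works for `sysctl dev.de` or any other subtree if the driver is the suspect.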

Any ideas what I should check?
 
What interfaces are you using? I had so many problems with iwn(4) on 11.0 (and I know I'm not the only one) that I just went back to 10.3.
 
de0 on the server (this is probably fairly old hardware):
Code:
de0: <Digital 21140A Fast Ethernet> port 0xd180-0xd1ff mem 0xf7c69000-0xf7c6907f irq 19 at device 0.0 on pci5
de0: 21140A [10-100Mb/s] pass 2.2
bge0 (wired) on the laptop:
Code:
bge0: <Broadcom BCM5906 A2, ASIC rev. 0x00c002> mem 0xf4600000-0xf460ffff irq 17 at device 0.0 on pci3
bge0: CHIP ID 0x0000c002; ASIC REV 0x0c; CHIP REV 0xc0; PCI-E
miibus0: <MII bus> on bge0
brgphy0: <BCM5906 10/100baseTX media interface> PHY 1 on miibus0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto, auto-flow
Also, a Wi-Fi USB dongle on the laptop:
Code:
run0: <1.0> on usbus7
run0: MAC/BBP RT5390 (rev 0x0502), RF RT5370 (MIMO 1T1R), address
Which interface on the laptop I use has no effect on the problem.
 
It seems that either something changed in the de driver or the hardware is going bad. I switched over to the motherboard NIC, re0 (not supported when I got the machine, but well supported now), and the lockups seem to have stopped.
 