[Solved] TCP packet loss after OS upgrade

Hi,
I have upgraded my server from FreeBSD 10.3 to FreeBSD 11.1-RELEASE.
After the upgrade, I am facing a TCP packet loss issue.
With mtr over ICMP it's fine:
Code:
[root@m1 conf.d]# mtr  -r -c 10 10.67.142.67
Start: Tue Mar 27 21:01:17 2018
HOST: m1.x.com       Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 10.67.141.167              0.0%    10    0.2   0.2   0.2   0.2   0.0
  2.|-- 10.67.142.67               0.0%    10    0.3   0.3   0.2   0.3   0.0

But when using mtr with TCP, there is significant packet loss:
Code:
[root@m1 conf.d]# mtr --tcp --port 443 -r -c 10 10.67.142.67
Start: Tue Mar 27 20:56:31 2018
HOST: m1.x.com       Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 10.67.141.167              0.0%    10    0.2   0.2   0.2   0.4   0.0
  2.|-- 10.67.142.67              20.0%    10  1501. 1202. 500.2 1502. 366.8

Code:
[root@m1 conf.d]# mtr --tcp --port 6556 -r -c 10 10.67.142.67
Start: Tue Mar 27 21:15:46 2018
HOST: m1.x.com       Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 10.67.141.167              0.0%    10    0.3   0.3   0.2   0.4   0.0
  2.|-- 10.67.142.67              20.0%    10  1501. 1327. 1001. 1502. 243.1

mtr to another host (FreeBSD 10.3) in the same subnet is fine:
Code:
[root@m1 conf.d]# mtr --tcp --port 6556 -r -c 10 10.67.142.113
Start: Tue Mar 27 21:17:28 2018
HOST: m1.x.com       Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 10.67.141.167              0.0%    10    0.2   0.2   0.2   0.3   0.0
  2.|-- 10.67.142.113              0.0%    10  1001. 521.0 200.8 1001. 193.3

Code:
[root@m1 conf.d]# mtr --tcp --port 443 -r -c 10 10.67.142.113
Start: Tue Mar 27 21:17:48 2018
HOST: m1.x.com       Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 10.67.141.167              0.0%    10    0.2   0.2   0.2   0.4   0.0
  2.|-- 10.67.142.113              0.0%    10  158.2 436.7 158.2 501.3 136.0

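For reference, the lossy hop can be pulled out of a saved mtr report with a small awk sketch. The filename and the parsing below are illustrative only; the report text is copied from the TCP run above:

```shell
# Save the mtr report to a file (here a copy of the TCP run from above),
# then print every hop whose Loss% column is non-zero.
cat > /tmp/mtr-tcp.txt <<'EOF'
HOST: m1.x.com       Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 10.67.141.167              0.0%    10    0.2   0.2   0.2   0.4   0.0
  2.|-- 10.67.142.67              20.0%    10  1501. 1202. 500.2 1502. 366.8
EOF
# Field 2 is the hop address, field 3 the loss percentage; strip the '%'
# and print hops with any loss at all.
awk '/\|--/ { gsub(/%/, "", $3); if ($3 + 0 > 0) print $2, $3 "% loss" }' /tmp/mtr-tcp.txt
```

This makes it easy to diff several runs (ICMP vs. TCP, different ports) without eyeballing the tables.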
Could you please suggest how I can troubleshoot this issue?
Thank you very much!
 
Did you do your upgrade in place or remotely? Meaning, have you physically moved/opened the machine or anything? We had a really weird problem recently, with TCP/IP performance dropping dramatically (strangely, ICMP was not affected); in the end it turned out to be a cabling issue. Can you check whether the cabling is okay, maybe test with another patch cable?
 
media: Ethernet autoselect (1000baseT <full-duplex>)
That could be an issue in itself. It doesn't happen often, but... if you have a switch which uses autoselection to determine the speed and you also use auto-negotiation on your NIC, then it can happen that both keep negotiating to establish the optimal speed. As I said, this doesn't happen often, but even so it might be beneficial not to rely on auto-negotiation and instead set your hardware to what it's capable of using.
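On FreeBSD that could look something like the following in /etc/rc.conf. This is only an illustrative sketch: the interface name (em0) and the address are assumptions, so adjust them to your NIC and subnet:

```shell
# /etc/rc.conf -- illustrative sketch: pin the NIC to 1000baseT full duplex
# instead of relying on autoselect. Interface name and address are
# hypothetical; substitute your own.
ifconfig_em0="inet 192.0.2.10 netmask 255.255.255.0 media 1000baseT mediaopt full-duplex"
```

The same media/mediaopt arguments can be tested live with ifconfig(8) before committing them to rc.conf.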
 
That could be an issue in itself. It doesn't happen often, but... if you have a switch which uses autoselection to determine the speed and you also use auto-negotiation on your NIC, then it can happen that both keep negotiating to establish the optimal speed. As I said, this doesn't happen often, but even so it might be beneficial not to rely on auto-negotiation and instead set your hardware to what it's capable of using.
With Gigabit networking auto-negotiation is mandatory. Fixing the speed/duplex settings will actually disconnect you from the network entirely.
 
With Gigabit networking auto-negotiation is mandatory. Fixing the speed/duplex settings will actually disconnect you from the network entirely.
I don't know where you got that theory from, but it's incorrect. It took me a while to respond because I don't have a gigabit network at home, but I tried this on a test box elsewhere:

Code:
root@fbsd:~ # ifconfig em0
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
        ether 08:00:27:56:a0:1c
        hwaddr 08:00:27:56:a0:1c
        inet 10.0.1.7 netmask 0xffffff00 broadcast 10.0.1.255
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet 1000baseT <full-duplex>
        status: active
root@fbsd:~ # ping -c1 10.0.1.100
PING 10.0.1.100 (10.0.1.100): 56 data bytes
64 bytes from 10.0.1.100: icmp_seq=0 ttl=255 time=1.376 ms

Just for context: this is a test box hooked onto a regular gigabit switch, not a smart switch or anything. So I can't change any settings there. But as you can see, if I fix my NIC's speed, my connection remains.
 
I don't know where you got that theory from but it's incorrect.
No, it's not.

Autonegotiation was originally defined as an optional component in the Fast Ethernet standard.[2] It is backwards compatible with the normal link pulses (NLP) used by 10BASE-T.[3] The protocol was significantly extended in the gigabit Ethernet standard, and is mandatory for 1000BASE-T gigabit Ethernet over twisted pair.
https://en.wikipedia.org/wiki/Autonegotiation

Just for context: this is a test box hooked onto a regular gigabit switch, not a smart switch or anything. So I can't change anything there with regards to settings and such. But as you can see, if I fix my nic then my connection remains.
I have a 24 port, gigabit HP Procurve at home. And it definitely disconnects a host if you try to fix the speed/duplex settings to 1000Mbit/full. Like you I was accustomed to fixing the speed/duplex settings due to various incompatibilities and negotiation failures. But this was with 10/100Mbit switches (Fast Ethernet) and network cards.
 
Some more information: http://noahdavids.org/self_published/gigabit-AN.html

TL;DR: While you can fix the speed/duplex settings, auto-negotiation is still going to happen. Auto-negotiation is used for more than just the speed/duplex settings and is an integral part of the Gigabit specifications.

Apparently my HP Procurve stops negotiating when you fix the speed/duplex settings. And that's why it disconnects.
 
SirDice is right.

Unless a certain manufacturer specifies it (and then I would raise my eyebrow at that vendor), all GE ports should be set to negotiate. This is in the protocol specs and is also best practice. When I say GE I mean 1GE, 10GE, etc.

Now, if we are using the old Ethernet/Fast Ethernet stuff, then this is best practice:
  • for infrastructure connections (connections between routers, switches, servers, firewalls), hardcode speed and duplex. Do not use autonegotiation. How many times have I seen an auto/auto link work great for months and then, for no reason, lose its marbles one day.
  • for desktop connections, try auto/auto first. Way too many flavors of machines, OSes, NICs, etc. moving around inside the building for hardcoding to be a sane practice.

Back to GE: I have yet to see a single link lose its marbles using autonegotiation, except perhaps once, and that was simply due to a crappy vendor/product.

Now, there has been a change in how this is configured over the years, and it varies with vendor/product too. In the old days we typically had these choices: speed auto, or speed hardcoded 10/100. Now we see this on machines: speed autonegotiation on; speed 10 autonegotiation on; speed 100 autonegotiation on; speed 1000 autonegotiation on; speed 10 autonegotiation off; speed 100 autonegotiation off; speed 1000 autonegotiation off. Please note my experience has been 99% with big-iron Cisco and Juniper routers, and very little with desktops, servers, etc.

So basically what SirDice said. :)
 