I have two servers that are twins: SuperMicro SYS-1019-WTR motherboard, Xeon Gold 6246R 16-core processor, 6 memory channels, 96GB RAM, three 4GB MLC2 SATA drives, purchased in 2023. Ethernet is configured as:
ifconfig_ixl0="inet 192.168.xxx.xxx netmask 255.255.0.0 mtu 1468"
I also have a much older SuperMicro whose Ethernet is the igb(4) flavor ('igb0'):
ifconfig_igb0="inet 192.168.xxx.xxx netmask 255.255.0.0 mtu 1468"
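Because much of the discussion below turns on whether traffic is on-link (L2, ARP directly for the peer) or routed (sent to the gateway), it may be worth confirming that every LAN host computes the same network from its address and netmask. A minimal sketch using Python's standard `ipaddress` module; `same_subnet` is a hypothetical helper, and the addresses are hypothetical placeholders because the real ones are redacted above:

```python
import ipaddress

def same_subnet(host_a: str, host_b: str, netmask: str) -> bool:
    """Return True if host_b lies inside host_a's network for the given netmask.

    When True, host_a normally treats host_b as on-link and ARPs for it
    directly; when False, packets to host_b go via the default gateway.
    """
    network = ipaddress.ip_interface(f"{host_a}/{netmask}").network
    return ipaddress.ip_address(host_b) in network

# Hypothetical addresses; the real ones are redacted as 192.168.xxx.xxx.
print(same_subnet("192.168.1.10", "192.168.200.5", "255.255.0.0"))    # True
print(same_subnet("192.168.1.10", "192.168.200.5", "255.255.255.0"))  # False
```

If one host used 255.255.255.0 while another used 255.255.0.0, the two could disagree about whether the other is on-link. That is only one possible contributor to one-sided ARP behavior, not a confirmed cause here.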
The new servers are on 14.1-RELEASE-p5 (per freebsd-version). I was able to upgrade the old server to 14.3-RELEASE-p7.
When the two new servers were on the same LAN getting their software developed, and the MTU was 1500, they could ssh/scp files between them and the old 'igb0' server.
One server is now out in the real world on a fiber connection; this is the "Hosted Server". The other is still on my local LAN; this is the "Digital Twin". The old igb0 server is also on the LAN.
The smaller MTU is what it took to get the LAN working with Starlink's CGNAT setup, and to keep SSH connections out to the Hosted Server up over long periods (1-5 days). In the AI chat, the possibility came up that the X710 Ethernet hardware might be unhappy with the 1468 MTU. I suspect the old server can connect to the Hosted Server because its traffic passes through a router, to Starlink, then another router (also CGNAT, for the fixed IP address), and finally a port-forwarding router in front of the Hosted Server. Those routers could be compensating for the 1468 MTU from the old server, which points to a problem specific to the Digital Twin.
The "Old Server" can ssh/scp out to the Hosted Server with no problem. But the Digital Twin cannot ssh/scp to either the old server or the Hosted Server; it just hangs. I spent four hours in a Google AI session working on the problem. While the AI has never solved it, it often gets close and makes me aware of what to look at. I had it write a summary of our discussion, which I enclose:
Subject: FreeBSD 14.1: ixl(4) Client Sees SYN-ACK in tcpdump but Handshake Times Out (SYN_SENT)
Environment:
Client: FreeBSD 14.1-RELEASE, Intel X710-series NIC using ixl(4) driver.
Target Server: FreeBSD 14.1-RELEASE, Intel i210-series NIC using igb(4) driver.
Network: Same Layer 2 LAN/Switch.
Tuning: Both hosts use MTU 1468 (MSS 1428) due to upstream Starlink requirements.
Comparison: Windows 11 on the same LAN connects to the target port without any issues.
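The "MSS 1428" figure follows directly from the MTU: for IPv4 without IP or TCP options, each segment loses 20 bytes to the IP header and 20 to the TCP header. A small sketch of that arithmetic (`ipv4_mss` is a hypothetical helper name):

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
TCP_HEADER = 20   # bytes, TCP header without options

def ipv4_mss(mtu: int) -> int:
    """Largest TCP payload per segment for a given link MTU (IPv4, no options)."""
    return mtu - IPV4_HEADER - TCP_HEADER

print(ipv4_mss(1500))  # 1460
print(ipv4_mss(1468))  # 1428
```

In a capture, the value each side actually advertises appears in the SYN and SYN-ACK options (tcpdump prints it as `mss 1428`), which is an easy way to confirm both hosts agree.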
The Problem:
Outbound SSH or nc from the ixl client to the igb server hangs indefinitely. Verbose SSH logs show a hang immediately after "Connecting to ...". On the client, the socket remains in SYN_SENT.
Observed in tcpdump:
Client sends SYN.
Target Server receives SYN and responds with SYN-ACK [S.].
Client-side tcpdump physically shows the SYN-ACK arriving at the ixl0 interface.
The OS ignores the packet. The client never sends the final ACK, and instead re-transmits the SYN.
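The capture pattern above can be read mechanically: a healthy handshake is SYN out, SYN-ACK in, ACK out, while the failing case shows SYN-ACK in followed by another SYN out. A minimal sketch that classifies one connection attempt from a list of (direction, flags) pairs; the function name and the simplified event format are hypothetical, and the flag strings follow tcpdump's notation (`S`, `S.`, `.`):

```python
def diagnose_handshake(events: list[tuple[str, str]]) -> str:
    """Classify one TCP connection attempt from a client-side capture.

    Each event is (direction, flags): direction is "out" or "in" relative
    to the client; flags use tcpdump notation, "S" = SYN, "S." = SYN-ACK,
    "." = bare ACK.
    """
    saw_synack = False
    for direction, flags in events:
        if direction == "in" and flags == "S.":
            saw_synack = True
        elif direction == "out" and flags == "." and saw_synack:
            return "established: client ACKed the SYN-ACK"
        elif direction == "out" and flags == "S" and saw_synack:
            return "SYN-ACK ignored: client retransmitted its SYN"
    return "SYN-ACK seen, no reply yet" if saw_synack else "no SYN-ACK seen"

healthy = [("out", "S"), ("in", "S."), ("out", ".")]
symptom = [("out", "S"), ("in", "S."), ("out", "S")]
print(diagnose_handshake(healthy))  # established: client ACKed the SYN-ACK
print(diagnose_handshake(symptom))  # SYN-ACK ignored: client retransmitted its SYN
```

The second case is the one described here: the frame reaches the capture point, but nothing in the client's TCP state machine reacts to it.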
Troubleshooting Steps Taken (No success):
Firewalls: Disabled PF on both ends. IPFW is not loaded.
TCP Extensions: Set net.inet.tcp.rfc1323=0, net.inet.tcp.sack.enable=0, and net.inet.tcp.blackhole=0.
Offloading: Disabled rxcsum, txcsum, tso, and lro on both ixl and igb interfaces.
Kernel Checks: Cleared hostcache and verified rfc1122_strong_es=0.
ARP Issue: Initially, the server would not automatically ARP for the client; a manual arp -s on the server was required to get the SYN-ACK onto the wire. Even with static ARP and successful ping (0.013ms RTT), TCP handshakes still fail.
QoS: Tested with ssh -o IPQoS=none to rule out IP_TOS 0x48 (AF21) drops by network hardware.
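The link between "IP_TOS 0x48" and "AF21" is just bit arithmetic: the DSCP is the upper six bits of the old IPv4 TOS byte, and Assured Forwarding code points (RFC 2597) encode a class (1-4) and a drop precedence (1-3). A small sketch; both function names are hypothetical:

```python
def tos_to_dscp(tos: int) -> int:
    """DSCP is the upper 6 bits of the IPv4 TOS byte; the low 2 bits are ECN."""
    return tos >> 2

def af_name(dscp: int):
    """Return the Assured Forwarding name (RFC 2597) for a DSCP, or None."""
    af_class = dscp >> 3              # top 3 bits: AF class
    drop_prec = (dscp >> 1) & 0b11    # next 2 bits: drop precedence
    if 1 <= af_class <= 4 and 1 <= drop_prec <= 3 and dscp & 0b1 == 0:
        return f"AF{af_class}{drop_prec}"
    return None

dscp = tos_to_dscp(0x48)
print(dscp, af_name(dscp))  # 18 AF21
```

With `IPQoS=none` the TOS byte should read 0 in a capture, so comparing a `tcpdump -v` trace with and without that option shows whether the marking is actually changing.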
It appears the ixl driver or the 14.1 kernel is rejecting these inbound SYN-ACKs despite them being visible in promiscuous mode. Is this a known regression in the iflib-based ixl driver, or a specific conflict with non-standard MTUs on the X710?
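One generic reason a packet can be visible to tcpdump yet never reach the TCP state machine is a failed checksum: the capture happens before the stack verifies it. With rxcsum disabled, the kernel checks in software, and on FreeBSD `netstat -s -p tcp` includes a counter for segments discarded for bad checksums, while `tcpdump -vv` should report whether each checksum is correct. As a hedged illustration of what is being verified, here is the RFC 1071 one's-complement checksum. TCP applies the same sum over a pseudo-header plus the segment; the sketch uses a generic 20-byte IPv4 header (not from this capture) only to keep the numbers short, and `internet_checksum` is a hypothetical name:

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 one's-complement checksum, as used by IPv4, TCP and UDP."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # sum 16-bit big-endian words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF

# Generic IPv4 header with its checksum field (bytes 10-11) zeroed.
header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
print(hex(internet_checksum(header)))  # 0xb861

# A receiver re-summing the header with the checksum filled in gets 0 if intact.
filled = header[:10] + (0xB861).to_bytes(2, "big") + header[12:]
print(internet_checksum(filled))  # 0
```

If that counter climbs each time the client retransmits its SYN, the dropped SYN-ACKs are failing verification rather than being ignored by the driver.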
Hardware & Contextual Note:
Identical Hardware: The local "problem" client and the target server are identical motherboards purchased at the same time.
Off-site Success: An identical production server (same motherboard/NIC/FreeBSD 14.1) is located off-site. The old local igb server can connect to the off-site server without issue over the Starlink and other routers in the path.
The Discrepancy: The ixl(4) client successfully handles TCP handshakes when the Destination MAC is the Starlink gateway (routed traffic), but it "ghosts" the SYN-ACK when the Destination MAC is a local peer on the same switch (L2 traffic).
ARP Behavior: The target server fails to automatically populate its ARP table for the client. Even after a manual arp -s is added and ping succeeds, the ixl driver appears to drop the inbound SYN-ACK before it reaches the TCP stack.
The last thing it suggested (I haven't tried this yet) was a BIOS change:
This is primarily a firmware-level setting. Check your BIOS/UEFI settings under the NIC configuration for a "Hardware LLDP" or "Firmware LLDP" toggle.
Disabling LLDP in Supermicro BIOS
To disable the hardware agent, you must access the UEFI Device Settings during the boot process:
Enter BIOS: Press the <Del> or <F2> key during system boot.
Navigate to Advanced: Go to the Advanced tab in the BIOS menu.
Device Settings: Look for PCIe/PCI/PnP Configuration or a direct Intel(R) Ethernet Connection X710 entry under the Advanced tab.
NIC Configuration: Select the specific ixl port (e.g., NIC Configuration).
LLDP Agent: Find the setting labeled LLDP Agent and set it to Disabled.
I'm in over my head on this one. I hope the above helps someone recognize the problem and advise what to try next. The posts I found were about two years old; hopefully this is a known issue. Once I can get the machine back online (it can't even do a package update), I plan to upgrade it to 14.3. Once that works, I'll update the Hosted Server to 14.3 too.
TIA