Intel X710 Ethernet Issues - TCP connections hang

eepete · Mar 7, 2026

I've got two servers that are twins. They are SuperMicro motherboard SYS-1019-WTR, Xeon Gold 6246R 16 core processor, 6 memory lanes, 96GB, triple 4GB MLC2 SATA. Purchased 2023. Ethernet configured as: ifconfig_ixl0="inet 192.168.xxx.xxx netmask 255.255.0.0 mtu 1468"
I also have a much older SuperMicro; the Ethernet on that is an 'igb0' flavor: ifconfig_igb0="inet 192.168.xxx.xxx netmask 255.255.0.0 mtu 1468"
The new servers are freebsd-version 14.1-RELEASE-p5; I was able to upgrade the old server to 14.3-RELEASE-p7.

When the two new servers were on the same LAN getting their software developed, and the MTU was 1500, they could ssh/scp files between them and the old 'igb0' server.
One server is now out in the real world on a fiber connection. This is the "Hosted Server". The other is still on my local LAN. This is the Digital Twin. The old 'igb0' server is also on the LAN.
The smaller MTU is because that is what it took to make the LAN able to use StarLink with its CGNAT set-up, and that is what it took to make ssh connections out to the Hosted Server and keep the SSH connection up over longer periods of time (1-5 days). In the AI chat, the possibility came up that the X710 Ethernet hardware might be unhappy with the 1468 MTU. I suspect that the old server can connect to the Hosted Server out in the real world because it has to go through a router, to StarLink, then another router (albeit also CGNAT for the fixed IP address), and then a final router for port forwarding that has the Hosted Server on it. The router could be compensating for the 1468 MTU from the old server, which points to a problem with the new Digital Twin server somehow.

The "Old Server" can ssh/scp out to the Hosted Server no problem. But the Digital Twin cannot ssh/scp to either the old server or the Hosted Server. It just hangs. I had a four hour Google AI session working on the problem. While AI has never solved the problem, it often gets close and makes me aware of what to look at. I had the AI system create a summary of our discussion, which I enclose:

Subject: FreeBSD 14.1: ixl(4) Client Sees SYN-ACK in tcpdump but Handshake Times Out (SYN_SENT)
Environment:
Client: FreeBSD 14.1-RELEASE, Intel X710-series NIC using ixl(4) driver.
Target Server: FreeBSD 14.1-RELEASE, Intel i210-series NIC using igb(4) driver.
Network: Same Layer 2 LAN/Switch.
Tuning: Both hosts use MTU 1468 (MSS 1428) due to upstream Starlink requirements.
Comparison: Windows 11 on the same LAN connects to the target port without any issues.
The Problem:
Outbound SSH or nc from the ixl client to the igb server hangs indefinitely. Verbose SSH logs show a hang immediately after Connecting to.... On the client, the socket remains in SYN_SENT.
Observed in tcpdump:
Client sends SYN [S].
Target Server receives SYN and responds with SYN-ACK [S.].
Client-side tcpdump physically shows the SYN-ACK arriving at the ixl0 interface.
The OS ignores the packet. The client never sends the final ACK, and instead re-transmits the SYN.
Troubleshooting Steps Taken (No success):
Firewalls: Disabled PF on both ends. IPFW is not loaded.
TCP Extensions: Set net.inet.tcp.rfc1323=0, sack.enable=0, and blackhole=0.
Offloading: Disabled rxcsum, txcsum, tso, and lro on both ixl and igb interfaces.
Kernel Checks: Cleared hostcache and verified rfc1122_strong_es=0.
ARP Issue: Initially, the server would not automatically ARP for the client; a manual arp -s on the server was required to get the SYN-ACK onto the wire. Even with static ARP and successful ping (0.013ms RTT), TCP handshakes still fail.
QoS: Tested with ssh -o IPQoS=none to rule out IP_TOS 0x48 (AF21) drops by network hardware.
It appears the ixl driver or the 14.1 kernel is rejecting these inbound SYN-ACKs despite them being visible in promiscuous mode. Is this a known regression in the iflib based ixl driver or a specific conflict with non-standard MTUs on the X710?

Hardware & Contextual Note:
Identical Hardware: The local "problem" client and the target server are identical motherboards purchased at the same time.
Off-site Success: An identical production server (same motherboard/NIC/FreeBSD 14.1) is located off-site. The old local igb server can connect to the off-site server without issue over the Starlink and other routers in the path.
The Discrepancy: The ixl(4) client successfully handles TCP handshakes when the Destination MAC is the Starlink gateway (routed traffic), but it "ghosts" the SYN-ACK when the Destination MAC is a local peer on the same switch (L2 traffic).
ARP Behavior: The target server fails to automatically populate its ARP table for the client. Even after a manual arp -s is added and ping succeeds, the ixl driver appears to drop the inbound SYN-ACK before it reaches the TCP stack.
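
The MTU/MSS pairing quoted in the summary (MTU 1468, MSS 1428) is plain header arithmetic. A minimal sketch, assuming IPv4 with a 20-byte IP header and a 20-byte TCP header and no IP options; the function name mss_for_mtu is made up for illustration:

```shell
#!/bin/sh
# TCP MSS for a given IPv4 MTU.
# Assumption: 20-byte IPv4 header + 20-byte TCP header, no IP options.
# mss_for_mtu is an illustrative name, not an existing tool.
mss_for_mtu() {
    echo $(( $1 - 20 - 20 ))
}

mss_for_mtu 1500   # prints 1460
mss_for_mtu 1468   # prints 1428
```

The MSS each side advertises shows up in the `options [mss ...]` field of its SYN or SYN-ACK in tcpdump, so the interface MTU a host is actually using can be cross-checked from a capture.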

The last thing suggested to try (I've not done this yet) was:
In the BIOS:
This is primarily a firmware-level setting. Check your BIOS/UEFI settings under the NIC configuration for a "Hardware LLDP" or "Firmware LLDP" toggle.
Disabling LLDP in Supermicro BIOS
To disable the hardware agent, you must access the UEFI Device Settings during the boot process:
Enter BIOS: Press the <Del> or <F2> key during system boot.
Navigate to Advanced: Go to the Advanced tab in the BIOS menu.
Device Settings: Look for PCIe/PCI/PnP Configuration or a direct Intel(R) Ethernet Connection X710 entry under the Advanced tab.
NIC Configuration: Select the specific ixl port (e.g., NIC Configuration).
LLDP Agent: Find the setting labeled LLDP Agent and set it to Disabled.

I'm in over my head on this one. I'm hoping the above will lead to someone recognizing the problem so they can advise what to try next. The posts I found were about 2 years old, so hopefully this is a known issue. Once I can get the machine back online (it can't even do a package update), I plan to upgrade to 14.3. Once that works, I'll update the Hosted Server to 14.3 too.
TIA

eepete · Mar 11, 2026

An update: (more work with Google AI, so hopefully right direction even if there are syntax errors)
Looking in dmesg, I see: fw 4.1.59148 api 1.9 nvm 4.11 etid 80001db8 oem 1.265.0
Looking online, it seems that 'api 1.9' is a very old firmware level. FreeBSD requires API 1.15 or higher.

There is a tool, "Intel NVM Update Tool for FreeBSD", that can be downloaded. Regrettably, this problem keeps me from being able to do port installs. But I could download it on my working "old" igb server and scp it over from the old server to the Digital Twin.

Does this sound like a reasonable plan?

atax1a · Mar 11, 2026

if you go mucking around with the MTU, you have to also use pf to clamp the MSS of any routed/NAT'd traffic. otherwise, TCP handshakes leave your network with an MTU that the underlying path cannot support, and this breaks random stuff in exactly the way you're describing.

my condolences on trusting the slop bot.

eepete · Mar 11, 2026

When the issue started (which was when the network switched from DSL to a bypassed StarLink router with a 3rd party router for the LAN), the MTU was 1500. The ssh and scp that used to work failed, even to the local machine where it used to work.
Various StarLink threads mentioned lowering the MTU due to the CGNAT nature of StarLink.

Would it make sense to set the MTU to 1500 and continue debugging, since changing it can cause other problems?

atax1a · Mar 11, 2026

yeah. you'll also have to get pretty familiar with tcpdump. we recommend checking out The Book Of PF, from NoStarch.

eepete · Mar 11, 2026

OK, will put MTU back to 1500. The Google AI had me do some things with tcpdump that showed that the ACK state of the initial handshake was being dropped. That led to "Is the OS relying on the hardware?". But the options of 'VLAN_MTU,JUMBO_MTU,HWSTATS,MEXTPG' would lead one to believe that the hardware was not being used to compute checksums.
I can't do any port loads, but there is an Intel update program I can put on a USB stick and run to update the firmware.
I think I'm going to dig into using tcpdump to see each state change during the TCP connection, and if the "missing ACK" is true then go the firmware update route.

Making this even crazier is that this box used to work on the DSL line, and the production server has exactly the same ixl0 api 1.9 as this server and it works great on the fiber drop with a fixed IP and a 3rd party router for port forwarding. From that perspective, the only thing that changed on the server on my LAN is that it was connected to StarLink in bypass mode and through a 3rd party router. The "old" server has no problem with ssh and scp to the Hosted Server.

I was not aware that pf was sensitive to MTUs. Right now, PF is off on the LAN system.

More to look at, tnx for the comments. A lot of learning left for me to do.

atax1a · Mar 11, 2026

when you've got multiple routers going on, you have to be able to compare the packets at each hop ideally, or at least at the source end and target end. you also have to be careful to allow the right ICMP through, otherwise path MTU discovery breaks. again, the book of pf covers this quite well, whereas the chatbot will mislead you in subtle ways. there's no substitute for thinking through and understanding things on your own.

eepete · Mar 11, 2026

The book (will be the 4th edition, ships out March 17) is ordered.
Everything I'm working on is contingent on running with StarLink in areas with no other communications available. I also need to set up a VPN so the outside world can get back into the deployed system. And I was thinking OPNsense so the deployed resource can use LTE data, a hardwired Ethernet with a DHCP address, and StarLink as the final backup.

In the interim, I'll update the firmware on the Ethernet chip on the motherboard. While this is a bigger server, deployed servers will be 8 core Atoms. They both will run the exact same FreeBSD OS and other software.

eepete · Mar 11, 2026

FWIW: Here's a tcpdump showing what goes on when I try to ssh into the remote machine:
21:45:11.885981 IP 192.168.xxx.xxx.21847 > 208.160.xxx.xxx.1019: Flags [S], seq 3059303938, win 65535, options [mss 1460], length 0
21:45:11.927037 IP 208.160.xxx.xxx.1019 > 192.168.xxx.xxx.21847: Flags [S.], seq 3318167620, ack 3059303939, win 65535, options [mss 1428], length 0
21:45:12.903438 IP 192.168.xxx.xxx.21847 > 208.160.xxx.xxx.1019: Flags [S], seq 3059303938, win 65535, options [mss 1460], length 0
21:45:12.939881 IP 208.160.xxx.xxx.1019 > 192.168.xxx.xxx.21847: Flags [S.], seq 3318167620, ack 3059303939, win 65535, options [mss 1428], length 0

I never get to see the final ACK. Will change MTU back to 1500 again, but last time I did this it made no difference.
Indeed, as atax1a suggests, pf is now the main culprit to examine/learn about.
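
A stalled handshake like the one in the capture above shows up as the same SYN (same seq) being sent more than once. A minimal sketch for counting that in tcpdump's text output, assuming the default output format seen in this thread; count_syn_retransmits is a made-up helper name and the sample lines use documentation addresses, not the real hosts:

```shell
#!/bin/sh
# Count retransmitted SYNs in tcpdump text output read from stdin.
# Assumption: default tcpdump text format with "Flags [S], seq N,".
# count_syn_retransmits is an illustrative name, not an existing tool.
count_syn_retransmits() {
    awk '
        # Pure SYNs only: a SYN-ACK prints as "Flags [S.]," and is skipped.
        /Flags \[S\],/ {
            for (i = 1; i <= NF; i++)
                if ($i == "seq") { seen[$(i + 1)]++; break }
        }
        END {
            n = 0
            for (s in seen) if (seen[s] > 1) n += seen[s] - 1
            print n
        }
    '
}

# Sample input modelled on the capture above (addresses are placeholders).
count_syn_retransmits <<'EOF'
21:45:11.885981 IP 192.0.2.10.21847 > 198.51.100.7.1019: Flags [S], seq 3059303938, win 65535, options [mss 1460], length 0
21:45:11.927037 IP 198.51.100.7.1019 > 192.0.2.10.21847: Flags [S.], seq 3318167620, ack 3059303939, win 65535, options [mss 1428], length 0
21:45:12.903438 IP 192.0.2.10.21847 > 198.51.100.7.1019: Flags [S], seq 3059303938, win 65535, options [mss 1460], length 0
EOF
# prints 1
```

On a live system the same function could be fed from something like `tcpdump -n -i ixl0 -c 20 tcp port 1019`. A non-zero count while SYN-ACKs are visible on the wire means the replies arrive but are not accepted by the local stack or firewall.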

Alain De Vos · Mar 11, 2026

Code:

ping -D -s 1472 destination_ip
ping -D -s 1500 destination_ip
ping -D -s 8972 destination_ip
route -n get destination_ip

For me only -s 1400 works, no idea why this means mtu 1448.
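
For reference, the number given to ping -s is only the ICMP payload, not the packet size on the wire. A minimal sketch of the conversion, assuming IPv4 with a 20-byte IP header (no options) plus the 8-byte ICMP echo header; the function names are made up for illustration:

```shell
#!/bin/sh
# ping -s sets the ICMP payload size; the IP packet is larger.
# Assumption: IPv4, 20-byte IP header (no options) + 8-byte ICMP header.
ICMP_OVERHEAD=28

# Size of the IP packet produced by "ping -s <payload>".
packet_size_for_payload() {
    echo $(( $1 + ICMP_OVERHEAD ))
}

# Largest "ping -s" value that still fits in a given MTU.
payload_for_mtu() {
    echo $(( $1 - ICMP_OVERHEAD ))
}

packet_size_for_payload 1472   # prints 1500
packet_size_for_payload 1400   # prints 1428
payload_for_mtu 1500           # prints 1472
```

By this arithmetic, -s 1472 with -D tests a full 1500-byte path, and a largest working payload of 1400 corresponds to a 1428-byte packet.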

eepete · Mar 11, 2026

I think I'm dealing with 3 separate problems. Having 3 instead of a single problem certainly explains why these things are so difficult.
1) The X710 Intel Ethernet driver, which seems at times to be flaky based on searches all over the net. My firmware level is 'api 1.9' and the current version is 1.15. So there is a firmware update in my future. This was hampered by the fact that the server can't even do a pkg update.

2) The 'Starlink' factor: The server on my LAN worked great until I had to switch over to Starlink. ssh and scp both broke. Other machines on the LAN (both Windows 11 PCs and other FreeBSD systems) could not keep an ssh session up for more than an hour or two. That requires changes in the ssh configuration. Starlink uses CGNAT so a number of sites recommended lowering the MTU. On my Windows machine, I had to take the size of a ping down to 1420 when I set the "No Fragmentation" option. There is no way to come in from the outside Internet to your LAN without paying for a 3-4X more expensive Starlink IPv6 fixed IP. The DDNS solutions (No-IP, etc.) don't work. Some have made a tunnel to a server on the internet work, so I'll be going down that path, and that means that headers get bigger. Like many rural areas, all ISP providers here are IPv4 only; they do not support IPv6, so the IPv6 fixed IP is not a solution. I'm already having to turn off IPv6 to make things work.
As is often the case, if you line up all the solutions online end to end, they don't point anywhere. The bottom line is that changes need to be made in system configurations to work over Starlink due to CGNAT, headers, VPN impact and even TCP drivers, because the Starlink system changes satellites every 15 seconds or so and this constant change in latency seems to upset some drivers. I've got a lot to learn about a lot of things due to the changes in connectivity when working via satellite vs. terrestrial. I think we all have a bit to learn.

3) Late last night, I did get the FreeBSD server on my LAN working. I had to change my pf.conf; here was the final line that worked:
pass out quick on ixl0 proto { tcp, udp } to any modulate state
I'm still learning about all this, but it seems that the ACK was being dumped by either the X710 or FreeBSD. My understanding is that "modulate state" was telling the system to "loosen up on the strictness" of processing packets. When the book on PF shows up, I hope to understand this better. Now I can at least update packages and, more importantly, load the pkg to download the system driver changes that work with the X710.

Once the firmware and system software is updated, then I can worry about the optimal MTU. When I bring up the VPN tunnel so I can come into my LAN from a server on the net, that may need to be changed.

I know there is a lot of "AI Sucks" sentiment out there, and I'm right with you. But multiple 3 hour "chats" with the Google AI have been very helpful. I view AI like any other tool (MSO oscilloscope, signal generator, VNA, spectrum analyzer) or software diagnostic tools. The tools are dumb, and there is some operator skill needed. The AI was poor at keeping in mind I was on FreeBSD 14.1, and kept providing "Type this" solutions that were wrong because, having been trained on Linux, Windows, and lots of different FreeBSD versions, the AI behaved like a drunk expert. I had to continuously remind it I was on FreeBSD 14.1. If I could not have recognized when it was making a mistake, it would have been useless. AI would have me use a spectrum analyzer to measure a voltage. The value of AI was pointing out that I needed to measure a voltage, which got me looking in new directions.

So my journey continues. I'll work on and juggle these three issues and learn what I need to to fix them. The proliferation of "Internet over Satellite" is creating new challenges in system configuration. For everything I'm doing (digital communications and servers for first responders deployed into areas with absolutely no communications at all) this has to be figured out. Getting networking working, and adding some of the custom hardware needed for deployable servers in vehicles will be a big win for everyone. The fire agency I work with had a team deployed into an area working on downed trees with chainsaws that had zero comms for 14 hours at a time. This is not safe and not acceptable.

I'll post when I figure this all out. I thank everyone for their input.

eepete · Saturday at 1:07 AM

I finally have things working, so for those who have suffered through this so far, here is what was learned and done. If others are having issues with StarLink, this might be of help to them:

1) Concerns about the Intel X700 series Ethernet controller:
Since the chip with this is on the motherboard (vs. a PCI ethernet card) the price of failure trying to update the motherboard flash seemed high. An
ifconfig ixl0
showed a lot of options. I turned off the hardware checksumming so I just didn't have to worry about any issues:
ifconfig_ixl0="inet 192.168.xxx.xxx netmask 255.255.0.0 mtu 1458 -rxcsum -txcsum -rxcsum6 -txcsum6 -tso -lro"

2) MTU concerns with StarLink:
If you link the suggestions on the net end to end, they really don't point anywhere. What I did do is adjust the MTU so that a ping with forced "No Fragmentation" worked (it got a reply). On my setup, the MTU ended up at 1458. While I wish the "why" was clearer, this method of "turn down the MTU until it works" is OK by me. It is quite possible that while the maximum packet size is smaller, sending with no fragmentation is a win.

3) The problem that when establishing a TCP connection, the server never saw the ACK when trying to ssh or scp:
At the top of the rules, I put:
scrub on ixl0 all reassemble tcp max-mss 1418

Then I made this change. Early on in the rules, just after my 'pass in quick' for my known sites and whitelist, I added this rule:
pass out quick on ixl0 proto { tcp, udp } to any modulate state

And then everything started working. And now everything works as it did before I switched my local LAN over to a bypassed StarLink dish with off the shelf 3rd party router (a Netgear AX4200).
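
The "turn down the MTU until a no-fragment ping works" method from item 2 can be automated with a binary search. A minimal sketch, assuming the lower bound is a payload that is known to work; find_max_payload, probe_df_ping and TARGET are made-up names, and the probe relies on FreeBSD ping's -D (Don't Fragment), -c, -t and -s flags:

```shell
#!/bin/sh
# Binary search for the largest ping payload that gets a reply with DF set.
# Usage: find_max_payload LOW HIGH PROBE
#   LOW   payload size assumed to work (not re-tested)
#   HIGH  largest payload worth trying
#   PROBE name of a command/function: exit 0 if the given payload got a reply
# All names here are illustrative, not existing tools.
find_max_payload() {
    fmp_lo=$1
    fmp_hi=$2
    fmp_probe=$3
    while [ "$fmp_lo" -lt "$fmp_hi" ]; do
        # Round up so the loop always makes progress.
        fmp_mid=$(( (fmp_lo + fmp_hi + 1) / 2 ))
        if "$fmp_probe" "$fmp_mid"; then
            fmp_lo=$fmp_mid
        else
            fmp_hi=$(( fmp_mid - 1 ))
        fi
    done
    echo "$fmp_lo"
}

# Hypothetical probe for FreeBSD: one DF ping, 2 second timeout.
# TARGET is a placeholder for the far-end address.
probe_df_ping() {
    ping -D -c 1 -t 2 -s "$1" "$TARGET" >/dev/null 2>&1
}

# Example (not run here):
#   TARGET=destination_ip
#   payload=$(find_max_payload 1200 1472 probe_df_ping)
#   echo "MTU candidate: $(( payload + 28 ))"
```

Adding 28 bytes (IPv4 plus ICMP headers) to the result gives the MTU candidate, and subtracting 40 from that MTU gives the matching max-mss. A result of 1430 would correspond to the MTU of 1458 and max-mss of 1418 used above.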

The "modulate state" suggestion came near the end of a 4 hour AI session. As expected, the AI was rarely completely right, but often useful in pointing me towards other directions to investigate.
My copy of "The Book of PF" 4th edition showed up a few days ago. I'll be reading to learn more about all this. I wish the type was bigger; it's a bit difficult to read. I'm a bit uncomfortable having "modulate state" making everything work without really understanding it all.

atax1a · Saturday at 1:20 AM

scrub reassemble tcp caused us weird problems because of some of the timestamp checks it makes break on modern load-balanced sites, fwiw.

things only working when you modulate state is really weird. can you post your entire config?

eepete · Saturday at 1:27 AM

Here are the key parts; everything after this is passing and blocking IP ranges.

####################
# PACKET FILTERING #
####################
set skip on lo0

# Due to starlink, set the MSS Clamping
scrub on ixl0 all reassemble tcp max-mss 1418

pass in quick on $INET from <mysites> to any
pass in quick on $INET from <my_whitelist> to any
pass out quick on $INET from any to <my_whitelist> modulate state

# rule to allow modulate state on all outgoing so I can do pkg and updates
pass out quick on ixl0 proto { tcp, udp } to any modulate state

# then a bunch of this sort of thing (there is a <snip> here, but you get the picture)
pass in quick on $INET from <validemails> to any
pass in quick on $INET from <responder_whitelist> to any

# allow pings, moot at this point because "the outside can't reach in"
icmp_types = "{ echoreq, unreach, timex }"
pass quick on $INET proto icmp all

# and then more pass in quick and block in quick, although since you can't reach into a StarLink LAN from the outside they are kinda moot

atax1a · Saturday at 1:36 AM

IMO you want those ICMP rules up towards the top if you're going to be using quick. You also don't actually use the $icmp_types variable; the rule should be closer to pass quick on $INET proto icmp icmp-type $icmp_types.... We'd also take out the reassemble tcp but keep the max-mss.

eepete · Saturday at 1:37 AM

I guess I thought that you had to set up all the macros/IP lists before the rules. This is not the case?

atax1a · Saturday at 1:40 AM

you can declare variables (foo="blatz") anywhere, it's the table and scrub directives that need to be in a specific order relative to the rules. but also the way you use quick means that you evaluate the ICMP rules kinda late, which could be interfering. you also aren't actually including the value of icmp_types in the rule, meaning you're passing more than you expect.

eepete · Saturday at 1:46 AM

atax1a: I'm going to focus on those config values in the book and, when I understand them better, kick them in.
It's odd how, because of the way StarLink works, nothing can come in unless it's a response to something that went out (NAT-ing). But at some point in the future I hope to get some tunneling out to my hosted server and all that will change.
I wonder if using Port forwarding on an off the shelf router is just a quick way to manage the network vs. learning how to do that native on the machine. Note that "quick" is rarely good or best...

atax1a · Saturday at 2:09 AM

eepete said:
"quick" is rarely good or best...

to be clear, we were talking specifically about the pf keyword

but we also do approve of engaging with / reading the material. Good luck.

Alain De Vos · Saturday at 2:12 AM

All operators are different; me, I had to do:

Code:

x@myfreebsd:/etc $ cat rc.conf | grep -i mtu
ifconfig_em0="DHCP mtu 1448"
ifconfig_em0_ipv6="inet6 accept_rtadv mtu 1448"
x@myfreebsd:/etc $ cat sysctl.conf | grep -i mtu
# Enable Path MTU Discovery (default is 1)
net.inet.tcp.path_mtu_discovery=1
net.inet.tcp.pmtud_blackhole_detection=1
net.inet.tcp.pmtud_blackhole_mss=1536