Intel I219-V ADL(16) unusable with MTU <5360

A few days ago I received a new Intel 12th gen Thinkpad with an Intel I219-V ADL(16) NIC:
Code:
# grep em0 /var/run/dmesg.boot 
em0: <Intel(R) I219-V ADL(16)> mem 0xbc300000-0xbc31ffff at device 31.6 on pci0
em0: EEPROM V0.5-4
em0: Using 1024 TX descriptors and 1024 RX descriptors
em0: Using an MSI interrupt
em0: Ethernet address: e8:80:88:5b:44:44
em0: netmap queues/slots: TX 1/1024, RX 1/1024
em0: link state changed to UP
# pciconf -l | grep em0
em0@pci0:0:31:6:        class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x1a1f subvendor=0x17aa subdevice=0x22e8

The NIC is recognized out of the box by 13.1-RELEASE as well as 13.2-RC1 (which is currently installed); however, it only works reliably (or at all) with the MTU set to 5360 or higher. With anything lower, performance is absolutely horrible (e.g. scp at a few kB/s), with constant connection stalls/drops and the interface going dark for a few seconds every now and then. Ping times fluctuate heavily between ~1ms and 20+ms, with occasional spikes to >1000ms, and the interface goes dark for several seconds if I try to put some load on it (e.g. a file transfer via scp). dmesg never shows any errors, and there are no link state up/down messages when the interface goes silent.
As soon as I set MTU=5360 or higher, ping runs stable at ~1ms to/from my workstation on the same switch, and scp transfer speeds are above 100MB/s and stable.

I already tried disabling LRO and checksum offloading, which doesn't affect this behaviour in any way.

I have no issues using jumbo-frame MTUs as a workaround, because MTU discovery seems to work perfectly fine and connections across switches or to/from hosts with an MTU of 1500 have been reliably stable so far; but a properly working driver would be more convenient and correct IMHO. (Also, using different MTUs within an infrastructure always bites back at some point...)
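
For anyone who wants to check whether full-size frames actually make it to a given host at a particular MTU, the usual trick is to ping with the Don't Fragment bit set and a payload that exactly fills one packet. A minimal sketch of the arithmetic, assuming plain IPv4 without header options; the function name icmp_payload and the <host> placeholder are made up for illustration, and -D / -s are the FreeBSD ping(8) flags:

```shell
# Largest ICMP echo payload that fits in one unfragmented IPv4 packet:
# MTU minus the 20-byte IPv4 header (no options) and the 8-byte ICMP header.
icmp_payload() {
    echo $(( $1 - 20 - 8 ))
}

# Print the ping invocation to try for each MTU of interest.
# -D sets the Don't Fragment bit, -s sets the payload size (FreeBSD ping).
for mtu in 1500 5360; do
    echo "mtu $mtu -> ping -D -s $(icmp_payload "$mtu") <host>"
done
```

If the ping at the computed size goes through but one byte more fails, the path MTU matches what was configured.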

I'm currently running 13.2-RC1, but I also observed these connection problems (ssh being very laggy and dropping connections every 1-2 minutes) with 13.1-RELEASE, which I installed first. RCs aren't supported here, so I can switch back to 13.1-RELEASE for testing if required.
 
I don't see the "em" driver listed as supporting the I219. Anyway, I recently read that under Windows they fixed a similar issue in the driver, where EEE (Energy Efficient Ethernet) caused disconnects. Can you try disabling it via hw.em.eee_setting, along with other power-saving features, if any (hw.em.smart_pwr_down)?
 
What if you try "-tso4 -tso6 -lro -vlanhwtso" ?
Does the driver in ports work better? https://www.freshports.org/net/intel-em-kmod/

Those (except LRO) aren't enabled by default:
Code:
# ifconfig em0
em0: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 5360
        options=481049b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LRO,VLAN_HWFILTER,NOMAP>
[...]

Issuing "-tso4 -tso6 -lro -vlanhwtso" anyway doesn't give any errors, but it also doesn't fix the problem.
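
To double-check which offloads are really active after toggling them, the options= line from ifconfig can be parsed instead of eyeballed. A rough sketch, using the options= line shown above as sample input; the helper names list_caps and has_cap are made up for illustration:

```shell
# List the capabilities named in an ifconfig "options=" line, one per line.
# Reads ifconfig output on stdin.
list_caps() {
    sed -n 's/^[[:space:]]*options=[0-9a-f]*<\(.*\)>$/\1/p' | tr ',' '\n'
}

# Succeed if capability $1 is enabled in the ifconfig output on stdin.
has_cap() {
    list_caps | grep -qx "$1"
}

# Sample copied from the ifconfig output in this thread.
# On a live system use e.g.: ifconfig em0 | has_cap LRO
sample='        options=481049b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LRO,VLAN_HWFILTER,NOMAP>'

for cap in LRO TSO4 TSO6 VLAN_HWTSO; do
    if printf '%s\n' "$sample" | has_cap "$cap"; then
        echo "$cap enabled"
    else
        echo "$cap disabled"
    fi
done
```

With the sample line this reports LRO as enabled and the three TSO-related capabilities as disabled, which matches the statement that only LRO was on by default.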

The driver from ports/packages strangely doesn't recognize the NIC at all: with 'if_em_updated_load="YES"' in /boot/loader.conf, the only hint about the NIC in dmesg is "pci0: <network> at device 20.3 (no driver attached)".
 
I don't see the "em" driver listed as supporting the I219. Anyway, I recently read that under Windows they fixed a similar issue in the driver, where EEE (Energy Efficient Ethernet) caused disconnects. Can you try disabling it via hw.em.eee_setting?

I also wondered why the driver picks up that card, but I assumed Intel now also uses arbitrary marketing names (like I219) instead of the real chipset identifier (the 5-digit numbers usually starting with 82).
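
As far as I understand it, PCI drivers attach based on the vendor/device ID pair rather than on the marketing name, so the ID is the thing to compare against a driver's device table. A small sketch that pulls the pair out of pciconf -l output, using the line from the first post as sample input; the function name pci_id is made up for illustration:

```shell
# Print "vendor:device" for a named device from `pciconf -l` output on stdin.
pci_id() {
    awk -v dev="$1" '
        # Match the line whose first field starts with "<dev>@".
        index($1, dev "@") == 1 {
            for (i = 2; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == "vendor") v = kv[2]
                if (kv[1] == "device") d = kv[2]
            }
            print v ":" d
        }'
}

# Sample copied from the pciconf output in this thread.
# On a live system use: pciconf -l | pci_id em0
sample='em0@pci0:0:31:6:        class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x1a1f subvendor=0x17aa subdevice=0x22e8'
printf '%s\n' "$sample" | pci_id em0
# prints 0x8086:0x1a1f
```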

Disabling EEE also doesn't solve the issue: moving back to MTU 1500 immediately brings back the erroneous behaviour.
 
Can you test whether the problem exists in -CURRENT too?
If it doesn't, it will be easier to fix.

Just tried the latest 14.0-CURRENT memstick image and transferred a 1GB file via scp with no issues. Ping times are also consistently <1ms, whereas with 13.x I always see ~1-2ms when the interface is "working" (i.e. MTU >=5360).

So the driver from 14.0-CURRENT is working correctly (despite also not mentioning I219 as supported in the manpage).
 
It looks like there are some specific diffs for the I219 "to avoid hangs" in the -CURRENT code.
They don't seem very hard to backport (at first look).
Diff:
+       /* I219 needs some special flushing to avoid hangs */
+       if (sc->hw.mac.type >= e1000_pch_spt && sc->hw.mac.type < igb_mac_min)
+               em_flush_desc_rings(sc);
+
There are 3 flush functions in total.
 