Solved Intel 82599ES only primary port working, no TX on second port

I finally found the time today to take a look at one of our servers with a dual 10Gbit uplink via an Intel 82599ES dual-port NIC, where one of the redundant links went down a few days ago.

The first port is working as intended:
Code:
# ifconfig -v ix0
ix0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
        options=e53fbb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 00:1b:21:8b:f8:2c
        inet 10.10.2.101/24 broadcast 10.10.2.255
        media: Ethernet autoselect (10Gbase-SR <full-duplex,rxpause,txpause>)
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        plugged: SFP/SFP+/SFP28 10G Base-SR (LC)
        vendor: Intel Corp PN: FTLX8571D3BCV-IT SN: ALH1AV9 DATE: 2011-10-27
        module temperature: 51.27 C Voltage: 3.31 Volts
        RX: 0.49 mW (-3.02 dBm) TX: 0.66 mW (-1.74 dBm)

The second port won't transmit, even with another transceiver (FS.COM) and cable, both brand new.
Code:
# ifconfig -v ix1
ix1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
        options=e53fbb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 00:1b:21:8b:f8:2d
        media: Ethernet autoselect
        status: no carrier
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        plugged: SFP/SFP+/SFP28 10G Base-SR (LC)
        vendor: AVAGO PN: AFBR-703SMZ-NA3 SN: AD1412A004L DATE: 2014-04-22
        module temperature: 46.96 C Voltage: 3.34 Volts
        RX: 0.52 mW (-2.82 dBm) TX: 0.00 mW (-40.00 dBm)
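
For anyone who wants to watch for this condition automatically, here is a minimal sketch that pulls the RX/TX power line out of `ifconfig -v` output and flags a dark laser. The function names and the -40 dBm "dark" floor are my own assumptions for illustration (the floor is simply the value reported above for the dead direction), not anything defined by the ix driver.

```python
import re

# Matches the optical power line printed by `ifconfig -v`, e.g.
#   RX: 0.52 mW (-2.82 dBm) TX: 0.00 mW (-40.00 dBm)
POWER_RE = re.compile(
    r"RX:\s*[\d.]+\s*mW\s*\((-?[\d.]+)\s*dBm\)\s*"
    r"TX:\s*[\d.]+\s*mW\s*\((-?[\d.]+)\s*dBm\)"
)

# Assumption: -40 dBm is treated as "no light at all", because that is the
# value shown in the output above. It is not a constant taken from the driver.
DARK_DBM = -40.0


def optical_power(ifconfig_output):
    """Return (rx_dbm, tx_dbm) parsed from `ifconfig -v` text, or None."""
    match = POWER_RE.search(ifconfig_output)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def laser_is_dark(ifconfig_output):
    """True if the module reports TX power at or below the assumed floor."""
    power = optical_power(ifconfig_output)
    return power is not None and power[1] <= DARK_DBM


# The two power lines from the outputs above.
ix0 = "        RX: 0.49 mW (-3.02 dBm) TX: 0.66 mW (-1.74 dBm)"
ix1 = "        RX: 0.52 mW (-2.82 dBm) TX: 0.00 mW (-40.00 dBm)"
print(laser_is_dark(ix0), laser_is_dark(ix1))  # prints: False True
```

On a live system the text would come from something like `subprocess.run(["ifconfig", "-v", "ix1"], capture_output=True, text=True).stdout`.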

The switch accordingly reports the port as "down", with nothing coming in on the RX side:
Code:
#sh int Te1/1/1 tra det 
ITU Channel not available (Wavelength not available),
Transceiver is internally calibrated.
mA: milliamperes, dBm: decibels (milliwatts), NA or N/A: not applicable.
++ : high alarm, +  : high warning, -  : low warning, -- : low alarm.
A2D readouts (if they differ), are reported in parentheses.
The threshold values are calibrated.

                              High Alarm  High Warn  Low Warn   Low Alarm
          Temperature         Threshold   Threshold  Threshold  Threshold
Port       (Celsius)          (Celsius)   (Celsius)  (Celsius)  (Celsius)
--------- ------------------  ----------  ---------  ---------  ---------
Te1/1/1     39.9                80.0        70.0         0.0      -10.0

                              High Alarm  High Warn  Low Warn   Low Alarm
           Voltage            Threshold   Threshold  Threshold  Threshold
Port       (Volts)            (Volts)     (Volts)    (Volts)    (Volts)
---------  ---------------    ----------  ---------  ---------  ---------
Te1/1/1    3.31                  3.63        3.46        3.13       2.97

           Optical            High Alarm  High Warn  Low Warn   Low Alarm
           Transmit Power     Threshold   Threshold  Threshold  Threshold
Port       (dBm)              (dBm)       (dBm)      (dBm)      (dBm)
---------  -----------------  ----------  ---------  ---------  ---------
Te1/1/1     -1.4                 2.0         0.0        -7.5       -9.5

           Optical            High Alarm  High Warn  Low Warn   Low Alarm
           Receive Power      Threshold   Threshold  Threshold  Threshold
Port       (dBm)              (dBm)       (dBm)      (dBm)      (dBm)
-------    -----------------  ----------  ---------  ---------  ---------
Te1/1/1    -40.0                 2.0         0.0       -12.1      -14.1
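
To make the reading of these tables explicit, here is a small sketch that classifies a value against its four thresholds using the `++ / + / - / --` legend from the output above. The `classify` helper is hypothetical, and whether the switch treats a value exactly on a threshold as an alarm is an assumption on my part.

```python
def classify(value, high_alarm, high_warn, low_warn, low_alarm):
    """Return the legend flag for a reading, or "ok" if within limits.

    Assumption: a value exactly on a threshold counts as crossing it;
    the switch output above does not say how boundary values are handled.
    """
    if value >= high_alarm:
        return "++"
    if value >= high_warn:
        return "+"
    if value <= low_alarm:
        return "--"
    if value <= low_warn:
        return "-"
    return "ok"


# Readings and thresholds copied from the Te1/1/1 tables above.
tx_flag = classify(-1.4, 2.0, 0.0, -7.5, -9.5)
rx_flag = classify(-40.0, 2.0, 0.0, -12.1, -14.1)
print(tx_flag, rx_flag)  # prints: ok --
```

So the switch side is transmitting normally, and its receive power is far below the low-alarm threshold, which matches the NIC reporting no TX power on ix1.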

Both transceivers (the one installed and the new one I swapped in for testing) work when plugged into the switch, but I don't get any optical signal (i.e. "it's not glowing red on the left side") from either transceiver when plugged into the NIC...
I've never actually seen a broken SFP(+) port. Might this be the first one for me, or is there anything accessible from the software side I can check/try? I didn't find any hints on settings/commands in the ifconfig and ix manpages...
 
It looks like the hardware right after the port is OK; the lights should glow even when the OS is not booted. Look deeper on the NIC.
 
That's exactly what I want to do, but I don't know if there's any tool or ifconfig subcommand available for that...
 
No, I mean the NIC is damaged. All 8 pins are usually visible on the card; use a spare Ethernet cord and test continuity! :)
 

I have no SFP+ copper modules for such a test, and as this is a production server I won't fiddle around with that old NIC for a single minute if it's physically broken. In that case it goes straight to the bin.
I already ordered a new X710-DA2 NIC just in case, so if there's nothing more from the software side I can check/try, I'll use the next maintenance window to replace it...
 
After replacing the NIC with a new X710-DA2, I've found some time to take a look at the old NIC again...

The NIC had two different transceivers installed: Intel and Avago. The port that didn't work was the one with the Avago transceiver, and this stays consistent when swapping them around, so the NIC doesn't like the Avago transceiver any more... I know that my X520-DA2 at home also came with Avagos and still works like a charm. Also, when I replace the Avago transceiver with another Avago from yet another Intel X520 card where that transceiver works, the port on this NIC stays dead.

I received 4 new FS.COM Intel-compatible transceivers today and tried them in that NIC, and both ports work like a charm.
The thing that completely puzzles me: the NIC worked for ~2 years in that system with those transceivers. I have no idea whether some driver update updated the firmware and triggered this behaviour, or why it suddenly doesn't like Avagos that work in other X520 NICs...

TL;DR: the card stopped working with Avago transceivers; it works with genuine Intel or Intel-compatible FS.COM transceivers.
 