LACP stops working after upgrade to 14.3-RELEASE

I have a server that ran 14.2-RELEASE-p1 happily for over a year. It has two 1 Gb/s links to a stack of Juniper EX4100 switches, aggregated with LACP.
This worked fine until I upgraded the server to 14.3-RELEASE using `freebsd-update -r 14.3-RELEASE upgrade install`.

Now the interfaces come up but nothing is passed.

I have another server, with a similar configuration, that's still on 14.2-RELEASE, and one difference I see is in the ifconfig output:

On old server:
Code:
laggport: bnxt0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
laggport: bnxt1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>

On new server:
Code:
laggport: bnxt0 flags=18<COLLECTING,DISTRIBUTING>
laggport: bnxt1 flags=18<COLLECTING,DISTRIBUTING>

I'm not sure if the missing ACTIVE flag is just because the lagg fails to activate the interfaces, or if it's configured in passive mode. I haven't changed anything in rc.conf or on the switch.
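
For anyone comparing the two outputs: the hex number after flags= is a bitmask, and 1c versus 18 differs only in bit 0x04, which is the ACTIVE bit. A minimal sketch of decoding it in plain sh follows. The bit values are what I recall from the LAGG_PORT_* definitions in sys/net/if_lagg.h, so treat them as an assumption and check your own source tree; decode_laggport_flags is just a name made up for this example.

```shell
#!/bin/sh
# Decode the hex value that ifconfig prints after "laggport: <if> flags=".
# ASSUMPTION: bit values as recalled from LAGG_PORT_* in sys/net/if_lagg.h.
# The body runs in a subshell ( ... ) so its variables do not leak out.
decode_laggport_flags() (
    value=$(( 0x$1 ))    # argument is bare hex without prefix, e.g. "1c"
    names=""
    for pair in 1:MASTER 2:STACK 4:ACTIVE 8:COLLECTING 16:DISTRIBUTING; do
        bit=${pair%%:*}      # decimal bit value before the colon
        name=${pair#*:}      # flag name after the colon
        if [ $(( value & bit )) -ne 0 ]; then
            names="${names:+$names,}$name"
        fi
    done
    # Mimic the ifconfig notation: flags=<hex><NAME,NAME,...>
    printf 'flags=%s<%s>\n' "$1" "$names"
)

decode_laggport_flags 1c    # value seen on the old (working) server
decode_laggport_flags 18    # value seen on the upgraded server
```

So on the upgraded server the ports are collecting and distributing, but lagg does not mark either of them as active.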

The switch shows lacp active and both members active:

Code:
Aggregated interface: ae11
    LACP state:       Role   Exp   Def  Dist  Col  Syn  Aggr  Timeout  Activity
      ge-0/0/7       Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active
      ge-0/0/7     Partner    No    No   Yes  Yes  Yes   Yes     Slow    Active
      ge-1/0/7       Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active
      ge-1/0/7     Partner    No    No   Yes  Yes  Yes   Yes     Slow    Active
    LACP protocol:        Receive State  Transmit State          Mux State
      ge-0/0/7                  Current   Slow periodic Collecting distributing
      ge-1/0/7                  Current   Slow periodic Collecting distributing

I tried disabling lacp and using the interfaces separately, and that works, so it's not the interfaces themselves or the cabling.
Could it be that the bnxt driver now needs different flags?
 
Could you please provide the full ifconfig output? I have a pair of these NICs collecting dust, but sadly don't have a lacp-capable switch, so all I get is:
Code:
        laggport: bnxt0 flags=0<>
        laggport: bnxt1 flags=0<>
I wonder if fixing the media reporting would have any effect and will look into that.
 
OK, the media part seems to be easy (and weird, this somehow reminds me of that xlibre issue discussed in another thread). If you are up for patching the kernel, please try the following:
Code:
diff --git a/sys/dev/bnxt/bnxt_en/if_bnxt.c b/sys/dev/bnxt/bnxt_en/if_bnxt.c
index 0e5bb6a736a..5007df4110b 100644
--- a/sys/dev/bnxt/bnxt_en/if_bnxt.c
+++ b/sys/dev/bnxt/bnxt_en/if_bnxt.c
@@ -4618,15 +4618,15 @@ bnxt_add_media_types(struct bnxt_softc *softc)
        case HWRM_PORT_PHY_QCFG_OUTPUT_PHY_TYPE_BASET:
        case HWRM_PORT_PHY_QCFG_OUTPUT_PHY_TYPE_BASETE:
                media_type = BNXT_MEDIA_BASET;
-               return;
+               break;

        case HWRM_PORT_PHY_QCFG_OUTPUT_PHY_TYPE_BASEKX:
                media_type = BNXT_MEDIA_BASEKX;
-               return;
+               break;

        case HWRM_PORT_PHY_QCFG_OUTPUT_PHY_TYPE_SGMIIEXTPHY:
                media_type = BNXT_MEDIA_BASESGMII;
-               return;
+               break;

        case HWRM_PORT_PHY_QCFG_OUTPUT_PHY_TYPE_UNKNOWN:
                /* Only Autoneg is supported for TYPE_UNKNOWN */
 
Pretty confident about this patch, and indeed a silly bug ... media_type is a local variable there, so setting it is a no-op when the function returns immediately afterwards. The returns should only be there for the two "error cases" below.

Would be interesting to know whether this is also the root cause for the LACP issue. I really need LACP and I have different NICs, so if they aren't affected, I could upgrade. ;)
 
The reply in the PR shows that it helps (I have no idea why, though; I didn't look deep enough to find out why lagg, or something even lower, cares about the media type).
 
I tried the patch and indeed it solves the lacp problem. By accident I tried it on a HEAD kernel first; when I discovered what I had done, I tried it against releng/14.3. The patch applied to both, and lacp worked on both with the patch.
 
Dear einsibjani:
I am new to this. Can you show me, step by step, how to apply the patch? Thanks.
 
If you've upgraded to 14.3-RELEASE like me, the process I did was:

1) If /usr/src is empty, `cd /usr/src && git clone --single-branch --branch releng/14.3 https://git.freebsd.org/src.git ./`
2) `cp /usr/src/sys/amd64/conf/GENERIC /usr/src/sys/amd64/conf/BNXT-FIX`
3) `cd /usr/src && make buildkernel KERNCONF=BNXT-FIX`
4) `make installkernel KERNCONF=BNXT-FIX`
 
Just compiled 14.3-STABLE with today's source pull from GitHub, no issues.

# uname -a
FreeBSD XXXXXX 14.3-STABLE FreeBSD 14.3-STABLE stable/14-n271682-4027e17c1795

# ifconfig -a

lagg0: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
        options=4e427bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_MAGIC,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6,HWSTATS,MEXTPG>
        ether
        hwaddr 00:00:00:00:00:00
        laggproto lacp lagghash l2,l3,l4
        laggport: igc0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        laggport: igc1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        groups: lagg
        media: Ethernet autoselect
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
ext0: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
        options=4600703<RXCSUM,TXCSUM,TSO4,TSO6,LRO,RXCSUM_IPV6,TXCSUM_IPV6,MEXTPG>
        ether
        inet xxx.xxx.xxx.xxx netmask 0xffffff00 broadcast xxx.xxx.xxx.255
        groups: vlan
        vlan: 3 vlanproto: 802.1q vlanpcp: 0 parent interface: lagg0
        media: Ethernet autoselect
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
int0: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
        options=4600703<RXCSUM,TXCSUM,TSO4,TSO6,LRO,RXCSUM_IPV6,TXCSUM_IPV6,MEXTPG>
        ether c8:7f:54:5a:f3:a5
        inet xxx.xxx.xxx.xxx netmask 0xffffff00 broadcast xxx.xxx.xxx.255
        groups: vlan
        vlan: 2 vlanproto: 802.1q vlanpcp: 0 parent interface: lagg0
        media: Ethernet autoselect
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
 
The issue/patch is only relevant to bnxt(4) interfaces, and only with copper media, it seems.
 
Dear einsibjani:
Thanks for your help, I will study it.
Another question: how do you know that /usr/src/sys/amd64/conf/GENERIC contains a patch for your lacp issue?
Why can't we just use "git clone https://git.freebsd.org/src.git" to rebuild the kernel?

For example:
My usbmuxd has some problems. How would I find out whether the usbmuxd maintainer has pushed a patch somewhere?

Thanks.
 
fff2024g There's no need to create a custom kernel config if it's just a copy of GENERIC. I assume einsibjani just did it to have a clear identification in e.g. uname output. You can skip that (and the KERNCONF= arguments).

What's missing in these instructions is the actual patching, e.g. like this:
Code:
cd /usr/src && fetch -o- 'https://bz-attachments.freebsd.org/attachment.cgi?id=261245' | patch -Np1
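
Putting the pieces from this thread together, the whole procedure could look roughly like the sketch below: clone releng/14.3 if the source directory is empty, apply the patch, then build and install GENERIC. The commands themselves are the ones quoted in the posts above; SRCDIR, DRY_RUN, PATCH_URL and the helper names run and patch_and_rebuild are made up for this example. It defaults to a dry run that only prints what it would do, which is a safety choice on my part, not something from the original instructions.

```shell
#!/bin/sh
# Sketch: fetch sources, apply the bnxt patch, rebuild and install the kernel.
# ASSUMPTION: SRCDIR, DRY_RUN, PATCH_URL and the function names are
# hypothetical helpers for this example only.
SRCDIR="${SRCDIR:-/usr/src}"
DRY_RUN="${DRY_RUN:-1}"    # 1 = only print the commands, 0 = really run them
PATCH_URL='https://bz-attachments.freebsd.org/attachment.cgi?id=261245'

# Print a command line, and execute it inside SRCDIR unless DRY_RUN is 1.
run() {
    printf '+ %s\n' "$1"
    if [ "$DRY_RUN" != 1 ]; then
        ( cd "$SRCDIR" && sh -c "$1" )
    fi
}

patch_and_rebuild() {
    # Step 1: clone only when the source directory has no entries yet.
    if [ -z "$(ls -A "$SRCDIR" 2>/dev/null)" ]; then
        run "git clone --single-branch --branch releng/14.3 https://git.freebsd.org/src.git ./" || return 1
    fi
    # Step 2: apply the patch from the PR attachment.
    run "fetch -o- '$PATCH_URL' | patch -Np1" || return 1
    # Steps 3 and 4: build and install GENERIC (no KERNCONF needed).
    run "make buildkernel" || return 1
    run "make installkernel"
}

patch_and_rebuild
```

With the default DRY_RUN=1 this only prints the commands. Running it as root with DRY_RUN=0 executes them, and the new kernel takes effect after a reboot.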
 