BCM57711 on FreeBSD 9.1 RC1

zorlack · Sep 14, 2012

I have FreeBSD 9.1 RC1 installed on a Dell C2100 server.

I'm trying to use a Broadcom NetXtreme II 57711 (BCM57711) to provide 10GbE access to istgt.

The card has two ports which, for testing purposes, are cross-connected using a copper SFP+ cable.

On boot the hardware is displayed like this:

Code:

bxe0: <Broadcom NetXtreme II BCM57711 10GbE (A0) BXE v:1.5.52 mem 0xde800000-0xdeffffff,0xde000000-0xde7fffff irq 32 at device 0.0 on pci5
bxe0: Ethernet address: 00:10:18:ba:24:30
bxe0: ASIC (0x164F0000); Rev (A0); Bus (PCIe x8, 5Gbps); Flags (MSI-X); Queues (RSS:16); BD's (RX:510,TX:255); Firmware (5.2.13); Bootcode (6.2.15)
bxe1: <Broadcom NetXtreme II BCM57711 10GbE (A0) BXE v:1.5.52 mem 0xdd800000-0xddffffff,0xdd000000-0xdd7fffff irq 42 at device 0.1 on pci5
bxe1: Ethernet address: 00:10:18:ba:24:32
bxe1: ASIC (0x164F0000); Rev (A0); Bus (PCIe x8, 5Gbps); Flags (MSI-X); Queues (RSS:16); BD's (RX:510,TX:255); Firmware (5.2.13); Bootcode (6.2.15)

I've added the following two lines to /etc/rc.conf:

Code:

ifconfig_bxe0="inet 10.0.7.12 netmask 255.255.255.0"
ifconfig_bxe1="inet 10.0.7.13 netmask 255.255.255.0"

When I run /etc/rc.d/netif restart, I get the following messages:

Code:

bxe0: /usr/src/sys/modules/bxe/../../dev/bxe/if_bxe.c(10934): Memory allocation failure! Cannot fill fp[15] RX chain.
bxe0: /usr/src/sys/modules/bxe/../../dev/bxe/if_bxe.c(3921): NIC initialization failed, aborting!
bxe1: /usr/src/sys/modules/bxe/../../dev/bxe/if_bxe.c(10934): Memory allocation failure! Cannot fill fp[15] RX chain.
bxe1: /usr/src/sys/modules/bxe/../../dev/bxe/if_bxe.c(3921): NIC initialization failed, aborting!

Does anyone know what this means? I'd like to be able to send some test traffic between the two ports on this NIC before I ship the server off to our colo.

Thanks,

-Z

madman · Jul 23, 2013

BCM57711 on FreeBSD 9.1 final (lot of issues)

Hi!

We recently installed FreeBSD 9.1 (64-bit) on a Dell PowerEdge R510 system in which we have two BCM57711 cards (four 10 Gbit interfaces in total). Currently, for testing, the filer is connected through two of its 10 Gbps interfaces to a 10 Gbps Dell PowerConnect switch that serves some Linux clients, which also use 10 Gbps cards. We have run into a lot of trouble trying to get this setup working.

First issue:

Without any special tweaking, when we're reading or writing to the NFS server from a client, the network card crashes. In the logs I can see:

Code:

Jul 19 11:49:26 filer-01-a kernel: bxe0: ---------- Begin crash dump ----------
Jul 19 11:49:26 filer-01-a kernel: bxe0: ------------------------------ Idle Check ------------------------------
Jul 19 11:49:26 filer-01-a kernel: bxe0: ERROR CFC: AC > 1 - LCID 39 CID_CAM 0x7 Value is 0xc
Jul 19 11:49:26 filer-01-a kernel: bxe0: WARNING QM: VOQ_0, VOQ credit is not equal to initial credit. Values are 0xf8 0x140
Jul 19 11:49:26 filer-01-a kernel: bxe0: WARNING QM: P0 Byte credit is not equal to initial credit. Values are 0x5a1c 0x8000
Jul 19 11:49:26 filer-01-a kernel: bxe0: WARNING CCM: XX protection CAM is not empty. Value is 0x1
Jul 19 11:49:26 filer-01-a kernel: bxe0: WARNING XCM: XX protection CAM is not empty. Value is 0x1
Jul 19 11:49:26 filer-01-a kernel: bxe0: WARNING BRB1: BRB is not empty. Value is 0x3
Jul 19 11:49:26 filer-01-a kernel: bxe0: WARNING TCM: FIC0_INIT_CRD is not 64. Value is 0x30
Jul 19 11:49:26 filer-01-a kernel: bxe0: ERROR TSEM: interrupt status 0 is not 0. Value is 0x10000
Jul 19 11:49:26 filer-01-a kernel: bxe0: ERROR CSEM: interrupt status 0 is not 0. Value is 0x10000
Jul 19 11:49:26 filer-01-a kernel: bxe0: ERROR XSEM: interrupt status 0 is not 0. Value is 0x10000
Jul 19 11:49:26 filer-01-a kernel: bxe0: bxe_idle_chk(): Failed with 4 error(s) and 0 warning(s)!
Jul 19 11:49:26 filer-01-a kernel: bxe0: ------------------------------------------------------------------------
Jul 19 11:49:26 filer-01-a kernel: bxe0: ------------------------------ Idle Check ------------------------------
Jul 19 11:49:26 filer-01-a kernel: bxe0: ERROR CFC: AC > 1 - LCID 39 CID_CAM 0x7 Value is 0xc
Jul 19 11:49:26 filer-01-a kernel: bxe0: WARNING QM: VOQ_0, VOQ credit is not equal to initial credit. Values are 0xf8 0x140
Jul 19 11:49:26 filer-01-a kernel: bxe0: WARNING QM: P0 Byte credit is not equal to initial credit. Values are 0x5a1c 0x8000
Jul 19 11:49:26 filer-01-a kernel: bxe0: WARNING CCM: XX protection CAM is not empty. Value is 0x1
Jul 19 11:49:26 filer-01-a kernel: bxe0: WARNING XCM: XX protection CAM is not empty. Value is 0x1
Jul 19 11:49:26 filer-01-a kernel: bxe0: WARNING BRB1: BRB is not empty. Value is 0x4
Jul 19 11:49:26 filer-01-a kernel: bxe0: WARNING TCM: FIC0_INIT_CRD is not 64. Value is 0x30
Jul 19 11:49:26 filer-01-a kernel: bxe0: WARNING PRS: TCM current credit is not 0. Value is 0x10
Jul 19 11:49:26 filer-01-a kernel: bxe0: ERROR TSEM: interrupt status 0 is not 0. Value is 0x10000
Jul 19 11:49:26 filer-01-a kernel: bxe0: ERROR CSEM: interrupt status 0 is not 0. Value is 0x10000
Jul 19 11:49:26 filer-01-a kernel: bxe0: ERROR XSEM: interrupt status 0 is not 0. Value is 0x10000
Jul 19 11:49:26 filer-01-a kernel: bxe0: bxe_idle_chk(): Failed with 4 error(s) and 0 warning(s)!
Jul 19 11:49:26 filer-01-a kernel: bxe0: ------------------------------------------------------------------------
Jul 19 11:49:26 filer-01-a kernel: bxe0: ----------  End crash dump  ----------

Even rebooting the system is not enough: after a reboot, I still can't ping any host on the network. The crash seems to leave the card in a bogus state that requires a complete power cycle to get the cards back in business.

We found that disabling tso4, txcsum and rxcsum on the cards prevents this from happening.

So, although I don't think it is a real fix, let's say we can work around this with a setting in rc.conf, something like this:

Code:

ifconfig_bxe0="inet 10.50.50.11 netmask 255.255.255.0 mtu 9000  -tso4 -txcsum -rxcsum"

Second issue:

Issuing an ifconfig mtu 9000 on the interfaces randomly produces this error:

Code:

Jul 19 09:47:03 filer-01-a kernel: bxe0: /usr/src/sys/dev/bxe/if_bxe.c(10934): Memory allocation failure! Cannot fill fp[04] RX chain.
Jul 19 09:47:03 filer-01-a kernel: bxe0: /usr/src/sys/dev/bxe/if_bxe.c(3921): NIC initialization failed, aborting!
Jul 19 09:47:12 filer-01-a kernel: bxe3: /usr/src/sys/dev/bxe/if_bxe.c(10934): Memory allocation failure! Cannot fill fp[04] RX chain.
Jul 19 09:47:12 filer-01-a kernel: bxe3: /usr/src/sys/dev/bxe/if_bxe.c(3921): NIC initialization failed, aborting!

That sounds quite bad, and I can't reproduce it with an MTU of 1500. (But does it make sense to use an MTU of 1500 on a 10 Gbps local network?)
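
Since the failure shows up only with mtu 9000, one thing worth ruling out is that the jumbo mbuf cluster pools are too small to fill every RX queue on all four ports. This is a guess, not a confirmed diagnosis. netstat -m shows current cluster usage and any denied requests, and the limits below are standard FreeBSD sysctls that can be raised at runtime. The values are purely illustrative, and which pool the bxe driver draws from at a given MTU is an assumption to verify.

```
# /etc/sysctl.conf -- illustrative values only, not tuned for this hardware
# standard 2 KB mbuf clusters
kern.ipc.nmbclusters=262144
# page-size (4 KB) jumbo clusters
kern.ipc.nmbjumbop=131072
# 9 KB jumbo clusters; assumed to be the pool used at mtu 9000
kern.ipc.nmbjumbo9=65536
```

If netstat -m reports no denied jumbo requests when the error fires, the cause probably lies elsewhere, for example memory fragmentation (9 KB clusters need physically contiguous pages) or the driver itself.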

Third issue:

Part 1)

We tried bonding two interfaces (each with an MTU of 9000) using lagg, like this:

Code:

ifconfig bxe0 up -tso4 -txcsum -rxcsum mtu 9000
ifconfig bxe2 up -tso4 -txcsum -rxcsum mtu 9000
ifconfig lagg0 create
ifconfig lagg0 up laggproto failover laggport bxe0 laggport bxe2 10.50.50.11/24

This instantly crashes the kernel and causes a machine reboot. The log says:

Code:

Jul 19 09:47:12 filer-01-a kernel: 
Jul 19 09:47:12 filer-01-a kernel: 
Jul 19 09:47:12 filer-01-a kernel: Fatal trap 12: page fault while in kernel mode
Jul 19 09:47:12 filer-01-a kernel: cpuid = 0; apic id = 20
Jul 19 09:47:12 filer-01-a kernel: fault virtual address        = 0x6d
Jul 19 09:47:12 filer-01-a kernel: fault code           = supervisor read data, page not present
Jul 19 09:47:12 filer-01-a kernel: instruction pointer  = 0x20:0xffffffff808d5879
Jul 19 09:47:12 filer-01-a kernel: stack pointer                = 0x28:0xffffff80003227f0
             --*** BOOOM REBOOT ***-- 
Jul 19 09:49:49 filer-01-a syslogd: kernel boot file is /boot/kernel/kernel

/var/crash/core.txt.0 returns:

Code:

Unread portion of the kernel message buffer:
Fatal trap 12: page fault while in kernel mode
cpuid = 5; apic id = 33
fault virtual address   = 0x6d
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff808d5879
stack pointer           = 0x28:0xffffff80003227f0
frame pointer           = 0x28:0xffffff8000322820
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (swi6: task queue)
trap number             = 12
panic: page fault
cpuid = 5
KDB: stack backtrace:
#0 0xffffffff809208a6 at kdb_backtrace+0x66
#1 0xffffffff808ea8be at panic+0x1ce
#2 0xffffffff80bd8240 at trap_fatal+0x290
#3 0xffffffff80bd857d at trap_pfault+0x1ed
#4 0xffffffff80bd8b9e at trap+0x3ce
#5 0xffffffff80bc315f at calltrap+0x8
#6 0xffffffff8045da8c at bxe_free_buf_rings+0x4c
#7 0xffffffff8046c0d5 at bxe_init_locked+0x125
#8 0xffffffff80470cfe at bxe_ioctl+0x4fe
#9 0xffffffff8099d08f at if_setlladdr+0x1ff
#10 0xffffffff8174c94a at lagg_port_setlladdr+0x8a
#11 0xffffffff8092cf55 at taskqueue_run_locked+0x85
#12 0xffffffff8092d0da at taskqueue_run+0x3a
#13 0xffffffff808be8d4 at intr_event_execute_handlers+0x104
#14 0xffffffff808c0076 at ithread_loop+0xa6
#15 0xffffffff808bb9ef at fork_exit+0x11f
#16 0xffffffff80bc368e at fork_trampoline+0xe
Uptime: 39m41s
Dumping 1505 out of 32735 MB:..2%..11%..21%..31%..41%..52%..61%..71%..81%..91%
...cropped...

Okay, I guess this again has something to do with the MTU of 9000, but this time it panics the kernel completely. This is no good.

Part 2) Trying bonding with the normal MTU of 1500

Code:

ifconfig bxe0 up -tso4 -txcsum -rxcsum mtu 1500
ifconfig bxe2 up -tso4 -txcsum -rxcsum mtu 1500
ifconfig lagg0 create
ifconfig lagg0 up laggproto failover laggport bxe0 laggport bxe2 10.50.50.11/24

This time, no error messages, no crash.

But even though everything seems to be correct, the bonding does not work: we can't ping any host on the network, and lagg0 reports "no carrier".

See:

Code:

bxe0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM>
        ether 00:10:18:98:35:f8
        inet6 fe80::210:18ff:fe98:35f8%bxe0 prefixlen 64 scopeid 0x3 
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (10Gbase-SR <full-duplex>)
        status: active
bxe2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM>
        ether 00:10:18:98:35:f8
        inet6 fe80::210:18ff:fe95:eaa0%bxe2 prefixlen 64 scopeid 0x5 
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (10Gbase-SR <full-duplex>)
        status: active
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM>
        ether 00:10:18:98:35:f8
        inet6 fe80::7a2b:cbff:fe1a:eab1%lagg0 prefixlen 64 scopeid 0x14 
        inet 10.50.50.11 netmask 0xffffff00 broadcast 10.50.50.255
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: no carrier
        laggproto failover lagghash l2,l3,l4
        laggport: bxe2 flags=0<>
        laggport: bxe0 flags=1<MASTER>

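For reference, the same failover lagg can also be declared persistently in /etc/rc.conf. This is only a sketch of the Part 2 commands in rc.conf form, using the same interfaces, address and offload flags. It is not expected to change the no-carrier behaviour by itself.

```sh
# /etc/rc.conf -- rc.conf form of the Part 2 ifconfig commands
ifconfig_bxe0="up mtu 1500 -tso4 -txcsum -rxcsum"
ifconfig_bxe2="up mtu 1500 -tso4 -txcsum -rxcsum"
cloned_interfaces="lagg0"
ifconfig_lagg0="laggproto failover laggport bxe0 laggport bxe2 10.50.50.11/24"
```
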
Please note that prior to installing FreeBSD, the machine was running 64-bit Debian 7 GNU/Linux, where we had the cards bonded with an MTU of 9000 without any crash or stability issue. So it looks to me like there is something really wrong with the Broadcom driver on FreeBSD 9.1, at least with the NICs used in Dell servers.

Given that Broadcom itself doesn't supply drivers for FreeBSD, is there any possible fix?

Thanks for your attention and your help.

Cheers,
Sébastien
