Mellanox MT26448

Groak

New Member


Messages: 9

I hope this isn't a dumb question BUT... How do I configure a Mellanox MT26448 10GigE card on FreeBSD 11.1?

I have the card listed in the pciconf listing:

none2@pci0:7:0:0: class=0x020000 card=0x001515b3 chip=0x675015b3 rev=0xb0 hdr=0x00
vendor = 'Mellanox Technologies'
device = 'MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s]'
class = network
subclass = ethernet

You can see that no driver has recognised it.

I thought that the mlxen drivers were built into 11.1? I am running:
FreeBSD medusa 11.1-RELEASE-p1 FreeBSD 11.1-RELEASE-p1 #0: Wed Aug 9 11:55:48 UTC 2017 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64

The card is not listed in the ifconfig list (as expected since it has no driver connected).

I have done a lot of googling and cannot seem to work it out.
Any help would really be appreciated.
Thanks.
 

Groak

New Member


Messages: 9

I think a partial answer is that I need to compile and load the mlx4 and mlx4en modules. They are in the 11.1 source tree.

I have compiled the modules and copied to /boot/kernel.

When I kldload either module I get the errors:

Jan 27 16:48:30 medusa kernel: KLD mlx4.ko: depends on kernel - not available or version mismatch
Jan 27 16:48:30 medusa kernel: linker_load_file: Unsupported file type
Jan 27 16:49:24 medusa kernel: KLD mlxen.ko: depends on kernel - not available or version mismatch
Jan 27 16:49:24 medusa kernel: linker_load_file: Unsupported file type


How do I update the source tree to 11.1-RELEASE-p1? Is that the problem?
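
The "version mismatch" message usually suggests the modules were built from sources that do not match the running kernel. A minimal sketch for comparing the two follows. It assumes the sources live in /usr/src and that newvers.sh carries plain REVISION="..." and BRANCH="..." lines. The helper names release_base and src_release are made up for illustration.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, paths assume a stock /usr/src.

# Drop a trailing patch level: 11.1-RELEASE-p1 -> 11.1-RELEASE
release_base() {
    printf '%s\n' "$1" | sed 's/-p[0-9][0-9]*$//'
}

# Read REVISION and BRANCH from a newvers.sh-style file and print
# "REVISION-BRANCH", for example 11.1-RELEASE-p1.
src_release() {
    rev=$(sed -n 's/^REVISION="\([^"]*\)".*/\1/p' "$1" | head -n 1)
    branch=$(sed -n 's/^BRANCH="\([^"]*\)".*/\1/p' "$1" | head -n 1)
    printf '%s-%s\n' "$rev" "$branch"
}

# Only look at the real system when actually running on FreeBSD.
if [ "$(uname -s)" = "FreeBSD" ] && [ -r /usr/src/sys/conf/newvers.sh ]; then
    kern=$(release_base "$(uname -r)")
    src=$(release_base "$(src_release /usr/src/sys/conf/newvers.sh)")
    if [ "$kern" = "$src" ]; then
        echo "sources match the running kernel ($kern)"
    else
        echo "mismatch: kernel is $kern, /usr/src is $src"
    fi
fi
```

The patch level is stripped before comparing because a -pN suffix alone is unlikely to matter unless the patch touched the driver.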
 

Phishfry

Son of Beastie

Reaction score: 1,530
Messages: 4,444

What you need is a custom kernel with the modules compiled in.
https://www.freebsd.org/doc/handbook/kernelconfig-building.html

So back up the default kernconf 'GENERIC' and add lines like these to your new kernconf:
Code:
device          mlx4ib          # Mellanox ConnectX HCA InfiniBand
device          mlxen           # Mellanox ConnectX HCA Ethernet
Follow the instructions from there. I am unsure if the device names I used above are correct.

Here is a site where they show details.
http://ronny-mueller.com/2017/02/10/howto-install-hp-enterprise-mellanox-connectx-2-en-nic-671798-001-in-freebsd/
I prefer to build the modules into the kernel instead of dynamically loading them.

Groak said: "How do I update the source tree to 11.1-RELEASE-p1? Is that the problem?"
This could be your problem if you are using an older version.
The p1 is not important unless the patch touched this driver. (doubtful)
Source code is automatically patched by freebsd-update when applying security patches.
 

Groak

New Member


Messages: 9

Thanks Phishfry. I am trying your suggestion now.

Building with mlxen or mlxca generates linking errors such as this:
Code:
mthca_av.o: In function `tavor_rate_to_ib':
/usr/src/sys/ofed/drivers/infiniband/hw/mthca/mthca_av.c:90: undefined reference to `mult_to_ib_rate'
I had tried to compile the kernel following the instructions in the mlx4en(4) man page, which says to include:

Code:
options COMPAT_LINUXKPI
device mlx4
device mlx4en
Adding the above three lines results in an immediate "not found" error for both device drivers when you start the make.

In August 2014 the mlx4 and mlx4en drivers apparently did work, according to Thread 47685 (/index.php?threads/47685/).

Any suggestions on the additional steps to build the mlx4 driver as per the man page for mlx4en(4)?
 

puretone

Member

Reaction score: 31
Messages: 81


I have four Mellanox MT26448 cards installed in various FreeBSD boxes on my home network. I can confirm they work perfectly in FreeBSD 11.2, 12.0, and 13.0.
The mlx4 / mlx4en driver has *NOT* been axed. I am using it at this very moment.
The only thing I had to do was add the device(s) and options to KERNCONF as spelled out by the mlx4en(4) man page, followed by a kernel recompile and finally the usual ifconfig entries in /etc/rc.conf.
I suspect that your sources are missing or incomplete. It is probably a good idea to either fetch src.txz or svnlite checkout base from the svn.freebsd.org repo, then buildworld && installworld.
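
For readers following along, the steps described above might look roughly like the sketch below. The config name MELLANOX, the interface name mlxen0, and the address are placeholder assumptions, and write_kernconf is a hypothetical helper. The three option/device lines are the ones quoted from the mlx4en(4) man page earlier in the thread.

```shell
#!/bin/sh
# Sketch only: MELLANOX, mlxen0 and the address below are placeholders.

# Write a kernel config that pulls in GENERIC and adds the lines the
# mlx4en(4) man page asks for. $1 = target directory, $2 = config name.
write_kernconf() {
    cat > "$1/$2" <<EOF
include GENERIC
ident   $2
options COMPAT_LINUXKPI
device  mlx4
device  mlx4en
EOF
}

# On the FreeBSD box, as root, the remaining steps would be roughly:
#   write_kernconf /usr/src/sys/amd64/conf MELLANOX
#   cd /usr/src && make buildkernel KERNCONF=MELLANOX
#   make installkernel KERNCONF=MELLANOX && shutdown -r now
#   sysrc ifconfig_mlxen0="inet 192.0.2.10/24"   # placeholder address
```

Including GENERIC rather than copying it keeps the custom config short and leaves the stock options, INET6 among them, in place.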
 


`Orum

Active Member

Reaction score: 24
Messages: 199

I'm in a similar situation, but I've already compiled it in and it's still not working:
Code:
# uname -a
FreeBSD opal 12.0-RELEASE-p10 FreeBSD 12.0-RELEASE-p10 OPAL  amd64

# dmesg|grep mlx
mlx4_core0: <mlx4_core> mem 0xfbe00000-0xfbefffff,0xfb000000-0xfb7fffff irq 50 at device 0.0 numa-domain 1 on pci10
mlx4_core: Mellanox ConnectX core driver v3.4.1 (October 2017)
mlx4_core: Initializing mlx4_core
mlx4_core0: Unable to determine PCI device chain minimum BW

# pciconf -lcbv mlx4_core0@pci0:129:0:0:
mlx4_core0@pci0:129:0:0:        class=0x020000 card=0x002115b3 chip=0x675015b3 rev=0xb0 hdr=0x00
    vendor     = 'Mellanox Technologies'
    device     = 'MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s]'
    class      = network
    subclass   = ethernet
    bar   [10] = type Memory, range 64, base 0xfbe00000, size 1048576, enabled
    bar   [18] = type Prefetchable Memory, range 64, base 0xfb000000, size 8388608, enabled
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 03[48] = VPD
    cap 11[9c] = MSI-X supports 128 messages, enabled
                 Table in map 0x10[0x7c000], PBA in map 0x10[0x7d000]
    cap 10[60] = PCI-Express 2 endpoint max data 256(256) FLR
                 link x8(x8) speed 5.0(5.0) ASPM disabled(L0s)
    ecap 000e[100] = ARI 1
    ecap 0003[148] = Serial 1 0002c90300529b42
Nothing for it shows up in ifconfig. I found on another forum someone who had a similar problem, but he was able to fix it with # sysctl sys.device.mlx4_core0.mlx4_port1=eth (as his was in infiniband mode). When I tried to do the same, I noticed it was already set to 'eth'. So is there any way to get this working?
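
The port-type check mentioned above can be scripted as a small sketch. The sysctl name is copied from the post and may differ on other systems or for a second port. port_is_eth is a hypothetical helper.

```shell
#!/bin/sh
# Sketch only: sysctl name taken from the post above; helper is hypothetical.

# Succeeds when the reported port type is exactly "eth".
port_is_eth() {
    [ "$1" = "eth" ]
}

# Only query the sysctl when actually running on FreeBSD.
if [ "$(uname -s)" = "FreeBSD" ]; then
    node=sys.device.mlx4_core0.mlx4_port1
    current=$(sysctl -n "$node" 2>/dev/null)
    if port_is_eth "$current"; then
        echo "$node is already eth"
    else
        echo "$node is '$current'; as root, try: sysctl $node=eth"
    fi
fi
```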
 

Phishfry

Son of Beastie

Reaction score: 1,530
Messages: 4,444

Silly suggestion but I see NUMA1 in your post and that means dual CPU rig.
Have you tried switching to NUMA0 based slot? Some drivers have NUMA issues.
 

`Orum

Active Member

Reaction score: 24
Messages: 199

I swapped it over to a PCIe slot tied to the other CPU, and it now shows as numa-domain 0. No luck, however, as the card still doesn't appear in ifconfig.

Edit: I've moved the card to another machine with a single CPU as a final test, but the problem remains. I've submitted PR 240576.

Edit2: I figured I'd update the firmware in case that was the issue. It's now running 2.10.0720 (up from 2.9.1000), and it generated some interesting new dmesgs:
Code:
mlx4_core0: Old device ETS support detected
mlx4_core0: Consider upgrading device FW.
These are in addition to the other messages I posted earlier. However, a newer firmware is now asking me to update, whereas the older one didn't complain?! And yes, the card still doesn't appear in ifconfig. In any case, I'd love to know what puretone's mstflint -d <device> query says for his working cards.
 

`Orum

Active Member

Reaction score: 24
Messages: 199

After more time spent working on this than I care to admit to, I finally found the cause of the problem. In short, you need to keep "options INET6" in your kernel or your interface won't appear! This is rather interesting as other interfaces don't display this behavior, even when they have similar IPv6-only options. I've also added the information to the PR and hopefully this will be documented in the man page in future releases.
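
A quick way to check for this is sketched below. It assumes the running kernel was built with INCLUDE_CONFIG_FILE, so that the kern.conftxt sysctl is populated, and has_inet6_option is a hypothetical helper that only scans the text it is given on standard input.

```shell
#!/bin/sh
# Sketch only: helper name is hypothetical; it scans kernel config text on
# stdin and does not follow "include" lines.

# Succeeds if the input contains an active "options INET6" line.
# Commented-out lines and "nooptions INET6" do not count.
has_inet6_option() {
    grep -Eq '^[[:space:]]*options[[:space:]]+INET6([[:space:]]|#|$)'
}

# Only query the running kernel when actually on FreeBSD.
if [ "$(uname -s)" = "FreeBSD" ]; then
    if sysctl -n kern.conftxt 2>/dev/null | has_inet6_option; then
        echo "running kernel was configured with options INET6"
    else
        echo "options INET6 not found (or kern.conftxt is unavailable)"
    fi
fi
```

The same helper can be pointed at a config file before building, for example has_inet6_option < /usr/src/sys/amd64/conf/GENERIC.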
 