Solved SR-IOV Issues/Recommendations

I've been wrestling with a Mellanox ConnectX-3 trying to make SR-IOV work on FreeBSD. I've completed the (unnecessarily difficult) process to create a custom firmware configuration with SR-IOV enabled, burning the configuration/firmware to the card, etc. However, when the card is reinstalled in a FreeBSD host, the /dev/iov directory remains unpopulated.

I've also hit a roadblock with an Intel X520-DA2 where upon enabling SR-IOV via iovctl, the card stops passing all network traffic until the host is rebooted.

I've resigned myself to the fact that purchasing a different 10GbE adapter will be necessary, but I am having difficulty deciding between two. I've basically ruled out the Intel X710 due to higher cost, negative comments on the forums, etc. The cards I am interested in are the Mellanox ConnectX-4 Lx and the Chelsio T520-CR.

I would appreciate any insight as to which adapter would offer a better overall experience, or any tips on getting SR-IOV working on my existing adapters.
 
Chelsio all the way.
In fact I will buy the card back from you if you can't get it working.
I am that confident. (don't go paying $1K and expect me to honor that)
Low Profile or Full Height bracket?

I just upgraded to used 25GB cards and recommend it if you can swing it. T580.
Cisco Nexus 25GB Switch for $125

I can't help you with the other cards because they are broken. HPS worked at Mellanox and passed away.
So that is not good for their driver. Intel SR-IOV never worked for me on the X710 or X550.
 
Thank you for the confirmation, those are the exact cards I was looking at (the Dell variant).

I will be purchasing a full-height adapter. As for 25GbE, that's a good thought. I don't have a 25GbE-capable switch, but "future-proofing" the FreeBSD host to the best of my ability isn't a bad idea.

I've seen rumblings that the Chelsio drivers can be problematic, but I am assuming this doesn't apply to FreeBSD given the number of times I've seen those adapters referenced. I plan to use the information outlined in cxgbe(4) to get the card working. I've also come across some tuning information that I'll likely implement once the card is installed.
 
There is no need for tuning off the bat.

You just need to remember that the interface name is different inside a VM.
Instead of the host's cxl it is vcxl. You will need to add that to your VM's rc.conf.
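Something like this in the guest's rc.conf, as a sketch (assuming the VF shows up as vcxl0 inside the VM and you just want DHCP on it):

Code:
# guest /etc/rc.conf
ifconfig_vcxl0="DHCP"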

You need to make a configuration file on the host for each interface you want to use SR-IOV on.
The manual says the /etc/iov/ directory, but I have found it can live in /etc/ as well. You must point to it in rc.conf anyway.

Code:
       iovctl_files
           (str) A space-separated list    of configuration files used by
           iovctl(8).  The default value is an empty string.
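As a rough sketch, for one T5 port it could look something like this (cxl0, the VF count, and the path are just examples; see iovctl.conf(5) for the full format):

Code:
# /etc/iov/cxl0.conf
PF {
        device : "cxl0";
        num_vfs : 4;
}

DEFAULT {
        # reserve the VFs for passthrough to bhyve guests
        passthrough : true;
}

# and in the host's /etc/rc.conf
iovctl_files="/etc/iov/cxl0.conf"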
 
There is a nice glossy writeup here for the Intel X710.

Personally I have not been successful.

But I will say the article mentions enabling SR-IOV in the BIOS, and I think that is a good point. You need to use a server board.
Also the Chelsio needs airflow, so you need to think about a fan unless it is in a server chassis.
 
I've read that article, which is why I started down the X710 route in the first place. I think I will just stick with the Chelsio recommendation to be safe, I don't want another headache.

The adapter will be installed in a 3U rack-mounted chassis with 2x 120mm Noctua fans pushing air across the add-in cards. I am planning to add 2x additional 60mm fans at the back of the case for added exhaust performance, but I'm thinking the existing fans should be sufficient for now.

Thank you for the resources and your input.
 
I've also tested SR-IOV when I added the feature to pass vnet interfaces to jails to sysutils/iocell (not present in the port; this and other pull requests have been pending for half a year - I'm preparing a new port/fork).

With Intel X5xx I could never get SR-IOV to work - as soon as a VF is passed to a jail, the whole NIC goes dark and needs a complete reset (i.e. reboot) to come back to life. This was regardless of any BIOS settings and was tested on at least 2 different Supermicro boards and one Atom C3758 appliance. If SR-IOV is not enabled in the BIOS, you can't even create VFs, so that part seems to work, but I suspect there's something broken in the firmware and/or in how the FreeBSD implementation interacts with it when passing the VF to a jail (or VM).

With Mellanox CX3 the VFs were successfully passed to the jail; I could attach an address and send/receive traffic. I never fiddled around with firmware on those cards - AFAIK that's only a Linux thing, where you need to change the mode (Ethernet/InfiniBand) in the firmware because their driver still isn't capable of doing that. On FreeBSD you just load the appropriate driver (mlx4en(4) or mlx4ib(4)) and it 'just works'™
Apart from some testing during the implementation in iocell and setting up an experimental gateway with VFs instead of epairs, I never ran SR-IOV/vnet jails for longer than a few days/weeks. For the sake of easy migration/recovery I keep jail configuration consistent across all hosts and hardware-agnostic - i.e. everything is connected to bridges which are identically named across hosts (e.g. br-dmz, br-wan, br-mgmt, etc...).

Although I do have 6 hosts at hand with CX3 (40G) NICs, all of them are production hosts, so I can't really test with stuff that might interrupt anything. However, I also have a CX3 in my server at home with which I could perform some more testing, and another single-port CX3 for my desktop machine, but I have yet to find the full-size PCI bracket which I have "put somewhere where it won't get lost" to be able to install it...

Regarding the 25G Chelsios: Phishfry, could you point me in the right direction as to where they (T6225-CR?) can be found for less than ~200-300EUR (+ shipping and import into the EU, because they don't seem to be common at all here...)? There don't seem to be any 'white-box' variants of Chelsios beyond the T520...
Given that one can easily find CX4s (which can be easily cross-flashed between the 10G and 25G variants) for 25-30EUR and sometimes even cheaper, the Chelsio doesn't look very "bargain-y" at 10x that price point.
Especially if one factors in that 25G-capable ("proper") switches are either still relatively expensive or very power-hungry. I've had an eye on the Nexus N9K-C92160YC-X for a while, as this seems to be the cheapest option to go beyond 10G that doesn't consume several hundred watts - but they are still ~800-1000EUR (with fans/PSUs) here in Germany - so what 25G-capable Nexus can be found for only $125??



EDIT:
I just wanted to re-check on my home server and found that /dev/iov now only contains the ix entries of the X540 interfaces on the riser. The mlxen interfaces are gone... same goes for the hosts with the 40G CX3 I just checked. So SR-IOV *does* seem to be broken now for those cards?
The commit for the vnet.interface feature in iocell is from July last year - so all my testing back then was on 13.3-RELEASE; that regression must (might) have been introduced with 13.4-RELEASE or some patch or driver update since ~7/2024...
 
...

EDIT:
I just wanted to re-check on my home server and found that /dev/iov now only contains the ix entries of the X540 interfaces on the riser. The mlxen interfaces are gone... same goes for the hosts with the 40G CX3 I just checked. So SR-IOV *does* seem to be broken now for those cards?
The commit for the vnet.interface feature in iocell is from July last year - so all my testing back then was on 13.3-RELEASE; that regression must (might) have been introduced with 13.4-RELEASE or some patch or driver update since ~7/2024...

Thank you for confirming that I haven't missed something in my struggles with the CX3. I'm still curious about the CX4s, as the overall cost is lower (and I've read they consume less power), but I think I will still move forward with purchasing a Chelsio, per Phishfry's suggestion.
 
Thank you for confirming that I haven't missed something in my struggles with the CX3.
I have one of the CX3 cards and it did not do IOV either.
Mellanox MCX311A-XCAT CX311A
x4 PCIe single port 10GB fiber low profile
But that card was fail soup as it did not work on Arm64/RockPro64 either. Firmware did not like Arm64. Not a PCIe problem.

There don't seem to be any 'white-box' variants of Chelsios beyond the T520...
I think eBay is loaded with datacenter switchouts/recycling.
You see a huge load of Chelsio with about 5 vendors all driving each other's prices down. 25 bucks for a 40GB card. Same as 10GB.

I really am an idiot as I upgraded to 40GB fiber, and I think you were kind enough to help me on the STH forum setting up the Nexus.
So I got a bunch of T580s, but all with full-height brackets. So I had to get creative and make low-profile brackets.

I have no T6 Chelsio yet as they are expensive. Hopefully some datacenter changeouts are due.

The only reason I pushed 25GB gear is I think that spending on 10GB at this point is borderline.
SFP+ has been usurped by QSFP, so that is a good expense. Switching over to 40GB cost me more in cables and splitters than the gear. But breaking out 4x10GB from a 40GB QSFP port is a pretty wicked feature to connect old Cisco gear with 4x10GB feeds.

Chelsio also seems to be forgiving on the SFP front as is Mellanox. Intel has its quirks.

Old datacenter gear can really build a fast network cheap. So 40GB is a stepping stone to 100GB with QSFP..
 
You see a huge load of Chelsio with about 5 vendors all driving each other's prices down. 25 bucks for a 40GB card. Same as 10GB.

Old datacenter gear can really build a fast network cheap. So 40GB is a stepping stone to 100GB with QSFP..
True, 40G gear is dirt-cheap, but the problem with most of that gear is that, as 40G has been dead for a while, most of it is rather old and hence very power-hungry - that's in fact the only reason why it's so dirt-cheap, as everyone wants to get rid of those power hogs...

I'm currently looking for viable options as I'm soon moving to a house where I'll have a dedicated room for rack/servers in the basement and fiber runs (and Cat7 for access points and some 'traditional 1G' gear) to several rooms, but it seems it will still take a year or two for 25G gear to become really interesting as an upgrade from 10G. Buying 40G gear (which pretty much is a dead end) really isn't worth it nowadays if 50W of power draw equals almost 200EUR in electricity per year...

IMHO the current best bang-for-the-buck 10/25G+ switch is the Nexus N9K-C92160YC-X with 48x 10/25G and 6x 40/50/100G - but for a home network it still has a pretty hefty price tag (in Germany ~800-1000EUR with fans and PSU; I've seen it for ~$500 on eBay US, but shipping and import fees would also bring that to 800+).

For 10G the N3K-C3548P-P-10GX is a quite interesting option: 102W typical power draw, relatively moderate noise levels and can be found for under 400EUR every now and then. The non-X variant is already down to way under 300EUR, but has 40W higher power draw. If you only need 10G links this is a great switch - especially as the N3Ks can of course also connect to fabric extenders which are an extremely cheap way to add another switch to the network without the full configuration overhead. (of course only N5K upwards can control FEX, I have also been looking at various N5K-55xx variants lately, hence I mixed those up...)

I think I'll stay with 10G for now (which is still plenty for my use cases anyway...), but when buying NICs one can already go for 10/25G variants like the ConnectX-4, as they really are already cheaply available (except Chelsio, sadly...). Switches will also get there soon - the first few generations have reached their EOL, so they are becoming more widely available on the grey market and prices are already plummeting.

To get back to the original topic: I just ordered some Chelsio T520 cards. I hope to find some time when they arrive to put them into one of my systems and give SR-IOV another try.
 
Phishfry, do you have any strong feelings one way or another on the Chelsio T6225? I am seeing these adapters for ~$30 USD and thought they might work well for my use case.

According to this STH forum post, the T6225 requires 200 LFM. If I do the math on one of the Noctua fans I am running, the LFM comes out to ~408, so the adapter should be adequately cooled. That same post indicates that the T6225 uses the same full-height PCI bracket as the T520, so securing the adapter in my 3U rack-mount case shouldn't be an issue either.

Your input is appreciated.

Noctua NF-S12A PWM: 107.5 m³/h = 63.27 CFM
120 mm = 0.3937 ft
fan cross-section: (0.3937 ft)² = 0.1550 ft²
LFM = 63.27 CFM / 0.1550 ft² ≈ 408
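For anyone who wants to redo that check with a different fan, it's just CFM divided by the duct cross-section in square feet, e.g. with bc(1):

Code:
# LFM ~ CFM / area in ft^2, using the Noctua numbers from above
echo "scale=1; 63.27 / (0.3937 * 0.3937)" | bc
# prints 408.1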
 
According to the product brief [1] that NIC has a typical power draw of 13W - that's pretty much in the same ballpark as ConnectX4/5 (12.5W) or Intel XXV710 (11W) and not really that much. Of course you shouldn't put that NIC in a passively cooled system, but anything that provides some amount of airflow over the NIC should be fine... Those pesky quad 10Gbase-T NICs that were often bundled on riser cards had ~30W of power draw and got scorching hot yet still worked fine.

I haven't looked up what LFM is (an arbitrary grain kernel times some random bodypart squared?) or converted it into real units to get an idea of what amount they specify, but again: it's only slightly above 10W, has a heat sink, and those chips usually have Tjmax temps of way over 100°C, so I really see no problem here as long as you have some airflow.


[1] https://www.chelsio.com/wp-content/uploads/resources/T6225-CR-PB.pdf
 
Heating is not that bad. I have some cards where I mounted a 40mm x 15mm fan onto the heatsink.
I made some thick solid bronze ones trying to go fanless.
I have also mounted a fan at the back end of the card and built a cardboard wind tunnel.
You need some cooling, but I think pushing the heat out the back of the chassis is the desired effect.
Supermicro does a good job of that with their chassis designs.
 
Card backplates can be a problem. There are not a lot for sale.

So for instance this card comes out of a NetApp server but needs the bracket modded to work in a regular chassis.

 
Card backplates can be a problem. There are not a lot for sale.

So for instance this card comes out of a NetApp server but needs the bracket modded to work in a regular chassis.

The STH forum post indicates that the T6225 uses the same bracket as the T520:

The T6225 Low Profile Cards use the SAME PCIe brackets as the T520 and T420 cards.

If that is indeed the case, I would think that something like this would work, would you agree?
 
There is a bunch of stuff you can turn off to make them boot faster.
iSCSI booting, blah blah blah.
I think it is Ctrl+S at bootup. Multiple cards are handled through one screen.

Firmware will be flashed to the latest by the FreeBSD driver. You will see the output in dmesg the first time you boot it.
It is a tunable you can turn off if desired. Different FreeBSD versions ship different firmware versions. Older cards may not apply.
hw.cxgbe.fw_install=
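For example, if I read cxgbe(4) right, setting this in loader.conf keeps the driver from touching the card's firmware at all (1 is the default "install the bundled firmware when the driver prefers it" behavior, 2 forces the bundled version):

Code:
# /boot/loader.conf
hw.cxgbe.fw_install="0"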

So don't use -CURRENT on it unless you plan on running it. Drivers are linked with firmware.
If you put a -CURRENT-flashed card in a 14.2-RELEASE box it will not be as happy. So keep them linked if possible.
Keep the driver with the same version of firmware.
It will not automatically downgrade is what I am saying.
Once -CURRENT upgrades the card on bootup it stays at that firmware level.
I like to keep them matched. You can downgrade by hand.
 
You can do a firewall with rules, like pf, on it. It has kernel TLS drivers.
There are just so many features; you will not need them all, but you should be aware of them.

Driver and tools.
/usr/src/sys/dev/cxgbe/

cxgbtool ships in base. Flasher and much more
/usr/src/tools/tools/cxgbtool/
 
To close the loop on this, I ordered and installed a Chelsio T6225-CR, along with a full-height bracket for the T520-CR. I can confirm that the T520 bracket fit the T6225-CR perfectly.

I was finally able to get SR-IOV working, though it wasn't immediately obvious that I needed to add the following to my /boot/loader.conf:

Code:
if_cxgbev_load="YES"

The above information isn't specified in the cc(4) manual. Additionally, the cc(4) manual indicates the virtual interfaces will be named vcc, but when the interfaces are created, they are in fact named ccv. After some searching, I was able to locate the correct information in the ccv(4) manual page.
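In case it helps anyone else landing here, a rough sketch of how the pieces fit together (cc0, the VF count, and the file path are illustrative; see iovctl.conf(5) and iovctl(8) for details):

Code:
# /etc/iov/cc0.conf
PF {
        device : "cc0";
        num_vfs : 2;
}

# /etc/rc.conf
iovctl_files="/etc/iov/cc0.conf"

# create the VFs without a reboot, then look for ccv0, ccv1, ...
iovctl -C -f /etc/iov/cc0.conf
ifconfig ccv0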

Once the card was installed, I checked the temperature and found it was reporting 81°C, so obviously my 120mm case fans weren't cooling it adequately. I affixed 2x 40mm fans to the card's heatsink and that dropped the temperature to 48°C.

I'm still monitoring and experimenting with the card, but I'm pleased thus far. Thank you Phishfry and sko for the assistance.
 