What are the recommended tunable settings for an ix-based 10GBase-T NIC on a busy, high-usage network? Are there any adjustments to these settings when there are 2 ports (ix0, ix1) and one or both are in use?
For starters,
net.isr.maxthreads="-1"
right?
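For context, I mean the handful of boot-time and runtime knobs along these lines (just a sketch, the values depend on core count and workload, and I'm assuming the same knobs apply whether one or both ports are up):

  # /boot/loader.conf
  net.isr.maxthreads="-1"          # one netisr thread per core
  net.isr.bindthreads="1"          # pin netisr threads to their cores
  kern.ipc.nmbclusters="1000000"   # plenty of mbuf clusters for 10G (watch netstat -m)

  # /etc/sysctl.conf
  net.inet.tcp.sendbuf_max=16777216   # allow larger socket buffers
  net.inet.tcp.recvbuf_max=16777216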
CPU0 and CPU1? What is this, 2005? ix NICs will use multiple CPUs, so it's not as simple as this. You can't even answer this question without knowing more about what the OP is doing. Is this a single-NIC system/server, or a router/bridge? How many CPUs? What GHz? What load? Is the NIC on board?
One thing I don't see mentioned much is NUMA performance.
You actually get lower performance with multiple CPUs due to NUMA traffic spanning the different sockets.
So if you're using multiple CPUs, it helps to know which PCIe slot is connected to which CPU.
CPU0 is generally more responsive but depending on the architecture of the board, most of the PCIe slots might be assigned to only one CPU. So consider PCIe bandwidth used across each CPU in conjunction with Network Card placement.
For example if you find CPU0 has most of your peripherals hanging off it, try using CPU1 instead to balance the IO load.
There is no general guide for this; you have to benchmark and move your network adapter around to see the effects.
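To see which slot and CPU the card actually hangs off, and to pin its queue interrupts to the local socket, I poke at something like this (the device and IRQ numbers are made-up examples, adjust for your box):

  pciconf -lv | grep -A3 '^ix'     # list the ix devices and their PCI selectors
  sysctl dev.ix.0.%location        # slot/function information for ix0
  vmstat -ia | grep ix0            # find the MSI-X vectors (IRQs) for ix0's queues
  cpuset -l 0-11 -x 264            # pin IRQ 264 to CPUs 0-11, i.e. the socket local to the card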
If you have Intel 10G Network adapters built on the motherboard there is not much you can do about it.
What hardware are you using? What other high bandwidth accessories are installed?
CPU0 and CPU1? What is this, 2005? ix NICs will use multiple CPUs, so it's not as simple as this.
So instead of helping this user in another thread I have to defend my words. Here it goes.
Writing from a device in a PCIe slot to memory on a remote NUMA node through QPI incurs latency in several ways.
All PCIe slots must use a PCIe controller. Where are they located? They are now located on the CPU package itself.
So if your NIC uses a PCIe lane, every Transaction Layer Packet must first go through a CPU's PCIe bus and into L3 cache.
Some motherboards use different physical CPUs for different PCIe lanes.
On my SuperMicro X9 and X10 LGA2011 boards the PCIe slot lanes are split up among the two CPUs.
For example, PCIe slots 1, 2, 4 and 5 are assigned to CPU1's PCIe root port, while PCIe slots 3, 6 and 7 are assigned to CPU2's.
Here is the relevant BIOS screen:
FAQ Entry | Online Support | Support - Super Micro Computer, Inc.
www.supermicro.com
But on my Tyan LGA2011 board all PCIe slots are assigned to CPU1 and the motherboard peripherals are assigned to CPU2.
So ask yourself: how exactly does your Intel 10GbE network interface use both CPUs? It is called NUMA.
NUMA adds to the TLP path, so it is not as fast as a single-socket implementation.
Multiple CPU cores on a single die do not use NUMA but their own internal ring.
So a single CPU board will offer faster throughput than a multiple socketed board.
On-die hardware packet routing is superior to software routing (NUMA), even with QPI enabled.
Documentation Portal
docs.napatech.com
Even Netflix is working to improve NUMA performance in FreeBSD.
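For what it's worth, you can check what the kernel thinks of your topology before you start moving cards around. A couple of read-only sysctls I look at (they assume a kernel with NUMA support and the ULE scheduler, so availability varies by release):

  sysctl vm.ndomains                # number of NUMA domains the kernel sees
  sysctl kern.sched.topology_spec   # XML dump of the CPU/cache/SMT topology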
I'm just wondering why anyone would buy a MB with multiple physical CPUs.
Virtualization would be the number one reason for me.
I remember the jokers with freebsd 4.x running dual socket MBs and they were slower than 1 cpu and they had no clue.
FreeBSD 4 still had a single (giant) kernel lock. From 5 onward kernel locking became more fine-grained, which improved SMP performance significantly.
Giant lock - Wikipedia
en.wikipedia.org
There are programming languages designed explicitly to exploit multiple cores effectively, Erlang being one of the most common. I've been using multicore setups since the late 90s, for a simple reason: you could get two of last year's CPUs for a lower *cost* than this year's fastest one, and the performance (at the time Windows NT, but now obviously FreeBSD) was noticeably better. Maybe not 2x better, but certainly 1.5x, and for less cost too!
Virtualization would be the number one reason for me.
Turning off hyperthreading seems incredibly dumb; the fewer cores you have per virtual machine, the more useful hyperthreading is (it's most useful with 1 physical core). The biggest obstacle in multicore utilization is cpu contention under load, and the more cpus available the better, even if they're hyperthreads.
If you turn off hyperthreading you are left with a small number of cores.
My 2650LV3 chips support 12 cores. With a second socket I can increase that to 24 cores.
With that I can manage 5 or 6 VM's comfortably.
I could also see a use case in software development. Compilers can really take advantage of cores, and that speeds up development work.
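For example, a parallel world build that uses every core, just a one-liner sketch:

  make -j"$(sysctl -n hw.ncpu)" buildworld   # one make job per CPU the kernel reports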
Turning off hyperthreading seems incredibly dumb.
Not if they are internet facing VM's.
The biggest obstacle in multicore utilization is cpu contention under load, and the more cpus available the better, even if they're hyperthreads.
This depends on what you're doing. Used 12-core cpus are $225 on ebay if you want to cheap out. There's no way two 6-core cpus are faster than one 12-core at the same GHz. Plus dual-socket MBs are more expensive; the memory is more expensive. If you have 10Gb/s to move, saving a couple hundred dollars is foolhardy.
I use this for my tuning guide:
Using this as a guide is like getting restaurant advice from a 5yo. Seriously. Dumbing down your entire system by disabling hyperthreading because the default settings on many ethernet drivers are bone-headed makes no sense. Tune the card. Without card tuning (#queues, interrupt moderation, etc.) the benchmarks are useless. They're testing the default config of the drivers, which are largely written by some guy who couldn't wait to be finished writing the driver. I remember sparring with Jack Vogel who did the intel drivers back in the day; a good guy but a terrible programmer. I re-wrote the igb driver in 2010 and it was way more efficient than the stock driver.
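The kind of card tuning I mean looks roughly like this in /boot/loader.conf. These names are from the pre-iflib ixgbe driver and may have moved on newer releases (check sysctl hw.ix and sysctl dev.ix.0 on your box), and the values are workload-dependent starting points, not recommendations:

  hw.ix.num_queues="8"              # one RX/TX queue pair per core you want the card to use
  hw.ix.enable_aim="1"              # adaptive interrupt moderation
  hw.ix.max_interrupt_rate="16000"  # cap interrupts per second per queue
  hw.ix.rxd="4096"                  # RX descriptors per ring
  hw.ix.txd="4096"                  # TX descriptors per ring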
I dont need 24 cores. I'd rather buy 2 systems than put 24 cores into 1.
Umm yeah, you're doing the math wrong here. That's not what I said.
Last year's 12-core cpu is almost 1/2 the price of this year's 12-core cpu -> you can get 24 cores for a similar price if you use last year's kit. The big difference I notice these days is that power consumption is dropping. The last time I upgraded equipment, the power costs over a couple of years were what made it *cheaper* to update. I haven't taken this into account and maybe it makes all the difference?
wrt 2nd hand gear, yes there is always a bargain to be had. I saw 2 full racks of prime intel hardware going for a song recently from a bank foreclosure. If I could have figured out a way to get them into my cellar along with the 3-phase power it required, well... home heating in winter would never have been a problem again!
The igb driver has since been replaced in FreeBSD 12 by something written by some chick from the now defunct nextBSD. Was this done because the driver is better, or because it combined em, lem and igb into one driver? Did anyone bother to bench it? Will our systems be 10% slower because someone decided the new driver is more elegant?
Misogyny much? Tread carefully, you're on very, very thin ice with remarks like that.
⚙ D8299 Convert igb(4), em(4) and lem(4) to iflib
reviews.freebsd.org
⚙ D12235 iflib rollup patch.
reviews.freebsd.org
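On the "did anyone bother to bench it" question: the quick-and-dirty way to compare the old and new driver is iperf3 between two hosts, with the same tunables on both kernels and several parallel streams so a single core isn't the bottleneck. A sketch, where the address is a made-up example:

  iperf3 -s                           # on the box under test
  iperf3 -c 192.0.2.10 -P 8 -t 60     # on the load generator: 8 streams for 60 seconds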
Not if they are internet facing VM's.
Fake news on the FreeBSD forum? From the UK no less!
Theo was right on this one. Hyperthreading is a giant security threat. Just disable it and you are so much safer.
https://www.youtube.com/watch?v=jI3YE3Jlgw8
Running on Intel? If you want security, disable hyper-threading, says Linux kernel maintainer
Speculative execution bugs will be with us for a very long time
www.theregister.co.uk
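If you do want to go the Theo route without touching the BIOS, FreeBSD has a boot-time tunable that keeps the scheduler off the HT siblings. A sketch for /boot/loader.conf; verify the knob exists on your release before relying on it:

  machdep.hyperthreading_allowed="0"   # don't schedule threads on hyperthread logical CPUs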