Are AMD 3D V-Cache processors a good choice for FreeBSD?

robroy · Jun 4, 2024

FreeBSD Amigos,

Would you choose an AMD processor with 3D V-Cache for FreeBSD?

As you all know, AMD processors with the 3D V-Cache feature have an asymmetric core design. Half of the cores are tied to the 3D V-Cache itself, and thus sport lower cache latency; the other half bypass 3D V-Cache and sport higher clock speeds.

AMD has made software for Windows that eyeballs processes, threads or whatever and places them on the most suitable of the two core types. I've also seen comments suggesting that they (AMD) are making Linux kernel patches along the same lines. Meanwhile, enthusiasts trying for best performance are left to tinker with manual CPU core affinity (pinning) settings. All of this seems quite clunky to me.

And yet I've found nothing about how FreeBSD handles these processors. I'm guessing that the answer is "not at all," meaning that FreeBSD schedules processes on both core types without factoring in any knowledge of which type's best for what.

These mixed-core processors (like the Intel P/E cores) seem awfully complicated and I'd just avoid them for simplicity's sake, but the models with 3D V-Cache are actually much more power-efficient than those without.

I'm specifically comparing EPYC 4564P and EPYC 4584PX, but I gather that the comparison's the same between the older Ryzen 9 7950X and Ryzen 9 7950X3D.

Thank you very much.

cracauer@ · Jun 4, 2024

Looking at benchmarks (Phoronix et al) the higher clock speed seems to trump the bigger cache for most uses except gaming and some scientific computing. In speed, efficiency is a bit higher with the big cache.

I would still get the x3d part just to experiment and compare the 2 CCDs.

robroy · Jun 6, 2024

Thank you cracauer@.

Aren't CPUs with mixed cores (that perform differently) susceptible to hypervisor scheduling problems?

For example, what happens when a bhyve VM with two virtual CPUs is scheduled with one virtual CPU on a regular core, and the other on a 3D V-Cache core?

Wouldn't the two cores finish different amounts of work during a given time-slice? And couldn't that propagate back up to the guest operating system and make it go nuts?

I remember having heard about this kind of problem many years ago. But maybe this is just not an issue these days.

cracauer@ · Jun 6, 2024

robroy said:
Thank you cracauer@.

Aren't CPUs with mixed cores (that perform differently) susceptible to hypervisor scheduling problems?

For example, what happens when a bhyve VM with two virtual CPUs is scheduled with one virtual CPU on a regular core, and the other on a 3D V-Cache core?

Wouldn't the two cores finish different amounts of work during a given time-slice? And couldn't that propagate back up to the guest operating system and make it go nuts?

I remember having heard about this kind of problem many years ago. But maybe this is just not an issue these days.

Well, 2 cores will never finish the same work at exactly the same time anyway. There is variance in system calls, cache loads, different dynamic clockspeeds. It is not a problem.

As for general scheduler questions, I think that a 7950x3d doesn't present a particular challenge since the cores are the same speed for most applications. On Intel with their very slow E-cores the matter is quite different.

blackbird9 · Jun 6, 2024

Improved power efficiency <<should>> be a big win overall, other things being equal, assuming a reasonably large number of target machines. Of course it depends on the particular workload, but I think in the first instance I would simply test the x3d parts and make a comparison to non-x3d, with no changes to software. Test for performance and power consumption with typical workloads. Look for any significant performance degradation, versus the cost difference in power consumption, multiplied by the number of machines being deployed. Also need to factor in the price difference if any of the x3d cores. And this is a piece of work that can be done relatively quickly and cheaply.

I've written code for xeons in the past to set thread affinity to specific cores, and it can be very effective, however you may find you need to do some further tuning for each new core type / machine type that comes along. The cost of re-designing/coding existing software, and knock-on cost of testing the tuned code and risk of introducing bugs and customer impact, are all additional factors to consider. The risk is of incurring costs and time if you start tuning your code to specific cpus. So you write something that runs efficiently on this architecture. Then a competitor brings out a different new cpu and you are then faced with tuning for that...

So I think in the first instance I would simply treat it purely as a hardware change and do some testing to evaluate what, if any gains there are in power consumption with x3d, versus whether there is any overall performance degradation, and factor in the cost difference in the chips.

At the end of the day it's another take on the big-little concept, albeit less radical than the intel version with its P-cores and E-cores. I think I would be sceptical of claims of major power savings until I had actually tested the hardware and measured the power consumption in real-world use.

blackbird9 · Jun 6, 2024

A couple more thoughts, it's actually quite an interesting topic. The big-little concept originated with ARM, targeted at phones, where the cost of tuning software to a particular cpu architecture could be amortized over typical production runs of millions or tens of millions of units. Furthermore low power consumption and hence long battery life is a critical marketing feature, no-one wants a phone which goes flat in a few hours which is why intel never got anywhere in that space.

From phones big-little has more recently been moved into laptops for the same reason, ie extending battery life; we can observe that there has been some technology convergence between phones, tablets and laptops. And now finally big-little is being tried in the server space. Power consumption is the key driver.

However traditional server workloads are rather different from phone and typical laptop workloads. Servers are typically run at maximum cpu utilisation for as close to 100% of time as possible, to obtain the maximum ROI. You want to do the maximum amount of work per unit time. Whereas mobile devices typically spend most of their time either suspended or idle. So I'm less convinced about the applicability of the big-little concept to servers. Perhaps that's why AMD have adopted their somewhat less radical version of the idea (compared, say, to ARM). Granted, it may be that overall the average power consumption of the x3d chip is lower than non-x3d. But to be successful it has to achieve that without a corresponding reduction in work done. And, I think, because software development time has historically always the most expensive factor, without requiring a lot of complex software tuning to exploit the new processor architecture. Ideally it needs to be a transparent drop-in replacement. Or at most, let the o/s do the tuning.

robroy · Jun 7, 2024

Thanks for your great comments blackbird9.

blackbird9 said:
Improved power efficiency <<should>> be a big win overall, other things being equal, assuming a reasonably large number of target machines.

The number of target machines is one

. I'm just looking to add a new server to my home lab.

Lower power interests me mainly to limit fan noise (not to mention that I pay the power bill).

blackbird9 said:
Of course it depends on the particular workload, but I think in the first instance I would simply test the x3d parts and make a comparison to non-x3d, with no changes to software. Test for performance and power consumption with typical workloads.

I'd love to do this. But I'd only do it if I received samples from Supermicro to review.

They actually offered to sign me up with their review program after I made Running FreeBSD on a Supermicro 5017A-EF ten years ago. But I realized that though it'd satisfy my curiosities, making the reviews would also be a lot of (unpaid) work. So, you know...

blackbird9 said:
Also need to factor in the price difference if any of the x3d cores.

AMD has actually priced the EPYC 4584PX and 4564P parts identically.

blackbird9 said:
Servers are typically run at maximum cpu utilisation for as close to 100% of time as possible, to obtain the maximum ROI.

I guess you're right about this. In my weird case though, the host will be idle for most of its life. I'd only be using it for four hours a day. The rest of the time it'd just be humming away peacefully.

robroy · Jun 7, 2024

Thank you cracauer@.

cracauer@ said:
Well, 2 cores will never finish the same work at exactly the same time anyway. There is variance in system calls, cache loads, different dynamic clockspeeds. It is not a problem.

Okey dokey. Thanks for explaining that. I encountered that VM scheduling concern during a VMware training session back in 2006. So maybe it's an outdated concept (and/or I'm just nuts).

cracauer@ said:
As for general scheduler questions, I think that a 7950x3d doesn't present a particular challenge since the cores are the same speed for most applications. On Intel with their very slow E-cores the matter is quite different.

All righty; thanks for this.

blackbird9 · Jun 7, 2024

robroy said:
They actually offered to sign me up with their review program after I made Running FreeBSD on a Supermicro 5017A-EF ten years ago. But I realized that though it'd satisfy my curiosities, making the reviews would also be a lot of (unpaid) work. So, you know...

Well, I guess if you get to keep a brand new supermicro server out of the deal, that's not a bad return on making a review, they're pretty good machines. Of course if they want it sent back to them afterwards, that's not quite so attractive!

I expect they do want it back, of course ;-)

As for limiting fan noise.. I guess you could get the new part just for interest and to do the experiment, it would make the review more interesting, especially if you could compare the two chips.

Most servers I'm used to are pretty loud, I'm assuming this is at least a 1U or 2U rack mount box you're talking about, so I'd be surprised if putting the new chip in made a huge difference to the fan noise. But since the price is the same, you may as well get the new chip and give it a try!

Just for interest, a friend of mine has recently set up a little home server on one of the aliexpress N100 mini-pc's. it's got 16GB ram, 512 GB SSD, dual ethernet ports and the cpu has 4-off 3.4 GHz E-cores. It's essentially a low power, low cost NUC clone. He's running multiple docker instances on it with multiple web servers, all on this tiny box not much larger than my hand. He says it's completely silent, and only uses around 10W of power. Of course his sites are pretty low traffic! But if you want low fan noise and low electric bills at home, perhaps it's worth considering other kinds of hardware. It's quite surprising what you can do even with very small machines nowadays.

But since you've been talking about epyc's and stuff it's probably not going to be powerful enough for your application!

I just checked your old review btw, ... good review!

Are AMD 3D V-Cache processors a good choice for FreeBSD?

Which processor would you choose for FreeBSD?

3D V-Cache (e.g., EPYC 4584PX)

Regular (e.g., EPYC 4564P)

robroy

cracauer@

robroy

cracauer@

blackbird9

blackbird9

robroy

robroy

blackbird9