Is it worth building for native CPU on amd64?

Hi. CLANG 10.0.1 from FreeBSD 12.2-RELEASE-p11 supports the following architectures, which I presume are all x86 based:

Code:
note: valid target CPU values are: nocona, core2, penryn, bonnell, atom, silvermont, slm, goldmont, goldmont-plus, tremont, nehalem, corei7, westmere, sandybridge, corei7-avx, ivybridge, core-avx-i, haswell, core-avx2, broadwell,
      skylake, skylake-avx512, skx, cascadelake, cooperlake, cannonlake, icelake-client, icelake-server, tigerlake, knl, knm, k8, athlon64, athlon-fx, opteron, k8-sse3, athlon64-sse3, opteron-sse3, amdfam10, barcelona, btver1, btver2,
      bdver1, bdver2, bdver3, bdver4, znver1, znver2, x86-64

Of the couple of servers I have that are updated from source, I build world/kernel on each machine individually. Is it worth trying to build (on 64 bit x86) for the exact CPU the machine has? And is a CLANG architecture such as (for example) "ivybridge" actually i386, and therefore not relevant for a 64 bit system?

Tried hunting the big G for answers, but no luck. Since I have to build from source anyway, I'm curious whether I can improve efficiency by not having to support the oldest CPU. Thanks.
 
I once had a machine with a Haswell CPU and tried building exclusively for it. The generated code was different, but when I ran some tests everything was actually a bit slower, so I abandoned the haswell flag. It may be different with other architectures.
 
The easiest thing to do, assuming you're building on the host that will be running the compiled versions, is to set CPUTYPE ?= native in /etc/make.conf. That way you don't have to change it for every system.
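
A minimal sketch of applying that setting from a shell. The helper name set_cputype_native is made up for illustration; it only appends the line if the file has no CPUTYPE assignment yet, and it defaults to /etc/make.conf:

```shell
# Hypothetical helper: add "CPUTYPE?=native" to a make.conf unless a
# CPUTYPE line already exists. The file defaults to /etc/make.conf.
set_cputype_native() {
    conf="${1:-/etc/make.conf}"
    if grep -q '^[[:space:]]*CPUTYPE' "$conf" 2>/dev/null; then
        echo "CPUTYPE already set in $conf"
    else
        printf 'CPUTYPE?=native\n' >> "$conf"
        echo "added CPUTYPE?=native to $conf"
    fi
}

# Usage (as root): set_cputype_native
```

Because of the ?= form, a CPUTYPE given on the make command line or in the environment still takes precedence.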
 
That's a pretty complex topic. At least on x86, more modern CPUs come with specific instruction sets (check out https://en.wikipedia.org/wiki/List_of_Intel_CPU_microarchitectures for reference), which - if used - may prove to be beneficial for performance.

Said performance gain may, however, only be measurable under particular circumstances, e.g. in multithreaded applications. Workloads that run in only one thread (of which there are some in FreeBSD's kernel, if memory serves) may actually be affected detrimentally. Sometimes, optimizations may even turn out to cause security concerns.

There are certainly some general optimizations that deliver measurable results. Just look at this for example:

Overall, optimizations are kind of like a heuristic. There may be special cases where optimizing leads to unexpected or undesired behavior - for example

In terms of your kernel compilation and choosing CPU-specific code, you might want to simply try it and check whether it helps with your particular workload. Unfortunately, you're probably the only person who can definitively confirm whether compiling with CPU-specific code is worthwhile for your box.
 
The easiest thing to do, assuming you're building on the host that will be running the compiled versions, is to set CPUTYPE ?= native in /etc/make.conf. That way you don't have to change it for every system.
Thanks, that's a nice tip.

After some further searching, I think I'm teetering at the edge of the rabbit hole: not only is there -march, but there's also -mtune and -mcpu (the latter for legacy GCC only?), and, as well as CPUTYPE in /etc/make.conf, there's also MACHINE_CPU.
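
For what it's worth, -march selects which instructions the compiler may emit (and also tunes for that CPU), while -mtune on its own only changes scheduling and leaves the instruction set at the baseline. One way to see this, and to see that a name like ivybridge still means 64-bit code on amd64, is to dump the compiler's predefined macros. A rough sketch, with the function name isa_macros made up for illustration:

```shell
# Print a few predefined macros for the given compiler flags.
# CC defaults to cc; the macro names are the usual GCC/clang ones.
isa_macros() {
    "${CC:-cc}" "$@" -dM -E -x c /dev/null |
        grep -E '__(x86_64|i386|SSE2|SSE4_2|AVX|AVX2)__ ' | sort
}

# Compare, for example:
#   isa_macros -march=x86-64
#   isa_macros -march=ivybridge
#   isa_macros -mtune=ivybridge
```

On an amd64 host all three should list __x86_64__ and __SSE2__, and only the -march=ivybridge run should add __SSE4_2__ and __AVX__.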


FYI, package sysutils/hs-cputype will show the current CPU type.

I did some quick tests on a random file from the secp256k1 lib, and the following two commands result in a byte-for-byte identical object file (with CLANG 10.0.1, on an i7-3930K):

cc -march=sandybridge -mtune=sandybridge ...

cc -march=native ...

So the latter generic "native" seems to work as expected. The output is also slightly smaller when compiling for the specific CPU, versus no -march/-mtune flags. (Note, I haven't benchmarked actual performance.)
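
For anyone wanting to repeat that comparison, here is a rough sketch. The function name objects_identical and the temporary directory handling are made up for illustration; it simply compiles the same source twice with different flags and compares the two objects with cmp:

```shell
# Compile one C source with two sets of flags and report whether the
# resulting object files are byte-for-byte identical.
# Usage: objects_identical file.c "FLAGS_A" "FLAGS_B"
objects_identical() {
    src="$1"
    dir=$(mktemp -d) || return 2
    # $2 and $3 are left unquoted on purpose so several flags can be passed.
    "${CC:-cc}" $2 -c "$src" -o "$dir/a.o" &&
    "${CC:-cc}" $3 -c "$src" -o "$dir/b.o" &&
    cmp -s "$dir/a.o" "$dir/b.o"
    rc=$?
    rm -rf "$dir"
    return "$rc"
}

# Example (foo.c is a placeholder):
#   objects_identical foo.c "-O2 -march=native" \
#       "-O2 -march=sandybridge -mtune=sandybridge" \
#       && echo identical || echo different
```

Identical objects only show that the compiler made the same choices for that one file; it says nothing about speed.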

There are some userland applications I use that would benefit from further experiments to wring out the very last bit of optimisation from, but for the base system build I think I'll stick with CPUTYPE ?= native.

BTW, I'm not necessarily interested in just "faster": one of my low power embedded CPUs runs very hot for some reason, with one core reporting around 68 degrees at a load of only about 0.1 to 0.2. An identical device with a slightly slower CPU reports 52C at a load of 0.4.
 
Have you tried changing the clock speed (powerd or powerdxx, or just setting it to a low value in sysctl.conf), if it’s CPU temperature (and not performance) you’re trying to affect?

They all adjust dev.cpu.0.freq.
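
A small sketch for the sysctl route. It assumes the usual frequency/power pair format of dev.cpu.0.freq_levels (something like "2400/35000 1600/20000"), and the helper name lowest_freq is made up:

```shell
# Pick the lowest frequency (MHz) out of a freq_levels string such as
# "2400/35000 1600/20000 800/9000" (frequency/power pairs).
lowest_freq() {
    printf '%s\n' "$1" | tr -s ' ' '\n' | grep . | cut -d/ -f1 | sort -n | head -n 1
}

# On FreeBSD (as root), something like:
#   sysctl dev.cpu.0.freq="$(lowest_freq "$(sysctl -n dev.cpu.0.freq_levels)")"
```

If powerd is running it will most likely change the value again, so stop it first when testing a fixed frequency.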
 
powerd is running, and does downclock the frequency. I've also tried forcing it lower via sysctl. Neither seems to make a lot of difference.

It's possible that the reported temperature is bogus, since there's a large difference between the two cores (and a fair difference in temperature today versus yesterday):

dev.cpu.1.temperature: 53.0C
dev.cpu.0.temperature: 39.0C

...or perhaps the temperatures are accurate, and it's something like a bad thermal connection. I can feel a lot of heat coming from it. For comparison, the other device currently shows 55C and 56C for each respective core.
 