Force make to take into account cpuflags.

Can I edit make.conf to force make to include my CPU flags? Or is this done by default by clang/gcc?
cpuid2cpuflags returns:
Code:
CPU_FLAGS_X86: aes avx f16c mmx mmxext pclmul popcnt rdrand sse sse2 sse3 sse4_1 sse4_2 ssse3
 
I've got:
Code:
CPUTYPE?=core-avx-i
So that should be ok.
Just one question: why the question mark in the line above?
 
Yep, it's more or less what CPUTYPE=native achieves.

I strongly recommend against it. For most software, the "benefit" isn't even measurable. But it will make the binaries incompatible with any other CPU. And even bugs triggered by "optimized" builds aren't unheard of. OTOH, software that can really benefit from specific CPU features quite often already includes runtime detection and different code paths.

Just one question: why the question mark in the line above?
This is make's syntax for "only set the value if nothing is set yet", so the value can be overridden, e.g. from the command line.
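A quick way to see the ?= behaviour for yourself is a throwaway makefile. This is only a minimal sketch: the function name demo_cputype and the temporary makefile are made up for illustration and have nothing to do with the real make.conf or the ports framework.

```shell
# Write a throwaway makefile (hypothetical, for illustration only) that
# assigns CPUTYPE with ?= and prints whatever value ends up being used.
demo_cputype() {
    mk=$(mktemp "${TMPDIR:-/tmp}/cputype.XXXXXX") || return 1
    printf 'CPUTYPE?=core-avx-i\nall:\n\t@echo $(CPUTYPE)\n' > "$mk"
    # Extra arguments (e.g. CPUTYPE=haswell) are passed straight to make.
    make -f "$mk" "$@"
    rc=$?
    rm -f "$mk"
    return $rc
}
```

Called as demo_cputype it should print core-avx-i; called as demo_cputype CPUTYPE=haswell it should print haswell, because an assignment on the command line wins over ?=. The same mechanism should let you run make CPUTYPE=... in a single port's directory without touching make.conf.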
 
My make.conf contains:
Code:
OPTIONS_SET+=SSE
OPTIONS_SET+=SSE2
OPTIONS_SET+=SSE3
OPTIONS_SET+=SSE4_1
OPTIONS_SET+=SSE4_2
OPTIONS_SET+=SSE41
OPTIONS_SET+=SSE42
OPTIONS_SET+=SSSE3
OPTIONS_SET+=AVX
OPTIONS_SET+=OPTIMIZED_CFLAGS
 
Yep, it's more or less what CPUTYPE=native achieves.

I strongly recommend against it. For most software, the "benefit" isn't even measurable.

So what happens if I don't do it? What does it by default build for? 386? 686? And the difference indeed wouldn't be measurable?
 
Native is a bad idea, but setting CPUTYPE will in many cases give you better performance. However, you may run into some build issues too.

Alain De Vos
No, that's a bad idea and will break things (sometimes silently)

Just a quick example showing that it does help, but not in every case.

Converting a ~1h15min WAV file using audio/exhale: no difference (within margin of error).

Stretching (changing the tempo of) a ~1h15min WAV file by 8% using audio/rubberband (fftw3 does the CPU-intensive work), with vs. without:
Code:
        4m34.63s real           4m27.83s user           5.15s sys
        6m14.65s real           6m8.85s user            3.53s sys
That's with CPUTYPE?=tigerlake on my laptop (it's a VM, but you get the idea).
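To put a number on the two "real" times above, here is a small sketch that converts them to seconds and compares them. The helper names to_seconds and compare_runs are made up here, and a single run in a VM is only a rough indication, not a proper benchmark.

```shell
# Convert a time(1)-style "XmY.YYs" string to seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.2f\n", $1 * 60 + $2 }'
}

# Compare a faster run ($1) against a slower run ($2).
compare_runs() {
    fast=$(to_seconds "$1")
    slow=$(to_seconds "$2")
    awk -v f="$fast" -v s="$slow" 'BEGIN {
        printf "%.2fx speedup, %.1f%% less wall-clock time\n", s / f, (s - f) / s * 100
    }'
}

# "real" times from the rubberband run: with CPUTYPE set, then without.
compare_runs 4m34.63s 6m14.65s
```

For these figures that works out to roughly a 1.36x speedup, i.e. about 27% less wall-clock time for the rubberband run.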
 
The issue is that not all software and build systems support certain optimizations, CFLAGS, CXXFLAGS, etc. Forcing them could lead to build failures. Gentoo, for example, has a way of filtering out such flags at build time if needed. I do not know whether such a framework is present in FreeBSD ports.
 
Don't modify flags unless you know what you're doing. Setting -march= (via CPUTYPE) should just work, though you may still hit rare edge cases.
 
So what happens if I don't do it? What does it by default build for? 386? 686? And the difference indeed wouldn't be measurable?
There is a default per architecture. IIRC, for the i386 architecture, this was recently changed from i486 to i686. For most software, the difference won't be measurable indeed. For some "special purpose" tasks OTOH, it can be a substantial difference. Some of this software already does runtime detection of CPU features and picks the best code path. You could of course do the tedious task to identify software not doing that, but still profiting, and enable "optimized" flags there. Whether that's really worthwhile, well, I would say: no.

Native is a bad idea, but setting CPUTYPE will in many cases give you better performance
All native does is to let the compiler automatically detect the CPU type. So, if you always build on the same machine you're also running the software, it's just the same as explicitly setting some value...
 
There is a default per architecture. IIRC, for the i386 architecture, this was recently changed from i486 to i686.
Thanks. That's what I imagined.
For most software, the difference won't be measurable indeed. For some "special purpose" tasks OTOH, it can be a substantial difference. Some of this software already does runtime detection of CPU features and picks the best code path.
I never bothered to measure. I just tried to figure out the correct setting and put it in place. This is sometimes a bit more difficult, because the build machine does not have the same CPU as the target machine (e.g. Ivybridge/Avoton), and then a common ground needs to be found.

And once I created bug 260791, which shows that the CPU features are actually used. In this case the speed difference might depend on what is done within the ruby. (If anybody is in the mood to measure this out: You're very welcome.)
You could of course do the tedious task to identify software not doing that, but still profiting, and enable "optimized" flags there. Whether that's really worthwhile, well, I would say: no.
I don't think that's necessary. If there is need for a machine that "just works" (and can easily be swapped), the defaults should do. And if a specific workhorse gets tuned and optimized, then the CPU is known and the code can be built individually for the target.

All native does is to let the compiler automatically detect the CPU type. So, if you always build on the same machine you're also running the software, it's just the same as explicitly setting some value...
"native" does not work on my machines:

/etc/make.conf:
CPUTYPE?=native

Code:
# make
===>  Building for lsof-4.96.4,8
Constructing version.h
(cd lib; /usr/bin/make DEBUG="-O2" CFGF="-pipe -march=native -fstack-protector-strong -fno-strict-aliasing -DNEEDS_BOOL_TYPEDEF -DHASTASKS -DHAS_DUP2 -DHAS_CLOSEFROM -DHASEFFNLINK=i_effnlink -DHASF_VNODE -DHAS_FILEDESCENT -DHAS_TMPFS -DHASWCTYPE_H -DHASSBSTATE -DHAS_KVM_VNODE -DHAS_UFS1_2 -DHAS_NO_IDEV -DHAS_VM_MEMATTR_T -DNEEDS_DEVICE_T -DHAS_CDEV2PRIV -DHAS_NO_SI_UDEV -DHAS_SYS_SX_H -DHASFUSEFS -DHASMSDOSFS -DHAS_V_LOCKF -DHAS_LOCKF_ENTRY -DHAS_NO_6PORT -DHAS_NO_6PPCB -DNEEDS_BOOLEAN_T -DHAS_SB_CCC -DHAS_FDESCENTTBL -DFREEBSDV=13000 -DHASFDESCFS=2 -DHASPROCFS -DHASPSEUDOFS -DHASNULLFS -DHAS9660FS -DHAS_NO_ISO_DEV -DHASIPv6 -DHASUTMPX -DHAS_XTCPCB_TMAXSEG -DHAS_KF_SOCK_SENDQ -DHAS_STRFTIME -DLSOF_VSTR=\"13.1-RELEASE-p3\"")
cc   -pipe -march=native -fstack-protector-strong -fno-strict-aliasing -DNEEDS_BOOL_TYPEDEF -DHASTASKS -DHAS_DUP2 -DHAS_CLOSEFROM -DHASEFFNLINK=i_effnlink -DHASF_VNODE -DHAS_FILEDESCENT -DHAS_TMPFS -DHASWCTYPE_H -DHASSBSTATE -DHAS_KVM_VNODE -DHAS_UFS1_2 -DHAS_NO_IDEV -DHAS_VM_MEMATTR_T -DNEEDS_DEVICE_T -DHAS_CDEV2PRIV -DHAS_NO_SI_UDEV -DHAS_SYS_SX_H -DHASFUSEFS -DHASMSDOSFS -DHAS_V_LOCKF -DHAS_LOCKF_ENTRY -DHAS_NO_6PORT -DHAS_NO_6PPCB -DNEEDS_BOOLEAN_T -DHAS_SB_CCC -DHAS_FDESCENTTBL -DFREEBSDV=13000 -DHASFDESCFS=2 -DHASPROCFS -DHASPSEUDOFS -DHASNULLFS -DHAS9660FS -DHAS_NO_ISO_DEV -DHASIPv6 -DHASUTMPX -DHAS_XTCPCB_TMAXSEG -DHAS_KF_SOCK_SENDQ -DHAS_STRFTIME -DLSOF_VSTR="13.1-RELEASE-p3" -I/usr/src/sys -O2 -c ckkv.c -o ckkv.o
error: unknown target CPU 'athlon-xp'
note: valid target CPU values are: nocona, core2, penryn, bonnell, atom, silvermont, slm, goldmont, goldmont-plus, tremont, nehalem, corei7, westmere, sandybridge, corei7-avx, ivybridge, core-avx-i, haswell, core-avx2, broadwell, skylake, skylake-avx512, skx, cascadelake, cooperlake, cannonlake, icelake-client, rocketlake, icelake-server, tigerlake, sapphirerapids, alderlake, knl, knm, k8, athlon64, athlon-fx, opteron, k8-sse3, athlon64-sse3, opteron-sse3, amdfam10, barcelona, btver1, btver2, bdver1, bdver2, bdver3, bdver4, znver1, znver2, znver3, x86-64, x86-64-v2, x86-64-v3, x86-64-v4
*** [ckkv.o] Error code 1

make[2]: stopped in /usr/ports/sysutils/lsof/work/lsof-4.96.4/lib
1 error
 
"native" does not work on my machines:

This looks like a weird bug. The "valid" CPU types listed are all amd64, while athlon-xp (what "native" here seems to detect) is i386.
 
This looks like a weird bug. The "valid" CPU types listed are all amd64, while athlon-xp (what "native" here seems to detect) is i386.
Exactly. It's a VIA Samuel with amd64 support and SSE3 :)
Or, more precisely, it is what QEMU thinks might be the best approach to create a vCPU that can be moved between AMD and Intel servers:

Code:
CPU: QEMU Virtual CPU version 2.5+ (2994.61-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0x663  Family=0x6  Model=0x6  Stepping=3
  Features=0x783fbfd<FPU,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2>
  Features2=0x80202001<SSE3,CX16,x2APIC,HV>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>

 
There is a default per architecture. IIRC, for the i386 architecture, this was recently changed from i486 to i686. For most software, the difference won't be measurable indeed. For some "special purpose" tasks OTOH, it can be a substantial difference. Some of this software already does runtime detection of CPU features and picks the best code path. You could of course do the tedious task to identify software not doing that, but still profiting, and enable "optimized" flags there. Whether that's really worthwhile, well, I would say: no.


All native does is to let the compiler automatically detect the CPU type. So, if you always build on the same machine you're also running the software, it's just the same as explicitly setting some value...
Except when it gets it wrong, which has happened on numerous occasions with both GCC and LLVM/Clang.
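If you want to check what native resolves to before trusting it, you can ask the compiler driver to print the command it would run without actually compiling anything. A minimal sketch, assuming cc is clang (as on FreeBSD); the function names are hypothetical, and GCC formats its -### output differently, so this parsing would not apply there as-is.

```shell
# Pull the value that follows "-target-cpu" out of clang's -### output (stdin).
extract_target_cpu() {
    tr ' ' '\n' | tr -d '"' |
        awk 'prev == "-target-cpu" { print; exit } { prev = $0 }'
}

# Print the CPU name the given compiler (default: cc) picks for -march=native.
# -### only prints the commands the driver would run; nothing is compiled.
native_cpu() {
    "${1:-cc}" -march=native -### -x c -c /dev/null 2>&1 | extract_target_cpu
}
```

Running native_cpu on the build machine and comparing the result with what you know about the CPU (or the vCPU, in a VM) should show whether the detection picked something sensible.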
 
A related question.
What do you think about flags such as:
-fomit-frame-pointer
-funroll-loops
-other ???
What could possibly go wrong?

There's a page with great quotes from Gentoo users who found out. Unfortunately it's wrapped in a racist cliche, so I won't link it here.
 