General FreeBSD desktop workload optimization thread.

Hi guys.

I haven't seen any thread anywhere with the purpose stated in the title, albeit I've seen lots of hints scattered around in different posts. That's why I think there's potential for this kind of thread.

This is of course not a ZFS/server optimization thread -- there's plenty of that.

If I'm mistaken, please feel free to post a link to the thread/guide.

The scenario I'm shooting for:

The user has already installed and enjoyed FreeBSD, Xorg and a desktop environment.

The user enjoys all the typical desktop things be it:
Graphical work, watching movies, playing native games and/or wine games etc.

The tinkerer within that user wants to play around with sysctl and/or other features, but needs guidance as to which knobs to turn. They have probably already turned a few knobs, perhaps even some that didn't matter.

Got any advice? Post it here.

I'll of course share what I've found so far (not a lot!):

1. Edited make.conf to include CPUTYPE= and CFLAGS= -O2 -pipe
I was told the compilation could be optimized by defining those values, making "better/smoother" binaries.

2. I found a post stating the following sysctl values (added to sysctl.conf)
Code:
# Enhance shared memory X11 interface
kern.ipc.shmmax=67108864
kern.ipc.shmall=32768

3. In another post I found this sysctl value:
Code:
# Increase desktop responsiveness under high CPU use (200/224)
kern.sched.preempt_thresh=224

That's all I've got so far.
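
A note on the units in item 2, since they are easy to mix up: as far as I can tell, kern.ipc.shmmax is given in bytes while kern.ipc.shmall is counted in pages. Here is a minimal sketch of the arithmetic, assuming 4096-byte pages (the helper names are made up for illustration):

```shell
# Assumption: 4096-byte pages, the usual size on i386/amd64.
PAGE_SIZE=4096

# bytes_to_pages BYTES: number of pages needed to hold BYTES, rounded up
bytes_to_pages() {
    echo $(( ($1 + PAGE_SIZE - 1) / PAGE_SIZE ))
}

# pages_to_mib PAGES: size of PAGES pages in MiB
pages_to_mib() {
    echo $(( $1 * PAGE_SIZE / 1048576 ))
}

bytes_to_pages 67108864   # pages needed for one 64 MiB segment (shmmax)
pages_to_mib 32768        # total shared memory allowed by shmall, in MiB
```

With the values above, a single segment can be up to 64 MiB (16384 pages) and all segments together up to 128 MiB.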

Some might be completely pissed off by the fact that I'm just blurting out stuff I quite honestly don't understand on any deeper level. Yeah, sorry about that, but it's meant to be a general-purpose tinkering/optimization thread where users can find inspiration to try different things, not a benchmark of each proposed option. If this is completely against forum etiquette, please point me in the right direction.

NB:
Some of the things proposed here could be harmful to the system, so don't forget to use common sense.
 
kronisk said:
1. Edited make.conf to include CPUTYPE= and CFLAGS= -O2 -pipe
I was told the compilation could be optimized by defining those values, making "better/smoother" binaries.

Only set CPUTYPE; don't play around with CFLAGS, the defaults are optimized enough.
 
Hmmm, let's see.

First, don't set CFLAGS globally. Leaving it alone will save you more time chasing breakage than setting it will save in run time: -march=native -O3 -floop-... gives something like a 10% run time improvement on average, at the price of about 50% more object code size and compile time with gcc-4.4/4.5 (on Gentoo Linux; I never tried it on FreeBSD, but I think it'll be the same here). So to gain that 10% speed advantage you first have to invest a lot more than 10% extra compilation time, and loading the larger programs from disk is slow too. Furthermore, gcc has its bugs, and so does the software being compiled, so you will likely have to recompile some packages if you don't build them with the 'default' flags that trigger the 'right' gcc misbehavior. Ah, and you'll lose "support" for FreeBSD if you played with CFLAGS for buildworld/kernel.

Instead, override build tools and flags only for the individual ports that you want to optimize further and can live with being broken. You can live with a broken mplayer, since in the worst case you'll just recompile it in a few minutes, but you don't want to live with a broken ... libc:

Code:
.if ${.CURDIR:M*/multimedia/ffmpeg*} || ${.CURDIR:M*/multimedia/vlc*}
  CC=gcc45
  CXX=g++45
#  CFLAGS=-O1 -g
#  WITH_DEBUG=YES
.endif

By the way, -pipe is enabled by default on gcc-4, and -O2 is enabled for nearly every port (and the ones that don't enable it have their reasons!)

Enable soft updates on any UFS partition that is written to regularly (enabled by default). It makes filesystem access faster and also helps keep the filesystem consistent after crashes (which saves you a _lot_ of time). Next, mount any filesystems that are rarely written to (most probably all but the home directory's filesystem and the one containing /var) read-only, and disable access time stamping (noatime) on the writable mounts unless it's needed.

Sacrifice some memory and put /tmp and /var/tmp (just symlink one to the other) on tmpfs. kioslave (kdelibs' IO helpers) likes using /tmp for buffers, and it's certainly not the only thing that does. Furthermore, if memory gets filled up and used by other things, tmpfs contents can be swapped out, provided they don't need to be persistent.

Disable the enabled-by-default-yet-useless-to-you features of your desktop environment. KDE's strigi file indexer, for example, does nothing more than waste time (and produce noise with all its disk accesses) for most KDE users. The most obvious point of tuning might not be FreeBSD-related, but it'll still be the most effective.

The mplayer version from ports does not seem to split its work over multiple CPUs (efficiently). VLC does. When playing expensive videos such as Blu-rays you will notice the difference (if you have an SMP system). Also, the VLC backend to KDE/Qt's Phonon is more capable than the GStreamer one, so you will probably have VLC installed anyway if you run KDE.

Atom/ION friends might want to disable all kinds of graphical gizmos to begin with. Also, instead of setting up a ridiculous, let's say, 1920*1080 or 2048*1536 screen and then increasing the font size, just leave the screen resolution low and don't increase the font size. Common sense, you might think, but I've seen this quite often.

I think the SHM tuning might be interesting... can anyone who uses it benchmark it?
 
1. Edited make.conf to include CPUTYPE= and CFLAGS= -O2 -pipe
I was told the compilation could be optimized by defining those values, making "better/smoother" binaries.

The whole FreeBSD 'ecosystem' (both base system and packages) is compiled with these defaults: -O2 -fno-strict-aliasing -pipe, so you will NOT gain anything; it's useless.

I also use these on my 'graphical' systems in the /boot/loader.conf file:
Code:
# boot delay
autoboot_delay=1
beastie_disable=YES

# firefox HTML5 fix
sem_load=YES

# page share factor per proc
vm.pmap.shpgperproc=512

# open files
kern.maxfiles=16384
kern.maxfilesperproc=8192

# avoid additional 128 interrupts per second per core
hint.atrtc.0.clock=0

# do not power devices without driver
hw.pci.do_power_nodriver=3

# reduce sound generated interrupts
hint.pcm.0.buffersize=65536
hint.pcm.1.buffersize=65536
hint.pcm.2.buffersize=65536
hw.snd.feeder_buffersize=65536
hw.snd.latency=7

# ahci power management
# check: dmesg | grep ahcich
hint.ahcich.0.pm_level=5
hint.ahcich.1.pm_level=5
hint.ahcich.2.pm_level=5
hint.ahcich.3.pm_level=5
 
@vermaden as far as I know, your SEM line is redundant.

Code:
options         P1003_1B_SEMAPHORES     # POSIX-style semaphores

Using CPUTYPE is prone to failure in some circumstances and the performance gain is almost nothing on modern CPUs (apart from ancient CPU-designs like Intel Atom).

If you're using UFS try something like this:

Code:
vfs.read_max=32
vfs.ufs.dirhash_maxmem=134217728

The defaults of FreeBSD aren't up to date anymore.
 
I decided to add most of your loader.conf options just to try them out. Then I wanted to check whether the values had actually changed, and some of them hadn't, for instance vm.pmap.shpgperproc=512. Does it change for you? Or does this mean it's an option the kernel has to be compiled with?
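
For anyone who wants to do the same check without reading the values off by hand, here is a rough sketch. It assumes the file contains plain name=value lines, and the function name check_conf is made up. It compares each wanted value with what `sysctl -n` reports on the running system:

```shell
# check_conf FILE [GETTER...]: for each name=value line in FILE, compare the
# wanted value with what GETTER reports for that name (default: sysctl -n).
check_conf() {
    file=$1; shift
    [ $# -eq 0 ] && set -- sysctl -n
    # Strip comments and blank lines, then split each line on the first "=".
    sed -e 's/#.*//' -e '/^[[:space:]]*$/d' "$file" |
    while IFS='=' read -r name want; do
        name=$(echo "$name" | tr -d '[:space:]')
        want=$(echo "$want" | tr -d '"[:space:]')
        have=$("$@" "$name" 2>/dev/null) || have="(unknown)"
        if [ "$have" = "$want" ]; then
            echo "ok   $name=$have"
        else
            echo "DIFF $name wanted=$want running=$have"
        fi
    done
}

# Example:
#   check_conf /etc/sysctl.conf
#   check_conf /boot/loader.conf
```

Entries in loader.conf that are not sysctl names (module lines such as nvidia_load, for example) will show up as "(unknown)", which is expected.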
 
Are you experiencing some kind of slowdown? If so, then start optimizing UFS first. I wouldn't experiment with disputable options just for fun.
 
I'm not experiencing slowdown, on the contrary. Everything runs great (had some stuttering in SC2, gone now)

The "hint" entries you have in your loader.conf should be in device.hints.. at least in 8.1
 
oliverh said:
Using CPUTYPE is prone to failure in some circumstances and the performance gain is almost nothing on modern CPUs (apart from ancient CPU-designs like Intel Atom).

Performance gain is absolutely nothing in nearly every case since the default is native which is gcc autodetecting your CPU type. You can screw things up if you set it wrong, but you can't gain anything by setting it correctly.
 
I've put
Code:
vfs.read_max=128
in my sysctl.conf, but after every reboot it switches back to the default 8. How can I make this setting permanent?
 
MarcoB said:
So why is it changed back to 8 every time it reboots?
Only your system can answer that. Usually it's due to something like a misspelling, or some offending character occurring earlier in the file. Booting verbose may help identify the issue.
 
It's one of those values you can change on the fly.. So it's easy to check misspellings.
The format of sysctl.conf is

Code:
vfs.read_max=128

and not

Code:
sysctl vfs.read_max=128

.. if you haven't made either of those 2 mistakes I agree it's a bit strange^^
 
kronisk said:
I'm not experiencing slowdown, on the contrary. Everything runs great (had some stuttering in SC2, gone now)

The "hint" entries you have in your loader.conf should be in device.hints.. at least in 8.1

/boot/device.hints should be treated like a defaults file. Put your overrides into /boot/loader.conf. device.hints is part of the OS install and will be overwritten by installworld/mergemaster unless you are very careful. loader.conf is a user-managed file.
 
wblock said:
It may not be changing at all. Do you have a linefeed at the end of the sysctl.conf line?

Thank you, Mr. Block, it was a linefeed thing. I didn't know this would make a difference; it wasn't misspelled or anything. I changed it on the fly, but it changed back to 8 at every reboot (@kronisk). Now I'm curious whether my system is flying even faster with this setting :)
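
In case someone else runs into the same thing, a missing linefeed on the last line is easy to test for. This is just a small sketch; the function name is made up, and the file defaults to /etc/sysctl.conf:

```shell
# ends_with_newline FILE: succeed if FILE is empty or its last byte is a newline
ends_with_newline() {
    # $(...) strips a trailing newline, so the result is empty in that case
    [ -z "$(tail -c 1 "$1")" ]
}

conf=${1:-/etc/sysctl.conf}
if [ -r "$conf" ] && ! ends_with_newline "$conf"; then
    echo "$conf: last line has no trailing newline and may be skipped at boot"
fi
```

If it prints the warning, opening the file in an editor and adding an empty line at the end should be enough.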
 
MarcoB said:
Thank you, Mr. Block, it was a linefeed thing. I didn't know this would make a difference; it wasn't misspelled or anything. I changed it on the fly, but it changed back to 8 at every reboot (@kronisk). Now I'm curious whether my system is flying even faster with this setting :)

benchmarks/bonnie++ will let you test. My system did about 121M/sec with vfs.read_max=32, and 0.2% slower with vfs.read_max=128; statistically insignificant. But that's probably as fast as this single drive can read.

It's a good example. A lot of optimizations only make a difference on certain hardware, and don't help or actually slow down on other hardware. Benchmarking is the only way to know for sure.
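
If you don't want to install a benchmark, even a crude repeated timing gives a first impression. A minimal sketch follows: time_runs is a made-up helper, the file path in the example is hypothetical, and whole-second resolution is coarse, so it only makes sense for runs that take a while:

```shell
# time_runs N CMD...: run CMD N times and print the wall-clock seconds of each run
time_runs() {
    n=$1; shift
    i=0
    while [ "$i" -lt "$n" ]; do
        start=$(date +%s)
        "$@" >/dev/null 2>&1
        end=$(date +%s)
        echo $((end - start))
        i=$((i + 1))
    done
}

# Example (as root on FreeBSD, with a hypothetical large file):
#   sysctl vfs.read_max=32  && time_runs 3 dd if=/path/to/bigfile of=/dev/null bs=1m
#   sysctl vfs.read_max=128 && time_runs 3 dd if=/path/to/bigfile of=/dev/null bs=1m
```

Use a file larger than RAM, otherwise the later runs mostly measure the cache rather than the disk.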
 
Thank you to everybody who participated in this thread as of yet.

I have now landed on a configuration which I'm very pleased with, and I'm going to post my sysctl.conf and loader.conf hoping someone will find it useful.

I've also compiled a custom kernel with some options from the PC-BSD kernel (their wiki says: "PC-BSD’s kernel has been recompiled with some configuration tweaks to better suit it for desktop use"). Additionally, I've stripped the kernel of all the modules I'm not using, which makes it quite small.

Keep in mind that the values I've selected reflect some trial and error on my particular setup, and they might not work as well for you. My setup is used for a desktop workload, namely running StarCraft 2 in wine, watching movies, yada yada.

sysctl.conf:

Code:
# $FreeBSD: src/etc/sysctl.conf,v 1.8.34.1.4.1 2010/06/14 02:09:06 kensmith Exp $
#
#  This file is read when going to multi-user and its contents piped thru
#  ``sysctl'' to adjust kernel values.  ``man 5 sysctl.conf'' for details.
#

# Uncomment this to prevent users from seeing information about processes that
# are being run under another UID.
#security.bsd.see_other_uids=0

# Description from tweaks obtained by "sysctl -ad"

# File system tweak
 #Maximum amount of space to use for in-progress I/O
 vfs.hirunningspace=10485760
	
 #Maximum allowed dirhash memory usage
 vfs.ufs.dirhash_maxmem=134217728

 #Cluster read-ahead max block count
 vfs.read_max=128

######################################################

# Kernel tweak
 #Min priority for preemption, lower priorities have greater precedence
 kern.sched.preempt_thresh=200

 #Maximum files allowed open per process
 kern.maxfilesperproc=8192

 #Maximum shared memory segment size
 kern.ipc.shmmax=67108864

 #Maximum number of pages available for shared memory
 #One page is 4096 bytes, thus 256*(256*4096) = 256MB
 kern.ipc.shmall=65536

 #Enable/Disable locking of shared memory pages in core
 kern.ipc.shm_use_phys=1

 #Enable/Disable attachment to attached segments marked for removal
 kern.ipc.shm_allow_removed=1

########################################################

# Sound tweak
 #linux mmap compatibility (-1=force disable 0=auto 1=force enable)
 hw.snd.compat_linux_mmap=1

########################################################

# Virtual memory tweak
 # See tuning(7)
 vm.overcommit=2

... and loader.conf:

Code:
nvidia_load="YES"
sem_load="YES"

# Kernel tweaks
 #Pipe KVA limit, see tuning(7) for more info.
kern.ipc.maxpipekva=25165824
 
 #Maximum number of files
 kern.maxfiles=16384

 #Number of segments per process
 kern.ipc.shmseg=256

##################################################

# Virtual memory tweak
 #Are large page mappings enabled?
 vm.pmap.pg_ps_enabled=1
 
 #Page share factor per proc
 vm.pmap.shpgperproc=512

##################################################

# Misc. tweak

# Sound interruption reduction
hw.snd.feeder_eq_exact_rate=48000
hw.snd.latency=7
hint.pcm.0.buffersize=65536

# Avoid additional 128 interrupts per second per core
hint.atrtc.0.clock=0
 
wblock said:
After changing that readmax setting, I could swear buildworlds are going faster. But the actual times show it's about the same.

I was thinking more about responsiveness, especially under load, which is hard to measure and very subjective.
 
wblock said:
Careful: Hawthorne effect.

After changing that readmax setting, I could swear buildworlds are going faster. But the actual times show it's about the same.

I see your point, but one effect of my customization is indisputable:
When I played StarCraft 2 on a clean install, there was a 2 second spike (which seemed to be sound-related, but I'm not so sure now) whenever a building finished. Those are like 90% gone now, and when they do occur they are well under 1000ms!

I also tried commenting out all my changes, rebooting and playing a game of StarCraft 2, and the spikes returned. What had the biggest impact were the shared memory tweaks. I did not go about the commenting/uncommenting scientifically, as I picked settings in whatever chunks I felt like.

As for the rest of the system (compiling, browsing etc.) it's not noticeably faster -- heck, it was running superb even before I started customizing.

I will try some benchmark later today. Can you recommend any? Perhaps some of my description above inspired you to say: Ye, THAT one!^^
 