Solved What is the ideal make jobs formula for multicore systems?

  • Thread starter Deleted member 63539

Deleted member 63539

Guest
My CPU has 4 physical cores. Ninja's default make jobs value is 6, and it caused an out-of-memory problem when the compiler is GCC. By trial and error I found the suitable make jobs value for GCC is 3.

So I came up with this formula: make jobs = physical cores - 1
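
As a sketch (assuming FreeBSD's sysctl(8); kern.smp.cores reporting the physical core count is an assumption, so it falls back to hw.ncpu where the OID is missing):

Bash:
# physical cores - 1, never less than 1
cores=$(sysctl -n kern.smp.cores 2>/dev/null || sysctl -n hw.ncpu)
jobs=$((cores - 1))
[ "$jobs" -lt 1 ] && jobs=1
make -j "$jobs"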

Recently I built mulle-clang to play with Objective-C (I don't like the language much and am about to remove it). I found that the logic the developer used in his install-mulle-clang script is:

Bash:
if [ "${OPTION_PARALLEL}" = "NO" ]
then
   MAKE_FLAGS="-j 1"
else
   local cores
   local loadavg

   # cap the load average at cores - cores/4, i.e. 3/4 of the cores
   cores="`get_core_count`"
   loadavg="`expr $cores / 4`"
   loadavg="`expr $cores - ${loadavg}`"
   if [ "${loadavg}" -gt 0 ]
   then
      # note: -l limits make by load average; it does not set a job count
      MAKE_FLAGS="-l ${loadavg}"
   fi
fi
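
For a 4-core CPU the arithmetic works out like this (a quick sketch of the same computation; get_core_count is the script's own helper, replaced here with a literal):

Bash:
cores=4                            # get_core_count on my machine
loadavg=`expr $cores / 4`          # 4 / 4 = 1
loadavg=`expr $cores - $loadavg`   # 4 - 1 = 3
echo "-l $loadavg"                 # => MAKE_FLAGS="-l 3"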

So the value for my system is 3. Do you think that is the ideal make jobs formula for multicore systems? And what is your own formula? :-/
 
There is none. It also depends on available RAM and on what exactly you are building. Small things should benefit from more jobs, while for large software more jobs can make things worse: more jobs need more memory, and if that makes the build touch swap you lose performance.

In other words, trial and error.
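
A rough sketch of that memory constraint (the 2 GB-per-job figure is an assumption; heavy C++ or LTO jobs can need far more):

Bash:
# allow ~one compile job per 2 GB of RAM, capped at the CPU count
ncpu=$(sysctl -n hw.ncpu)
mem_gb=$(( $(sysctl -n hw.physmem) / 1073741824 ))
jobs=$(( mem_gb / 2 ))
[ "$jobs" -gt "$ncpu" ] && jobs=$ncpu
[ "$jobs" -lt 1 ] && jobs=1
make -j "$jobs"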
 
I think you are right. Trial and error is the way to go.
 
I play a YouTube video. When the sound gets distorted, I reduce the number.
A very practical approach ;) I like that one!

Usually I use MAKE_JOBS=$(( $(sysctl -n hw.ncpu) + 1 )) (sh(1) syntax, not csh(1)). That's fine on a dual core with SMT (4 threads), because many compile tasks on small source files are very fast, and the build might get I/O (disk) bound. It's not good when the application to build involves long compile tasks, e.g. LLVM. Then MAKE_JOBS = number of physical cores is more appropriate.
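
Side by side (kern.smp.cores reporting physical cores is an assumption; it is not exposed on every kernel, hence the fallback):

Bash:
# small, fast compile units: oversubscribe slightly
MAKE_JOBS=$(( $(sysctl -n hw.ncpu) + 1 ))
# long compile tasks (e.g. LLVM): stick to physical cores
MAKE_JOBS=$(sysctl -n kern.smp.cores 2>/dev/null || sysctl -n hw.ncpu)
make -j "$MAKE_JOBS"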
 
It also depends on the I/O system: how good the throughput and latency of the file system are. Compilation is very I/O intensive, as each step reads many files (source and header files, or objects and libraries), does I/O to temporary files, then writes a file. Caching is very important. So you need to find out what the bottleneck is, and make sure that bottleneck is kept completely fed.

On a single-node C++ compile with a good file system and sufficient memory, my rule of thumb used to be 4-8x the number of cores; that pretty much guarantees that there is always at least one runnable process for each core, and the bottleneck (CPU) is kept 100% busy. I actually measured it: 1-2 compiles per core had lower performance, while more than 8 didn't help much. For distributed compiles on a large cluster (with a cluster file system), the bottleneck is the network, and I go down to 1-2 compiles per host (not per core!); any more just creates large queues of jobs on some nodes that clog the file system.
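
For the single-node case that rule is simply (a sketch under the assumptions above: fast file system, enough RAM):

Bash:
# 4x oversubscription keeps every core fed while some jobs wait on I/O
make -j $(( $(sysctl -n hw.ncpu) * 4 ))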

I'm sure there is no single formula; it depends on the source code layout, the fanout of files, what fraction of files are intermediate and need to be written and immediately read, and obviously on memory and I/O hardware. Trial and error is the correct answer, as is monitoring the bottlenecks.
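
On FreeBSD the usual tools make that monitoring easy while a build runs:

Bash:
top -S        # CPU: idle near 0% means you are CPU-bound
gstat -p      # disk: %busy near 100 means you are I/O-bound
swapinfo -h   # any swap in use means too many jobs for your RAM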
 