poudriere never completes first run without errors

Hi,
I'm using poudriere to build some ports where I like to alter the default options (e.g. firefox). When I then start poudriere to build the whole package set, it never just builds them. It almost always ends up building at least one llvm version (or more) and rust simultaneously, and that's the point where the build of one of those ports fails (in most cases rust fails to build). The log shows that it got a signal 9 (SIGKILL), but I cannot figure out a reason for that. If I start poudriere once again (llvm has finished without error in most cases), it builds the remaining packages (including rust) without failing.
I assume it has something to do with the simultaneous compilation of those heavyweight ports, llvm and rust. My build server is an HP Z400 workstation with 12 cores and 16 GB RAM + 10 GB swap (of which, in most cases, only a few MB are in use).

Any ideas what causes the abort of those builds? Could the jail hit some fd-limitations?
 
Hm...
dmesg shows me this line:

Code:
pid 23078 (fabricate), jid 6, uid 65534, was killed: out of swap space

But I do have 10 GB of swap space, which is barely touched. How come?
 
Building rust is _very_ resource heavy. I think you need 8 GB+ of RAM (16 GB is recommended) to compile Rust these days (for fewer than 4 jobs).
 
Building rust is _very_ resource heavy
It is. This doesn't improve when poudriere tries to build one or more versions of llvm at the same time. Then things get really messy and suck up a LOT of resources.
 
I do now understand that building llvm and rust is really heavy work for a build server, especially simultaneously. On the other hand, I was under the impression that I had provided enough hardware to lift some of the heavy weight (16GB RAM + 9GB swap). Are there any tips on what I could tune to keep the build running (even if it takes longer to finish)? It is a bit annoying to start a build in the evening so that the packages are built overnight, just to see that it was aborted two hours after the start, leaving a huge chunk of ports in the skipped list. Even rust alone is sometimes aborted now :(
 
Do you have ALLOW_MAKE_JOBS enabled in your poudriere.conf? Have you set MAKE_JOBS_NUMBER in your make.conf?

Edit: Are you using ccache(1)?
 
T-Daemon
I've found an older 160GByte HDD lying around and added it as swap device. Let's see if this helps...

Jose
Yes, I do use ALLOW_MAKE_JOBS, but left the default value for MAKE_JOBS_NUMBER untouched.

I had used ccache, but that did not work well for me. It locked up the whole machine after I switched FreeBSD releases and rebuilt the packages without deleting the ccache folder first.
 
Do you have ALLOW_MAKE_JOBS enabled in your poudriere.conf?
Rust seems to completely ignore this and use all cores regardless. As far as I know this is intentional. Even with all cores it takes forever to build; if it only used one core, we'd be talking about build times in days instead of hours.
 
Yes, I do use ALLOW_MAKE_JOBS, but left the default value for MAKE_JOBS_NUMBER untouched.
This didn't work for me. Builds would have lots of workers killed by the kernel, and would usually eventually die. The limiting factor is memory. I only configured 4GB of swap when I built this machine, a decision I regret.

There are lots of suggestions in this thread:

You could try using ALLOW_MAKE_JOBS_PACKAGES as SirDice suggests in that thread. I didn't want to maintain the list of ports that are allowed to run in parallel.
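For reference, a minimal sketch of what that setting could look like in poudriere.conf. The file path is the usual default and the package names are only an assumed illustration, not a recommendation; the variable takes a space-separated list of package name globs, and it only makes a difference when ALLOW_MAKE_JOBS itself is left unset.

```shell
# /usr/local/etc/poudriere.conf (fragment; default path assumed)

# Leave ALLOW_MAKE_JOBS unset so ports build with a single make job by default.
# ALLOW_MAKE_JOBS=yes

# Packages that may still use parallel make. This list is an assumed,
# illustrative example; replace it with the ports that matter for your set.
ALLOW_MAKE_JOBS_PACKAGES="pkg ccache llvm*"
```

The trade-off is the one mentioned above: the list has to be maintained by hand as the package set changes.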

By default, both poudriere(8) and the Ports system use sysctl(8) to determine how many tasks to run in parallel. They use slightly different MIBs, but these report the same number on my system, which works out to two times the number of cores because each core has two threads.

So I wound up with 32 poudriere(8) workers each possibly running 32 make(1) tasks. I never quite hit 1024 processes, probably because a lot of ports don't invoke parallel make(1), and because most builds have significant non-parallel stages (automake comes to mind). My load average did break 100, and I did get lots of killed processes in my dmesg(8), though. The problem was memory starvation. The system was still semi-responsive.
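The arithmetic behind that worst case can be sketched in a few lines of shell. The function name is made up for illustration; the inputs are the number of poudriere builders and the number of make jobs each builder may start.

```shell
# Worst case: every builder is busy with a port that uses the full
# make job count at the same moment.
# worst_case_procs is a hypothetical helper name, not part of poudriere.
worst_case_procs() {
    builders="$1"
    make_jobs="$2"
    echo $(( builders * make_jobs ))
}

# 32 builders, each allowed 32 make jobs, as on the machine described above.
worst_case_procs 32 32
```

This is an upper bound only. As noted, many ports never run parallel make, so the real process count stays well below it.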

I tuned the parallelism of my builds because I didn't want to curate a list of blessed ports, and I didn't want to have 15 cores sitting idle most of the time. The approach I came up with was to set PARALLEL_JOBS=16 in poudriere.conf and MAKE_JOBS_NUMBER=16 in make.conf. In practice my load average never hits 256, but works out to a max of 64 or so for the particular set of ports I build, and mostly hovers around 16 during the build. I still get some OOM kills, but the poudriere(8) runs usually finish if I quit Thunderbird, and don't open too many tabs in Firefox.
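A sketch of those two settings, assuming the usual default file locations (the paths are an assumption and may differ, for example with a jail-specific make.conf):

```shell
# /usr/local/etc/poudriere.conf (fragment; default path assumed)
PARALLEL_JOBS=16        # number of poudriere builders running at once

# /usr/local/etc/poudriere.d/make.conf (fragment; default path assumed)
MAKE_JOBS_NUMBER=16     # make(1) jobs per port that builds in parallel
```

With both values at 16 the theoretical ceiling is 16 x 16 = 256 concurrent jobs, which is where the 256 figure above comes from.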

I wouldn't copy these numbers into your configuration files. They probably only work for my hardware and workload.

I had used ccache, but that did not work well for me. It locked up the whole machine after I switched FreeBSD releases and rebuilt the packages without deleting the ccache folder first.
That's too bad, ccache(1) made a huge difference for me, especially when building multiple versions of LLVM at the same time.

Rust seems to completely ignore this and use all cores regardless. As far as I know this is intentional. Even with all cores it takes forever to build; if it only used one core, we'd be talking about build times in days instead of hours.
I've noticed this too, but it seems to me Rust is running multiple threads instead of processes. Still obnoxious, but more memory-efficient.

Edit to add
I've found an older 160GByte HDD lying around and added it as swap device. Let's see if this helps...
This is almost certainly a bad idea. A 160 GB HDD is likely truly ancient and therefore glacially slow. My guess is you'll trade OOM kills for timeouts.
 
Do you have USE_TMPFS (not sure of the name) in poudriere.conf?
Good question. Using tmpfs(5) will cause memory contention. I have USE_TMPFS=data. I figured this was the best combination for speed and efficient use of memory.

Code:
# Use tmpfs(5)
# This can be a space-separated list of options:
# wrkdir    - Use tmpfs(5) for port building WRKDIRPREFIX
# data      - Use tmpfs(5) for poudriere cache/temp build data
# localbase - Use tmpfs(5) for LOCALBASE (installing ports for packaging/testing)
# all       - Run the entire build in memory, including builder jails.
# yes       - Enables tmpfs(5) for wrkdir and data
# no        - Disable use of tmpfs(5)
# EXAMPLE: USE_TMPFS="wrkdir data"
The default is "yes" (wrkdir and data). Maybe I should just turn it off since I have an NVMe drive.
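For a memory-constrained build machine, the two choices discussed here would look roughly like this in poudriere.conf. This is only a sketch; the path is the usual default, and which setting is better depends on the disk and the amount of RAM.

```shell
# /usr/local/etc/poudriere.conf (fragment; default path assumed)

# Keep only poudriere's cache/temp build data in tmpfs; port work
# directories stay on disk and do not compete with the compilers for RAM.
USE_TMPFS=data

# Alternative: no tmpfs at all, everything is built on disk.
# USE_TMPFS=no
```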
 
Thinking more on this, it would be nice if you could give Poudriere a target load average and it could tune its parallelism accordingly. Or maybe a max load average and have it cut back on the number of workers as it approached this limit.
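Poudriere does not offer this as far as this thread establishes, so the following is only a rough sketch of the idea, applied once before a build starts rather than continuously. The helper name pick_jobs and its three arguments (maximum builders, target load, current load) are made up for illustration.

```shell
# Choose a builder count from the headroom between a target load average
# and the current one. Always returns at least 1 and at most max_jobs.
# pick_jobs is a hypothetical helper, not a poudriere feature.
pick_jobs() {
    max_jobs="$1"
    target_load="$2"
    current_load="${3%.*}"          # drop the fractional part, e.g. 3.25 -> 3
    headroom=$(( target_load - ${current_load:-0} ))
    if [ "$headroom" -lt 1 ]; then
        headroom=1
    fi
    if [ "$headroom" -gt "$max_jobs" ]; then
        headroom="$max_jobs"
    fi
    echo "$headroom"
}

pick_jobs 12 12 3.25
```

The result could then be passed to `poudriere bulk -J`. On FreeBSD the current load could be taken from `sysctl -n vm.loadavg`, assuming its usual `{ 1min 5min 15min }` output; a real implementation would also have to re-evaluate the load while the build is running, which this sketch does not do.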
 
Run ports-mgmt/poudriere with just one job for those problematic things first. Later run the whole build you need with the optimizations[1].

poudriere bulk -r -t -J 1:32 -j JAILNAME lang/rust devel/llvm80 devel/llvm90 devel/llvm10 lang/gcc9

Adjust for your needs...

[1] all in the same jail, of course.
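The two-phase approach could be wrapped in a small script along these lines. This is a sketch under assumptions: JAILNAME, the package list path and the selection of heavy ports are placeholders, the testing flags (-r -t) are left out on purpose, and by default the script only prints the commands instead of running them.

```shell
# Placeholders (assumptions): adjust the jail name, list file and ports.
JAILNAME="${JAILNAME:-JAILNAME}"
PKGLIST="${PKGLIST:-/usr/local/etc/poudriere.d/pkglist}"
HEAVY_PORTS="${HEAVY_PORTS:-lang/rust devel/llvm90 devel/llvm10}"
: "${DRY_RUN:=1}"   # 1 = only print the commands, 0 = actually run them

# Print a command and, unless DRY_RUN is 1, execute it.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

# Phase 1: heavy ports with a single builder.
# Phase 2: the full package list with the normal settings.
build_heavy_first() {
    # HEAVY_PORTS is unquoted on purpose so it splits into separate origins.
    run poudriere bulk -j "$JAILNAME" -J 1 $HEAVY_PORTS &&
    run poudriere bulk -j "$JAILNAME" -f "$PKGLIST"
}

build_heavy_first
```

Because the second phase uses the same jail, the packages built in the first phase are already present and are not rebuilt unless something changed.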
 
But I do have 10 GB of swap space, which is barely touched. How come?
With 16GB RAM it is hard to believe that the swap is barely touched. Rust alone would use at least a couple of GB, provided you are not using ZFS or have the ZFS ARC set to a minimum (or you have just restarted the computer).

Are you measuring the swap usage during the whole process? If so, how?
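One way to record swap usage over a whole build, instead of watching it by eye, is to sample swapinfo(8) at a fixed interval. The sketch below assumes the usual `swapinfo -k` column layout (device, 1K-blocks, used, avail, capacity); the function names and the log path are made up for illustration.

```shell
# Sum the "Used" column (third field) of swapinfo -k style output on stdin.
# Only device lines (starting with /dev/) are counted, so the header and
# any Total line are skipped.
swap_used_kb() {
    awk '$1 ~ /^\/dev\// { used += $3 } END { print used + 0 }'
}

# Append one timestamped sample every INTERVAL seconds until interrupted.
# Usage: log_swap [interval_seconds] [logfile]
log_swap() {
    interval="${1:-60}"
    logfile="${2:-/tmp/swap-usage.log}"
    while :; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" \
            "$(swapinfo -k | swap_used_kb)" >> "$logfile"
        sleep "$interval"
    done
}
```

Started in the background before the build (for example `log_swap 30 /tmp/swap-usage.log &`), this leaves one line per sample; `sort -k3,3n /tmp/swap-usage.log | tail -n 1` then shows the peak. Short spikes between samples can still be missed.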
 
poudriere bulk -r -t -J 1:32 -j JAILNAME lang/rust devel/llvm80 devel/llvm90 devel/llvm10 lang/gcc9
The man page is not terribly clear, but it seems that -t will disable parallel make as well. This, plus running only one Poudriere worker means at most one core will be used, except maybe by Rust which cheats. It would not be appropriate for a machine with more than a few cores.

Why the large number of premake jobs?
 
You can also try THIS version, which considerably improves the build time; however, I don't know whether it has already been merged (I have not been following what has been going on lately).
 
Are you measuring the swap usage during the whole process? If so, how?
I have to admit that this has been done unprofessionally by keeping an "observing eye"™ ;) on an ssh-session running top(1). On the build server I'm using ZFS and ARC_MAX is set to 2GByte via /boot/loader.conf.
Do you have USE_TMPFS (not sure of the name) in poudriere.conf?
I did not alter the default, which is USE_TMPFS=yes.
Run ports-mgmt/poudriere with just one job for those problematic things first. Later run the whole build you need with the optimizations[1].

poudriere bulk -r -t -J 1:32 -j JAILNAME lang/rust devel/llvm80 devel/llvm90 devel/llvm10 lang/gcc9

Adjust for your needs...
I will try this for my next builds, thank you.
 
I also have 16GB of RAM, and have set up 16GB of swap. I started a build around 10:00 this morning. Swap has barely been touched during the build (one of the jobs is currently building Rust). I do have TMPFS enabled.

Sidenote: I'm using ports-mgmt/poudriere-devel, that may be important.
 

Attachments: chart2.php.png, chart3.php.png, chart4.php.png
 
Is there a way to set up poudriere so it won't build rust, llvm9, llvm10 and other heavyweight ports?

I'm not changing any configuration of these 3 ports. I am only building other ports which may have them as dependencies, and so they get built… To me it would be OK to use the binary packages of rust or llvm*.

My poor server has "only" 16GB of RAM and plenty of free swap space, but the CPU isn't very powerful, so these ports take between 4 and 6 hours each time they are modified. Sometimes ccache helps, but not always.
As SirDice said, the worst is when the three are building at the same time :-/
 
So far, so good. Yes, I get double or even triple load (building Rust, LLVM, etc. at the same time) every now and then and it takes forever. But it does build without issues, and at least it doesn't crash the server any more. I do get failures sometimes and a bunch of ports being skipped because of it. I build often enough and restart builds frequently anyway, so it doesn't really bother me.
 