ports-mgmt/poudriere-devel is running here on an 8-CPU system with 8 GB RAM and 8 GB of ZFS swap.
For the past few weeks Poudriere has not completed its runs anymore. It leaves the system unresponsive at the local console, and only a hard reset brings the system back.
The log files generated by Poudriere give no clue about the problem. Poudriere fails every time in the same i386 jail set, but when and how it happens is not predictable. Completing a bulk run of some 600 ports takes 3-5 manual system restarts.
I saw swap disk usage peak at about 60%, but I also saw it fail at lower swap usage.
From https://github.com/freebsd/poudriere/wiki/todo, section "Stability":
There is substantial risk that large ports build at once and consume all RAM/swap and cause an OOM or panic. Either need to make the queue wait on these known large ones or monitor the amount of remaining memory and current CPU load and delay builds while high. Note that hidden in this task is reworking the queue to allow delaying builds. This is not possible currently and conflicts with detecting a stuck/deadlocked queue. A more flexible queue would allow retrying fetch failures or failed builds (due to memory constraints).
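The "delay builds while high" idea from that todo entry can be pictured as a gate in front of the build queue. This is a minimal Python sketch under stated assumptions, not Poudriere's code: the Limits/Sample types, the thresholds, the heavy-port set and the function names are all hypothetical, and dependency ordering between ports is ignored. It combines both options from the entry: a build starts only while free RAM, swap use and load average are within limits, and a port from a known-heavy list waits while another heavy port is building. Delayed ports stay in the queue instead of failing.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Limits:
    """Hypothetical thresholds; these are not Poudriere settings."""
    min_free_mib: int = 1024        # keep at least 1 GiB of RAM free
    max_swap_pct: float = 50.0      # no new builds once half the swap is used
    max_load_per_cpu: float = 1.5   # delay while load average > 1.5 x CPUs


@dataclass
class Sample:
    """One reading of system state, taken by the caller before each decision."""
    free_mib: int
    swap_used_pct: float
    load_avg: float
    ncpu: int


def resources_ok(s: Sample, lim: Limits) -> bool:
    """True if free RAM, swap use and load all leave room for one more build."""
    return (s.free_mib >= lim.min_free_mib
            and s.swap_used_pct <= lim.max_swap_pct
            and s.load_avg <= lim.max_load_per_cpu * s.ncpu)


def next_build(queue: deque, running: Set[str], heavy: Set[str],
               max_jobs: int, s: Sample, lim: Limits) -> Optional[str]:
    """Pick at most one port to start, or None to wait for the next sample.

    Ports that cannot start yet stay in the queue (delayed, not failed).
    Dependencies between ports are ignored in this sketch.
    """
    if len(running) >= max_jobs or not resources_ok(s, lim):
        return None
    heavy_running = any(p in heavy for p in running)
    for port in list(queue):           # iterate over a copy; we remove below
        if port in heavy and heavy_running:
            continue                   # only one known-heavy port at a time
        queue.remove(port)
        return port
    return None
```

At most one port is returned per call, so the caller takes a fresh Sample before starting the next one: a reading taken before a builder starts says nothing about what that builder will allocate.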
So the question is: how can Poudriere be configured so that it does not trigger this problem?
Currently it can no longer be run unattended here, because every run fails and leaves the system unusable.
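One starting point that needs no changes to Poudriere is to size the number of parallel builders from memory rather than from the CPU count (by default Poudriere runs one builder per CPU). The Python sketch below only does that arithmetic. The per-builder peak, the host reserve and the fraction of swap counted as usable are assumptions to be tuned, not measured values, and suggested_jobs is a hypothetical helper name. If USE_TMPFS is enabled, the builders' work directories also occupy RAM/swap and belong in the per-builder figure.

```python
import math


def suggested_jobs(ncpu: int, ram_gib: float, swap_gib: float,
                   peak_gib_per_builder: float, swap_share: float = 0.25,
                   reserve_gib: float = 1.0) -> int:
    """Parallel builders that fit a memory budget, capped at the CPU count.

    budget = RAM - reserve for the host + swap_share * swap, so only a small
    part of swap is planned for. All defaults are assumptions, not
    Poudriere defaults.
    """
    budget = ram_gib - reserve_gib + swap_share * swap_gib
    by_memory = math.floor(budget / peak_gib_per_builder)
    return max(1, min(ncpu, by_memory))


if __name__ == "__main__":
    # The machine from the post: 8 CPUs, 8 GB RAM, 8 GB swap.
    # The 2 GiB peak per builder is a guess; measure it for the real port set.
    print(suggested_jobs(8, 8, 8, 2.0))  # prints 4
```

With the assumed 2 GiB peak the budget is 8 - 1 + 0.25 * 8 = 9 GiB, which fits 4 builders; that value could then be tried as PARALLEL_JOBS in poudriere.conf or passed with poudriere bulk -J 4.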