ports-mgmt/synth: Upgrade version 1.43 => 1.50

Code:
This release improves robustness and activates the watchdog.
It leverages the procctl functionality to ensure all processes spawned
from a builder are reaped, which in turn ensures that tmpfs mounts can
be dismounted.  Previously stuck processes could prevent those dismounts,
trapping them as new mounts get placed on top.

This also finally enables the watchdog that will kill runaway builds.
The watchdog has a specific time limit per build phase where it will
kill the build if the log doesn't grow over the previous X minutes.

The no-activity timeout limits per phase are:

  check_sanity    :   1 minute
  pkg_depends     :   3 minutes
  fetch           : 480 minutes
  checksum        : 480 minutes (fetches if required)
  extract_depends :   3 minutes
  extract         :  30 minutes
  patch_depends   :   3 minutes
  patch           :   3 minutes
  build_depends   :   5 minutes
  build           :  20 minutes
  run_depends     :  10 minutes
  stage           :  20 minutes
  check_plist     :   3 minutes
  pkg_package     : 120 minutes
  install_mtree   :   3 minutes
  install         :  10 minutes
  deinstall       :  10 minutes

A minor change regarding the swap display: If there is no swap installed,
it will now display "n/a" instead of "100%"
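
For readers who want the table above in a machine-readable form, here is a rough Python sketch. It only restates the numbers from the commit message; the names `PHASE_LIMIT_MINUTES` and `base_limit` are made up for illustration and are not identifiers from the Synth source (which is written in Ada).

```python
# Base no-activity limits in minutes, copied from the 1.50 commit message.
# The dictionary and function names are hypothetical, not Synth identifiers.
PHASE_LIMIT_MINUTES = {
    "check_sanity": 1,
    "pkg_depends": 3,
    "fetch": 480,
    "checksum": 480,  # fetches if required
    "extract_depends": 3,
    "extract": 30,
    "patch_depends": 3,
    "patch": 3,
    "build_depends": 5,
    "build": 20,
    "run_depends": 10,
    "stage": 20,
    "check_plist": 3,
    "pkg_package": 120,
    "install_mtree": 3,
    "install": 10,
    "deinstall": 10,
}


def base_limit(phase: str) -> int:
    """Return the base no-activity limit for a build phase, in minutes."""
    return PHASE_LIMIT_MINUTES[phase]
```

These are the limits as announced for 1.50; later releases in this thread change some of them.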
 
Hi, is there a way to configure / tune these time limits (apart from changing the code)?
I just observed that the build phase of devel/llvm37 exceeded the limit on a busy machine.
Thanks & kind regards,
Matthias
 
No. Are you using v1.51?
Chances are the machine is overloaded if there's no output in 20 - 60 minutes.
What's the ncpu, how much RAM do you have, and what's your #builders and #jobs per builder?

I suspect your RAM is low for the #builders + #jobs per builder and your machine is swapping hard.

FYI, I may bump the time limits of build and stage phases by 20% for usual suspects.
 
Hi John, thanks for the quick response. The setup is as follows:

* AMD FX(tm)-6300 Six-Core Processor (3515.62-MHz K8-class CPU)
* 16 GB RAM
* Number_of_builders= 4
* Max_jobs_per_builder= 3
* FreeBSD 11-RELEASE
* ZFS root
* Swap 16 GB, but not used
* Synth 1.51

Do you recommend reducing the number of builders?
Furthermore, does the 20 min limit mean that the watchdog kills the process if the whole build phase takes 20 min, or if the build log doesn't get a new line in 20 min?

Best regards,
Matthias
 
wow, it looks like if anything your setup is conservative. You shouldn't be having any issues.
"20 minutes" in this case means that the log wasn't incremented over a 20 minute period. If it gets a single new line, the timer resets.
That is just the base time. If your system is loaded (e.g. 5-minute average is > 12 with a ncpu=6 system) then that time limit grows (to 40 minutes in that example).
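
To make the "timer resets on any new output" behavior concrete, here is a minimal Python sketch of a log-growth watchdog. It is not Synth's implementation; the class and method names are invented, times are passed in as plain seconds so the logic can be followed by hand, and the load-based scaling of the limit is left out.

```python
class LogWatchdog:
    """Hypothetical sketch: flag a build whose log has stopped growing."""

    def __init__(self, limit_minutes, start_time=0.0):
        self.limit_seconds = limit_minutes * 60
        self.last_size = 0              # largest log size seen so far
        self.last_growth = start_time   # when the log last grew

    def observe(self, log_size, now):
        """Record the log size at time `now` (seconds).

        Returns True when the log has not grown for longer than the limit.
        """
        if log_size > self.last_size:
            # Any growth, even a single line, resets the timer.
            self.last_size = log_size
            self.last_growth = now
            return False
        return (now - self.last_growth) > self.limit_seconds
```

With a 20-minute limit, a log that gains a line 19 minutes after the last one starts a fresh 20-minute window; only a full window with no growth at all trips the check.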

honestly I think your configuration is fine. It's hard to imagine your machine being so loaded that it tripped the watchdog.

by the way, check synth version to make sure you're on v1.51 and not v1.50.
 
Hi John,

yes - I am using Synth 1.51. I tried to isolate the issue further: I set the number of builders to 1 and started another build run. Even with one builder, the devel/llvm37 build fails with the timeout. Attached you will find the log file. Unfortunately there are not many timestamps in it, but maybe it gives some hint?

Best regards,
Matthias
 

Attachments

  • devel___llvm37.log.zip
    84.9 KB
 
It doesn't make any sense. A 16 GB 6-core machine using -j3 shouldn't have trouble compiling a single C++ file like that.
If you want to turn the watchdog off, you can rebuild synth after changing the "hangmonitor" variable from True to False at line 704 of src/portscan-buildcycle.adb.

Maybe it really does take that long to compile C++ files on K8s regardless of available memory...
 
Hello John, thanks for the helpful hint. Turning off the watchdog will be my next step. Before that, I'll do a supervised build without Synth and log the output with timestamps. It would also be interesting to see what performs so badly on my system.

Best regards,
Matthias
 
Hello John,

to my surprise, when I perform the build of devel/llvm37 via

Code:
# 2>&1 goes before the pipe so that make's stderr is timestamped and logged too
make -DBATCH 2>&1 | gawk '{ print strftime("[%Y-%m-%d %H:%M:%S]"), $0 }' > /home/admin/devel___llvm37.build.log

the whole build takes less than 30 minutes, and no gap between two log lines comes anywhere near the 20-minute limit. Do you have an idea what I could try next?
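
A timestamped log like this can be checked for the longest silent stretch with a few lines of Python. This is just a sketch that assumes every log line starts with the `[%Y-%m-%d %H:%M:%S]` prefix produced by the gawk command above; the function name `largest_gap` is made up for illustration.

```python
from datetime import datetime, timedelta

STAMP_FORMAT = "[%Y-%m-%d %H:%M:%S]"
STAMP_LENGTH = 21  # length of a prefix such as "[2000-01-01 00:00:00]"


def largest_gap(lines):
    """Return the longest interval between two consecutive timestamped lines."""
    previous = None
    longest = timedelta(0)
    for line in lines:
        try:
            stamp = datetime.strptime(line[:STAMP_LENGTH], STAMP_FORMAT)
        except ValueError:
            continue  # skip lines without a parsable timestamp prefix
        if previous is not None and stamp - previous > longest:
            longest = stamp - previous
        previous = stamp
    return longest


# Usage:
#   with open("/home/admin/devel___llvm37.build.log", errors="replace") as log:
#       print(largest_gap(log))
```

If the result stays well under the phase limit, the compiler itself is not going quiet for that long, which points away from the build and toward how the log is being watched.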

Best regards,
Matthias
 

Attachments

  • devel___llvm37.build.log.zip
    202.1 KB
 
I've no idea beyond turning off the hangmonitor and just trying to build it in Synth again. Maybe observe during the build and see if you can see where it's getting hung up. Maybe it really is hanging and the watchdog is doing its job.
 
Hi John, after turning off the watchdog as described, the build of llvm37 completed after about an hour. Unfortunately I had changed Number_of_builders= back to 4 beforehand, so the measurement is not very reliable.
Anyway, once my batch build is done, I will trigger another build with Number_of_builders= 1. My expectation is that it will complete after ~30 minutes, which is the time it took when building directly from the ports tree without synth.

Best regards,
Matthias
 
you expect -j1 to take 30 minutes while -j4 takes twice as long? Usually the opposite occurs.

It still doesn't explain the 20+ minute stall in the middle of the build...

What would really help is leaving the watchdog on and changing line 1146 of src/portscan-buildcycle.adb from 20 to 25
(and if that fails, from 20 to 30).

I'm trying to figure out how far to loosen the limits to avoid false positives while still being reasonable.
 
to clarify, the number of builders doesn't matter if it's just one package, it's the number of jobs per builder that affects the build time (again, assuming it's the only thing building).
 
Hello John,

thanks for the clarification. Actually we are on the same page.
I meant that with Number_of_builders=1 I ensured that only one port is built at a time (as I am building with a port.list).
Before my last test, I set it back to 4, and that's why I think the 1 hour is OK, as other ports were being built in parallel.
I will follow your recommendations to drill down to the root cause.

Best regards,
Matthias
 
Yes, will consider that when I have my results.
Have you thought about having these limits dynamically calculated, i.e. by taking into account some environmental conditions (CPU, I/O bandwidth)? Just an idea, I'm not sure whether this can realistically be achieved.
In any case I'd find it useful to have the limits configuration exposed as environment settings / config in the synth.ini so that they can be tweaked without rebuilding synth.

Best regards,
Matthias
 
It is dynamically calculated.
20 minutes is the BASE limit.
If the machine is loaded, a multiplier of 1.1 to 5 is applied (22 to 100 minutes).

The issue is that this multiplier doesn't get applied until the machine is already loaded.
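
As a worked example of that arithmetic, here is a small Python sketch. It only shows how a base limit and a multiplier combine; how Synth derives the multiplier from the load average is not spelled out in this thread, so the multiplier is simply passed in, and the function name is hypothetical.

```python
def effective_limit(base_minutes, multiplier):
    """Scale a base no-activity limit by a load multiplier.

    The multiplier is clamped to the range 1.0 .. 5.0 (1.0 meaning an
    unloaded machine). How the multiplier is computed from system load
    is an open assumption here and is left to the caller.
    """
    clamped = min(max(multiplier, 1.0), 5.0)
    return base_minutes * clamped
```

For the 20-minute build phase, a multiplier of 2 gives the 40 minutes mentioned earlier, and the maximum of 5 gives 100 minutes.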


"In any case I'd find it useful to have the limits configuration exposed as environment settings / config in the synth.ini so that they can be tweaked without rebuilding synth."

This is exactly what I'm trying to avoid. Too many options. It confuses users.
 
There are cues that this watchdog issue only happens when building in text mode (aka not ncurses mode). It may be a bug. Stand by ...
 
mpetersma@ version 1.52 should solve your use case

Code:
ports-mgmt/synth: Upgrade version 1.51 => 1.52

Fix regression in text-mode caused by activation of watchdog.
The watchdog is checking the lengths of the build logs to figure out if
a builder has stalled.  It turns out that the logs were only being
inspected in ncurses display mode, so any port that took longer than
20 minutes to build would be aborted by the watchdog.

While here, bump the *BASE* time limit for the build phase from 20 to
25 minutes based on extreme cases (normally involving gcc or tex ports)
and also bump the check-plist phase limit from 3 minutes to 10 minutes.
Some ports have tens of thousands of files in them which takes a long
time to check under test mode, especially if the server is loaded.
 
Some people might welcome this point release:

Code:
ports-mgmt/synth: Upgrade version 1.52 => 1.53

Major bug fix: ncurses display resize hang fixed

  Until now, resizing the window while synth is running in ncurses mode
  caused synth to hang (it would finish the builds it was working on
  but the display wouldn't update and no new jobs would start).  This
  was due to an unhandled exception thrown by the ncurses binding as a
  result of the resize event, and now these are handled.

Minor fix: Ports with @info in pkg-plist now pass in test mode

  The mtree exclusion file was improved to allow these leftover info
  directories to be ignored (as is done in poudriere).  Before only
  info/dir was ignored, but the presence of "dir" prevented "info" from
  being removed by pkg(8) upon deinstallation.

enhancement: Augment text mode (requested)

  Now when a builder starts on a new package, the port origin will be
  shown in the running log (before only the completion was logged.)
 
Thanks so much for getting to the bottom of this. I quite often start a build on my laptop and then check up on it on my phone or vice versa. The two different screen sizes quite often caused this hang. Nice to see it's fixed now.
 
There were still some quirks with resizing including more possible hangs, but hopefully that's really been addressed now:
Code:
ports-mgmt/synth: Upgrade version 1.53 => 1.54

Handles remaining resizing exceptions and improves display handling.

Yesterday's work handled most of the common display exceptions, but others
were still possible.  Now all possible exceptions are handled.

Several improvements were made to the display:
  1) lines no longer wrap if the window is resized too narrow; they
     get truncated as always intended
  2) Elements such as the elapsed timer don't get displayed in the wrong
     place when the screen is too narrow (they just don't show)
  3) The dashes now get restored if the screen is sized small and then
     big again (or started small and then expanded).  In many cases those
     lines just never came back before.
  4) The "full" refresh frequency was increased from a period of 30
     seconds to a period of 4 seconds.  This has a side benefit to
     text-mode watchdog as well since that's the same timer for the log
     inspection.
  5) The history window height ranges from 10 to 50 rows.  If the xterm
     window starts small, the history will be 10 lines.  If it starts
     big, the number of lines will be dictated by the original size of
     the xterm window.  Making the screen small and then bigger again will
     reveal the full number of log lines.
 
1.53 worked fine for me, but 1.54 causes the ncurses screen to be completely garbled and unreadable. It's not a tmux thing; it happens in a plain terminal as well. I'm using PuTTY on Windows as the terminal (112x34).
 
Hmm, I just saw this on the FreeBSD console in a VirtualBox VM. What's going on there?
It seems to be specific to FreeBSD. (I see it in Bitvise too.)
 