Poudriere exiting with status 1 and VM aborting

I have a VirtualBox VM which is launched every morning to run the following cron job:
Code:
48 5 * * * /usr/sbin/pkg upgrade -y && /usr/local/bin/poudriere ports -u -p local && /usr/local/bin/poudriere bulk -j 10buildi386 -p local -z kjpservers -f /usr/local/etc/poudriere.d/10buildi386-local-kjpservers-pkglist && /home/xxxx/FileProc/checkshut

This has worked well for nearly a year.

In the last week or so, two symptoms have arisen: either the VM is aborted while poudriere is running, or poudriere completes but exits with a status of 1. In the second case the && chain stops there, so checkshut never runs and the VM does not shut down (checkshut is a script which checks that I am not logged in and then shuts the machine down).

This is the tail of a VirtualBox log of an aborted session, if anyone can throw light on it (I'd ask at the VB forum but Oracle want me to agree to the sharing of my personal data as a condition of logging on, and I don't agree):
Code:
00:13:33.540426 PIIX3 ATA: Ctl#0: ABORT DMA
00:13:58.824588 PIIX3 ATA: Ctl#0: ABORT DMA
00:26:12.151959 PIIX3 ATA: Ctl#0: ABORT DMA
00:26:36.597759 PIIX3 ATA: Ctl#0: ABORT DMA
00:26:36.598727 PIIX3 ATA: Ctl#0: ABORT DMA
00:26:40.298374 PIIX3 ATA: Ctl#0: ABORT DMA
00:26:40.301380 PIIX3 ATA: Ctl#0: ABORT DMA
00:26:40.580826 PIIX3 ATA: Ctl#0: ABORT DMA
00:26:40.663247 PIIX3 ATA: Ctl#0: ABORT DMA
00:26:48.532197 PIIX3 ATA: Ctl#0: ABORT DMA
00:27:15.625995 PIIX3 ATA: Ctl#0: ABORT DMA
00:27:15.631662 PIIX3 ATA: Ctl#0: ABORT DMA
00:27:15.762296 PIIX3 ATA: Ctl#0: ABORT DMA
00:28:55.779336 PIIX3 IDE: guest issued command 0xca while controller busy
00:28:55.782303 
00:28:55.782304 !!Assertion Failed!!
00:28:55.782305 Expression: ReqType == ATA_AIO_RESET_ASSERTED || ReqType == ATA_AIO_RESET_CLEARED || ReqType == ATA_AIO_ABORT || pCtl->uAsyncIOState == ReqType
00:28:55.782305 Location  : /home/vbox/tinderbox/lnx64-rel/src/VBox/Devices/Storage/DevATA.cpp(5516) int ataR3AsyncIOThread(RTTHREADINT*, void*)
00:28:55.782340 I/O state inconsistent: state=0 request=1
00:28:55.785599 PIIX3 IDE: guest issued command 0xca while controller busy

And this is what a manual poudriere session looks like, with the exit status echoed on the last line:
Code:
# /usr/local/bin/poudriere bulk -j 10buildi386 -p local -z kjpservers -f /usr/local/etc/poudriere.d/10buildi386-local-kjpservers-pkglist || echo $?
[00:00:00] ====>> Creating the reference jail... done
[00:00:04] ====>> Mounting system devices for 10buildi386-local-kjpservers
[00:00:04] ====>> Mounting ports/packages/distfiles
[00:00:04] ====>> Using packages from previously failed build
[00:00:04] ====>> Mounting packages from: /usr/local/poudriere/data/packages/10buildi386-local-kjpservers
[00:00:04] ====>> Copying /var/db/ports from: /usr/local/etc/poudriere.d/10buildi386-kjpservers-options
[00:00:04] ====>> Appending to make.conf: /usr/local/etc/poudriere.d/10buildi386-local-kjpservers-make.conf
/etc/resolv.conf -> /usr/local/poudriere/data/.m/10buildi386-local-kjpservers/ref/etc/resolv.conf
[00:00:04] ====>> Starting jail 10buildi386-local-kjpservers
[00:00:04] ====>> Logs: /usr/local/poudriere/data/logs/bulk/10buildi386-local-kjpservers/2016-09-27_12h48m26s
[00:00:04] ====>> Loading MOVED
[00:00:04] ====>> Calculating ports order and dependencies
[00:00:09] ====>> Sanity checking the repository
[00:00:09] ====>> Checking packages for incremental rebuild needed
[00:00:23] ====>> Cleaning up
[00:00:24] ====>> Umounting file systems
1

Any information would be much appreciated.

Thanks.
 
It looks like the VM has issues with its disks. How are those disks connected? Are they local images or iSCSI (the "Ctl" in the log hints at this)? Are the disk images still in good shape, not corrupt or anything? Is the disk the images are stored on still in good shape (e.g. no bad sectors)?
 
The host runs PCLinuxOS. I have just checked its /var/log/messages and found:
Code:
Sep 27 06:10:07 master klogd: nspr-2[2711]: segfault at 35 ip 0000000000000035 sp 00007f479d24bcd8 error 14 in VBoxSVC[400000+489000]
so it looks as if there's a problem with VB on the host. I think that leaves the FreeBSD guest in the clear! At least I know which OS has the problem now!

Well, partially. That explains why the VM sometimes aborts. It doesn't explain why poudriere exits with 1.
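
One way to narrow that down might be to log the exit status of each step in the cron chain, so it is clear whether poudriere bulk or an earlier step is the one returning 1. This is only a minimal sketch, assuming the chain is moved into a wrapper script; run_step, nightly, the "run" argument and the log path /tmp/nightly-build.log are all made up for illustration and are not part of the original setup:

```shell
#!/bin/sh
# Hypothetical wrapper around the cron chain (names and log path are assumptions).
LOG="${LOG:-/tmp/nightly-build.log}"

# Run one command unchanged, append its exit status to $LOG,
# and return that same status to the caller.
run_step() {
    status=0
    "$@" || status=$?
    printf '%s exit=%d cmd=%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$status" "$*" >> "$LOG"
    return "$status"
}

# Same commands as the cron job. && still stops at the first failing step,
# but the log now records which step that was and what it returned.
nightly() {
    run_step /usr/sbin/pkg upgrade -y &&
    run_step /usr/local/bin/poudriere ports -u -p local &&
    run_step /usr/local/bin/poudriere bulk -j 10buildi386 -p local -z kjpservers \
        -f /usr/local/etc/poudriere.d/10buildi386-local-kjpservers-pkglist &&
    run_step /home/xxxx/FileProc/checkshut
}

# Only run the chain when explicitly asked to, e.g. "sh nightly.sh run" from cron.
if [ "${1:-}" = "run" ]; then
    nightly
fi
```

The cron entry would then call the script with the run argument instead of the inline chain. The per-build logs under the "Logs:" directory that poudriere prints may also say more about why bulk itself returned 1.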
 