cron daemon fails after random period of time

The cron daemon on one of our BIND DNS systems stops executing its jobs, including the ones scheduled every 5 minutes, after a random length of time. /etc/crontab has no custom edits; all custom cron jobs are configured in /var/cron/tabs/root via [CMD=]crontab -e[/CMD]. The cron jobs that exist are as follows:
Code:
# Nightly archive of Bind DNS logs
1       0       *       *       *       /etc/rc.d/named restart
5       0       *       *       *       /root/nightlytasks

The nightlytasks cronjob does nothing fancy as seen below:
Code:
#!/bin/csh 
# Script to compress and archive logs from previous day

# Denotes start time of task
date >> /root/taskstatus
 
echo "Started Archiving & Compressing All Bind DNS Logs Within The Last 24 Hours" >> /root/taskstatus

# Compress and zip up all logs except the active log file
tar cvzf /var/named/var/archive/XXNETP02-`date '+%Y-%m-%d'`.tar.gz /var/named/var/log/named.log.* >> /root/taskstatus

echo "Finished Archiving and Compressing Bind DNS log files." >> /root/taskstatus

# Clean up the directory of old log files
rm -rf /var/named/var/log/named.log.*

echo "Finished with cleaning directory" >> /root/taskstatus

# List contents of directory to show task completed
ls -alt /var/named/var/archive/ >> /root/taskstatus
ls -alt /var/named/var/log/ >> /root/taskstatus

# Denotes end time of task
date >> /root/taskstatus
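
One thing the script above does not guard against is tar failing: the rm step runs either way, so a failed archive would still lose the rotated logs. Below is a minimal sketch of an archive-then-prune step that only deletes the files once tar has exited successfully. It is written in POSIX sh rather than the original csh, and the function name archive_then_prune and its two arguments are hypothetical, chosen for illustration only.

```shell
#!/bin/sh
# archive_then_prune LOG_DIR ARCHIVE_FILE  (hypothetical helper)
# Archive rotated logs (named.log.*) and delete them only if tar succeeded.
archive_then_prune() {
    log_dir=$1
    archive=$2

    # Expand the glob into the positional parameters. In sh an unmatched
    # glob stays literal, so test that the first match really exists.
    set -- "$log_dir"/named.log.*
    if [ ! -e "$1" ]; then
        echo "no rotated logs to archive"
        return 0
    fi

    if tar czf "$archive" "$@"; then
        rm -f "$@"          # safe to delete: the archive was written
    else
        echo "tar failed; rotated logs kept" >&2
        return 1
    fi
}

# Example call, using the paths from the script above:
# archive_then_prune /var/named/var/log /var/named/var/archive/logs-`date '+%Y-%m-%d'`.tar.gz
```

The active named.log is left alone because the glob only matches names with a suffix after "named.log.".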

I checked /var/log/cron for errors but found none, and /var/log/messages showed nothing useful either. I ran [CMD=]/etc/rc.d/cron stop[/CMD], followed by [CMD=]/etc/rc.d/cron status[/CMD] to verify it had stopped, then ran [CMD=]/etc/rc.d/cron start[/CMD] to restart the daemon. Unfortunately, after anywhere from 15 minutes to 1 day the daemon becomes unresponsive again and runs neither /root/nightlytasks nor /etc/rc.d/named restart. The only thing that appears to resolve the issue (for a short period of time) is to power cycle the system.
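
When a daemon stops doing work, it helps to tell a dead process apart from a live but stuck one. The sketch below reads a pid file and reports which case applies. It assumes cron writes its pid to /var/run/cron.pid (worth verifying on the system in question), and the helper name daemon_state is made up for this example. Note that kill -0 only tests for existence and needs permission to signal the process, so run it as root.

```shell
#!/bin/sh
# daemon_state PIDFILE  (hypothetical helper)
# Prints "missing", "invalid", "stale PID" or "alive PID".
daemon_state() {
    pidfile=$1
    if [ ! -r "$pidfile" ]; then
        echo "missing"
        return 2
    fi

    pid=$(head -n 1 "$pidfile" | tr -d '[:space:]')
    case $pid in
        ''|*[!0-9]*) echo "invalid"; return 2 ;;
    esac

    # kill -0 sends no signal; it only checks that the process exists.
    if kill -0 "$pid" 2>/dev/null; then
        echo "alive $pid"
        # Show process state, wait channel and elapsed run time. Keyword
        # support differs a little between ps implementations, so a
        # failure here is ignored.
        ps -o pid,state,wchan,etime,command -p "$pid" 2>/dev/null || true
    else
        echo "stale $pid"
        return 1
    fi
}

# Example call (pid file path is an assumption):
# daemon_state /var/run/cron.pid
```

An "alive" result together with a cron log that has stopped growing points at a hung daemon rather than a crashed one, and the wait channel column may hint at what it is blocked on.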

The issue is happening as I type this. Here are the cron daemon logs for the past several days:
Code:
Feb 19 07:05:00 XXXXXXXXNETP02 /usr/sbin/cron[31104]: (root) CMD (/usr/libexec/atrun)
Feb 19 07:10:00 XXXXXXXXNETP02 /usr/sbin/cron[31106]: (root) CMD (/usr/libexec/atrun)
Feb 19 07:11:00 XXXXXXXXNETP02 /usr/sbin/cron[31108]: (operator) CMD (/usr/libexec/save-entropy)
Feb 19 07:15:00 XXXXXXXXNETP02 /usr/sbin/cron[31120]: (root) CMD (/usr/libexec/atrun)
Feb 19 07:20:00 XXXXXXXXNETP02 /usr/sbin/cron[31124]: (root) CMD (/usr/libexec/atrun)
Feb 19 07:22:00 XXXXXXXXNETP02 /usr/sbin/cron[31126]: (operator) CMD (/usr/libexec/save-entropy)
Feb 19 07:25:00 XXXXXXXXNETP02 /usr/sbin/cron[31138]: (root) CMD (/usr/libexec/atrun)
Feb 22 16:39:59 XXXXXXXXNETP02 crontab[31545]: (root) BEGIN EDIT (root)
Feb 22 16:40:00 XXXXXXXXNETP02 /usr/sbin/cron[31548]: (root) CMD (/usr/libexec/atrun)
Feb 22 16:40:09 XXXXXXXXNETP02 crontab[31545]: (root) END EDIT (root)
Feb 22 16:44:00 XXXXXXXXNETP02 /usr/sbin/cron[31553]: (operator) CMD (/usr/libexec/save-entropy)
Feb 22 16:45:00 XXXXXXXXNETP02 /usr/sbin/cron[31565]: (root) CMD (/usr/libexec/atrun)
Feb 22 16:50:00 XXXXXXXXNETP02 /usr/sbin/cron[31569]: (root) CMD (/usr/libexec/atrun)
Feb 23 14:14:28 XXXXXXXXNETP02 crontab[31725]: (root) LIST (root)
Feb 23 14:20:00 XXXXXXXXNETP02 /usr/sbin/cron[31779]: (root) CMD (/usr/libexec/atrun)
Feb 23 14:22:00 XXXXXXXXNETP02 /usr/sbin/cron[31785]: (operator) CMD (/usr/libexec/save-entropy)
Feb 23 14:25:00 XXXXXXXXNETP02 /usr/sbin/cron[31799]: (root) CMD (/usr/libexec/atrun)
Feb 23 14:30:00 XXXXXXXXNETP02 /usr/sbin/cron[31804]: (root) CMD (/usr/libexec/atrun)
Feb 23 14:33:00 XXXXXXXXNETP02 /usr/sbin/cron[31810]: (operator) CMD (/usr/libexec/save-entropy)
Feb 23 14:35:00 XXXXXXXXNETP02 /usr/sbin/cron[31822]: (root) CMD (/usr/libexec/atrun)
[CMD=]XXXXXXXXNETP02# /etc/rc.d/cron status[/CMD]
Code:
cron is running as pid 31765.
[CMD=]XXXXXXXXNETP02# date[/CMD]
Code:
Thu Feb 23 14:53:09 EST 2012
[CMD=]XXXXXXXXNETP02# /etc/rc.d/cron stop[/CMD]
Code:
Stopping cron.
[CMD=]XXXXXXXXNETP02# /etc/rc.d/cron status[/CMD]
Code:
cron is not running.
[CMD=]XXXXXXXXNETP02# /etc/rc.d/cron start[/CMD]
Code:
Starting cron.
[CMD=]XXXXXXXXNETP02# date[/CMD]
Code:
Thu Feb 23 14:53:27 EST 2012
[CMD=]XXXXXXXXNETP02# tail -20 /var/log/cron[/CMD]
Code:
Feb 19 07:05:00 XXXXXXXXNETP02 /usr/sbin/cron[31104]: (root) CMD (/usr/libexec/atrun)
Feb 19 07:10:00 XXXXXXXXNETP02 /usr/sbin/cron[31106]: (root) CMD (/usr/libexec/atrun)
Feb 19 07:11:00 XXXXXXXXNETP02 /usr/sbin/cron[31108]: (operator) CMD (/usr/libexec/save-entropy)
Feb 19 07:15:00 XXXXXXXXNETP02 /usr/sbin/cron[31120]: (root) CMD (/usr/libexec/atrun)
Feb 19 07:20:00 XXXXXXXXNETP02 /usr/sbin/cron[31124]: (root) CMD (/usr/libexec/atrun)
Feb 19 07:22:00 XXXXXXXXNETP02 /usr/sbin/cron[31126]: (operator) CMD (/usr/libexec/save-entropy)
Feb 19 07:25:00 XXXXXXXXNETP02 /usr/sbin/cron[31138]: (root) CMD (/usr/libexec/atrun)
Feb 22 16:39:59 XXXXXXXXNETP02 crontab[31545]: (root) BEGIN EDIT (root)
Feb 22 16:40:00 XXXXXXXXNETP02 /usr/sbin/cron[31548]: (root) CMD (/usr/libexec/atrun)
Feb 22 16:40:09 XXXXXXXXNETP02 crontab[31545]: (root) END EDIT (root)
Feb 22 16:44:00 XXXXXXXXNETP02 /usr/sbin/cron[31553]: (operator) CMD (/usr/libexec/save-entropy)
Feb 22 16:45:00 XXXXXXXXNETP02 /usr/sbin/cron[31565]: (root) CMD (/usr/libexec/atrun)
Feb 22 16:50:00 XXXXXXXXNETP02 /usr/sbin/cron[31569]: (root) CMD (/usr/libexec/atrun)
Feb 23 14:14:28 XXXXXXXXNETP02 crontab[31725]: (root) LIST (root)
Feb 23 14:20:00 XXXXXXXXNETP02 /usr/sbin/cron[31779]: (root) CMD (/usr/libexec/atrun)
Feb 23 14:22:00 XXXXXXXXNETP02 /usr/sbin/cron[31785]: (operator) CMD (/usr/libexec/save-entropy)
Feb 23 14:25:00 XXXXXXXXNETP02 /usr/sbin/cron[31799]: (root) CMD (/usr/libexec/atrun)
Feb 23 14:30:00 XXXXXXXXNETP02 /usr/sbin/cron[31804]: (root) CMD (/usr/libexec/atrun)
Feb 23 14:33:00 XXXXXXXXNETP02 /usr/sbin/cron[31810]: (operator) CMD (/usr/libexec/save-entropy)
Feb 23 14:35:00 XXXXXXXXNETP02 /usr/sbin/cron[31822]: (root) CMD (/usr/libexec/atrun)
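
The listing above has long silent stretches (for example, nothing between Feb 19 07:25 and Feb 22 16:39) even though atrun normally fires every five minutes. A rough way to list every such gap in a longer log is sketched below. It assumes syslog-style "Mon DD HH:MM:SS" prefixes, entries from a single non-leap calendar year (syslog omits the year), and uses only basic awk so it does not depend on gawk's time functions. The function name find_cron_gaps and the 15-minute default are arbitrary choices.

```shell
#!/bin/sh
# find_cron_gaps LOGFILE [MAX_GAP_MINUTES]  (hypothetical helper)
# Print each pair of consecutive entries more than MAX_GAP_MINUTES apart.
find_cron_gaps() {
    max_gap=${2:-15}
    awk -v max="$max_gap" '
    BEGIN {
        n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
        split("31 28 31 30 31 30 31 31 30 31 30 31", len, " ")
        days = 0
        # offset[month] = number of days in the year before that month
        for (i = 1; i <= n; i++) { offset[names[i]] = days; days += len[i] }
    }
    {
        if (!($1 in offset)) next          # skip lines without a timestamp
        split($3, t, ":")
        stamp = $1 " " $2 " " $3
        # minutes since the start of the year (seconds are ignored)
        now = ((offset[$1] + $2 - 1) * 24 + t[1]) * 60 + t[2]
        if (seen && now - prev > max)
            printf "gap of %d min: %s -> %s\n", now - prev, prev_stamp, stamp
        prev = now
        prev_stamp = stamp
        seen = 1
    }' "$1"
}

# Example call:
# find_cron_gaps /var/log/cron 15
```

Matching the start of each gap against other logs from the same moment (named restarts, the nightly task, logins) may narrow down what the daemon was doing when it stalled.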
Any assistance would be greatly appreciated. Thanks.
 
Now there's an interesting problem. I recommend starting cron(8) from a shell prompt (perhaps in a sysutils/tmux session?) with debug flags in place.

I'd start with:
Code:
# cron -x ext,load,misc,pars,proc > /tmp/crond-chatter.txt

Next time it crashes, check the output. Hopefully it will shed some light.
 
I will give that a shot and see what comes up. I have multiple other systems with the same config (some of which are under 4 times the traffic volume and workload), but none of them have this issue, so I am very interested in finding the root cause. Unfortunately, I'm not sure when she will trip up again, but I will be sure to come back with the results if possible. Thanks in advance for your help troubleshooting this.
 