Unkillable processes

Ever since upgrading our servers from 10.1 to 10.2, some of them have exhibited the same problem: the daily security run, which scans files for privs etc., just hangs. It does not advance, nor can it be killed.

Code:
root  93165  0.0  0.0  16612  1364  -  I  3:01AM  0:00.00 | `-- cron: running job (cron)
root  93170  0.0  0.0  17088  1364  -  Is  3:01AM  0:00.01 |  `-- /bin/sh - /usr/sbin/periodic daily
root  93185  0.0  0.0  8252  1064  -  I  3:01AM  0:00.00 |  `-- lockf -t 0 /var/run/periodic.daily.lock /bin/sh /usr/sbin/periodic LOCKED daily
root  93186  0.0  0.0  17088  1364  -  I  3:01AM  0:00.00 |  `-- /bin/sh /usr/sbin/periodic LOCKED daily
root  93196  0.0  0.0  17088  1364  -  I  3:01AM  0:00.01 |  |-- /bin/sh /usr/sbin/periodic LOCKED daily
root  93289  0.0  0.0  17088  1364  -  I  3:01AM  0:00.00 |  | `-- /bin/sh /etc/periodic/daily/450.status-security
root  93290  0.0  0.0  17088  1364  -  I  3:01AM  0:00.00 |  |  `-- /bin/sh - /usr/sbin/periodic security
root  93292  0.0  0.0  8252  1064  -  I  3:01AM  0:00.00 |  |  `-- lockf -t 0 /var/run/periodic.security.lock /bin/sh /usr/sbin/periodic LOCKED security
root  93293  0.0  0.0  17088  1412  -  I  3:01AM  0:00.00 |  |  `-- /bin/sh /usr/sbin/periodic LOCKED security
root  93300  0.0  0.0  17088  1500  -  I  3:01AM  0:00.00 |  |  |-- /bin/sh /usr/sbin/periodic LOCKED security
root  22903  0.0  0.0  17088  1792  -  I  5:39AM  0:00.00 |  |  | `-- /bin/sh - /etc/periodic/security/110.neggrpperm
root  22907  0.0  0.0  17088  1792  -  I  5:39AM  0:00.00 |  |  |  `-- /bin/sh - /etc/periodic/security/110.neggrpperm
root  22908  0.0  0.0  24680  2972  -  D  5:39AM  0:57.19 |  |  |  |-- find -sx / /data /dev/null -type f ( ( ! -perm +010 -and -perm +001 ) -or ( ! -perm +020 -and -perm +002 ) -or ( ! -perm +040 -and -perm +
root  22909  0.0  0.0  12348  1344  -  I  5:39AM  0:00.00 |  |  |  |-- tee /dev/stderr
root  22910  0.0  0.0  8260  1336  -  I  5:39AM  0:00.00 |  |  |  `-- wc -l
root  93301  0.0  0.0  12444  1208  -  I  3:01AM  0:00.00 |  |  `-- mail -E -s <servername> daily security run output root
root  93197  0.0  0.0  12444  1064  -  I  3:01AM  0:00.00 |  `-- mail -E -s <servername> daily run output root

The only thing that helps is a reboot, but even sync ; sync ; sync ; reboot still often leads to a lengthy fsck.

Today the process in the example has been running for 5 hours, when it normally completes just after 2. The find -sx / /data /dev/null -type f... process cannot be killed: it reacts to neither kill nor kill -9.

Any thoughts?
 
The D state indicates that find is in disk wait, so it won't notice the signal until the disk responds. procstat -af might show the path name that is causing the trouble.
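
Something along these lines could narrow it down to just the stuck processes. This is only a rough sketch, assuming FreeBSD's ps and procstat are available; the helper name dwait_pids is made up for illustration, and the procstat calls probably need root.

```sh
#!/bin/sh
# Hypothetical helper: read "pid state ..." rows on stdin (header line first)
# and print the PIDs whose state starts with "D" (uninterruptible disk wait).
dwait_pids() {
    awk 'NR > 1 && $2 ~ /^D/ { print $1 }'
}

# Only inspect processes when procstat exists (FreeBSD assumed).
if command -v procstat >/dev/null 2>&1; then
    ps -axo pid,state,wchan,command | dwait_pids | while read -r pid; do
        echo "== pid $pid =="
        procstat -f "$pid"     # open files, with path names where known
        procstat -kk "$pid"    # kernel stack, shows where the thread is sleeping
    done
fi
```

The kernel stack from procstat -kk is often the more useful part, since it shows which kernel function the process is sleeping in, even when no path name is reported.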

Juha