Cleaning Up an Unkillable Process - InfluxDB

Hello,

Recently I have been facing an issue when running InfluxDB v0.9.0-rc31 (a Go application) on FreeBSD 10.1-STABLE (r283556).
InfluxDB starts and works correctly, but after some time the process stops responding, and it can no longer be restarted or killed with kill -9. (I have also tried using gdb(1) and truss(1), without any success.)

I have been searching around and have not been able to find any information on how to kill the process without rebooting the server.

Does anyone know if there is any way of killing the process?


Code:
# ps auwwxp 46035
USER  PID %CPU %MEM  VSZ  RSS TT  STAT STARTED  TIME COMMAND
root 46035  0.0  0.1 88640 8856  3- TN+  3:01PM 1:07.57 /usr/local/bin/influxd -config=/usr/local/etc/influxd.conf

# procstat -kk 46035
  PID  TID COMM  TDNAME  KSTACK
46035 101373 influxd  -  mi_switch+0xe1 sleepq_timedwait_sig+0x8b _sleep+0x238 umtxq_sleep+0x125 do_wait+0x387 __umtx_op_wait_uint_private+0x83 ia32_syscall+0x2f8 Xint0x80_syscall+0x95
46035 101386 influxd  -  mi_switch+0xe1 thread_suspend_switch+0x170 thread_single+0x4e5 exit1+0xbe sys_sys_exit+0xe ia32_syscall+0x2f8 Xint0x80_syscall+0x95
root@eud3-pr-mutgra1:/ #

# procstat -t 46035
  PID  TID COMM  TDNAME  CPU  PRI STATE  WCHAN
46035 101373 influxd  -  7  64 stop  -
46035 101386 influxd  -  7  64 stop  -


# procstat -i 46035| grep -vE -- '---$'
  PID COMM  SIG  FLAGS
46035 influxd  HUP  P-C
46035 influxd  INT  P-C
46035 influxd  QUIT  P-C
46035 influxd  ILL  --C
46035 influxd  TRAP  --C
46035 influxd  ABRT  --C
46035 influxd  EMT  --C
46035 influxd  FPE  --C
46035 influxd  KILL  P--
46035 influxd  BUS  --C
46035 influxd  SEGV  --C
46035 influxd  SYS  --C
46035 influxd  PIPE  --C
46035 influxd  ALRM  --C
46035 influxd  TERM  P-C
46035 influxd  URG  --C
46035 influxd  STOP  P--
46035 influxd  CHLD  --C
46035 influxd  IO  --C
46035 influxd  XCPU  --C
46035 influxd  XFSZ  --C
46035 influxd  VTALRM  --C
46035 influxd  PROF  --C
46035 influxd  WINCH  P-C
46035 influxd  INFO  --C
46035 influxd  USR1  --C
46035 influxd  USR2  --C
46035 influxd  32  --C


# procstat -j 46035| grep -vE -- '--$'
  PID  TID COMM  SIG  FLAGS

# procstat -r 46035
  PID COMM  RESOURCE  VALUE
46035 influxd  user time  00:00:27.873138
46035 influxd  system time  00:00:39.698525
46035 influxd  maximum RSS  19128 KB
46035 influxd  integral shared memory  36370152 KB
46035 influxd  integral unshared data  2295552 KB
46035 influxd  integral unshared stack  1147776 KB
46035 influxd  page reclaims  3043
46035 influxd  page faults  88
46035 influxd  swaps  0
46035 influxd  block reads  115154
46035 influxd  block writes  41648
46035 influxd  messages sent  30994
46035 influxd  messages received  37976
46035 influxd  signals received  6
46035 influxd  voluntary context switches  2113406
46035 influxd  involuntary context switches  34322
 
The process seems to be sleeping. Killing a process while it is in kernel space is a big no-no; the termination is carried out when it returns from there. Here, however, it is sleeping on some condition that is never being triggered. The call stacks seem to indicate a deadlock. That would be a bug in the program, and the user-space call stacks would tell you more. You could attach a debugger while it is in this state and check whether the threads deadlock there.
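One way to check the point about termination being deferred is the FLAGS column of `procstat -i`. Per procstat(1), the first character is P when the signal is pending in the process queue, the second is I when it is ignored, and the third is C when it is caught. In the first paste, KILL shows `P--`, meaning SIGKILL is queued but has not been acted on yet. This appears consistent with the explanation above.

Below is a minimal sketch that lists only the pending signals. `pending_signals` is a hypothetical helper name, and the column layout is assumed to match the output pasted above. The sketch is written for sh(1). root's default shell on FreeBSD 10 is csh, which has no shell functions, so run it under /bin/sh.

```shell
# pending_signals (hypothetical helper): read FreeBSD "procstat -i" output on
# stdin and print each signal whose FLAGS field starts with P (pending),
# plus whether the process catches it (third FLAGS character is C).
pending_signals() {
    awk '$1 ~ /^[0-9]+$/ && substr($NF, 1, 1) == "P" {
        # $3 is the signal name, $NF the three-character FLAGS field
        how = (substr($NF, 3, 1) == "C") ? "caught" : "not-caught"
        print $3, how
    }'
}

# Assumed usage on the affected host:
#   procstat -i 46035 | pending_signals
```

Fed the first paste, it should list HUP, INT, QUIT, TERM and WINCH as caught, and KILL and STOP as not-caught.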
 
Thanks for the comment. Here's another example:


Code:
# ps auwwxp 68836
USER  PID %CPU %MEM  VSZ  RSS TT  STAT STARTED  TIME COMMAND
root 68836  2.7  4.2 3839500 348180  3  I+  Fri02PM 125:17.81 ./influxd -config=./influxd.conf

# procstat -kk 68836
  PID  TID COMM  TDNAME  KSTACK   
68836 100401 influxd  -  mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_timedwait_sig+0x10 _sleep+0x238 umtxq_sleep+0x125 do_wait+0x387 __umtx_op_wait_uint_private+0x83 ia32_syscall+0x2f8 Xint0x80_syscall+0x95
68836 100403 influxd  -  mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d umtxq_sleep+0x125 do_wait+0x387 __umtx_op_wait_uint_private+0x83 ia32_syscall+0x2f8 Xint0x80_syscall+0x95
68836 100412 influxd  -  mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d umtxq_sleep+0x125 do_wait+0x387 __umtx_op_wait_uint_private+0x83 ia32_syscall+0x2f8 Xint0x80_syscall+0x95
68836 100428 influxd  -  mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d umtxq_sleep+0x125 do_wait+0x387 __umtx_op_wait_uint_private+0x83 ia32_syscall+0x2f8 Xint0x80_syscall+0x95
68836 100499 influxd  -  mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d umtxq_sleep+0x125 do_wait+0x387 __umtx_op_wait_uint_private+0x83 ia32_syscall+0x2f8 Xint0x80_syscall+0x95
68836 100514 influxd  -  mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d umtxq_sleep+0x125 do_wait+0x387 __umtx_op_wait_uint_private+0x83 ia32_syscall+0x2f8 Xint0x80_syscall+0x95
68836 100515 influxd  -  mi_switch+0xe1 sleepq_timedwait_sig+0x8b _sleep+0x238 umtxq_sleep+0x125 do_wait+0x387 __umtx_op_wait_uint_private+0x83 ia32_syscall+0x2f8 Xint0x80_syscall+0x95
68836 100516 influxd  -  mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d umtxq_sleep+0x125 do_wait+0x387 __umtx_op_wait_uint_private+0x83 ia32_syscall+0x2f8 Xint0x80_syscall+0x95
68836 100530 influxd  -  mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d umtxq_sleep+0x125 do_wait+0x387 __umtx_op_wait_uint_private+0x83 ia32_syscall+0x2f8 Xint0x80_syscall+0x95
68836 100690 influxd  -  mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d kern_kevent+0x401 sys_kevent+0x12a ia32_syscall+0x2f8 Xint0x80_syscall+0x95
68836 100696 influxd  -  mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d umtxq_sleep+0x125 do_wait+0x387 __umtx_op_wait_uint_private+0x83 ia32_syscall+0x2f8 Xint0x80_syscall+0x95
68836 100797 influxd  -  mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d umtxq_sleep+0x125 do_wait+0x387 __umtx_op_wait_uint_private+0x83 ia32_syscall+0x2f8 Xint0x80_syscall+0x95
68836 100935 influxd  -  mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d umtxq_sleep+0x125 do_wait+0x387 __umtx_op_wait_uint_private+0x83 ia32_syscall+0x2f8 Xint0x80_syscall+0x95
68836 101773 influxd  -  mi_switch+0xe1 thread_suspend_switch+0x170 ptracestop+0x11b cursig+0x3a6 ast+0x42f doreti_ast+0x1f

# procstat -t 68836
  PID  TID COMM  TDNAME  CPU  PRI STATE  WCHAN   
68836 100401 influxd  -  4  120 stop  -   
68836 100403 influxd  -  4  120 stop  uwait   
68836 100412 influxd  -  6  120 stop  uwait   
68836 100428 influxd  -  2  120 stop  uwait   
68836 100499 influxd  -  7  121 stop  uwait   
68836 100514 influxd  -  2  120 stop  uwait   
68836 100515 influxd  -  4  120 stop  -   
68836 100516 influxd  -  0  121 stop  uwait   
68836 100530 influxd  -  5  120 stop  uwait   
68836 100690 influxd  -  6  120 stop  kqread   
68836 100696 influxd  -  4  121 stop  uwait   
68836 100797 influxd  -  1  124 stop  uwait   
68836 100935 influxd  -  0  120 stop  uwait   
68836 101773 influxd  -  7  120 stop  -   

# procstat -i 68836| grep -vE -- '---$'
  PID COMM  SIG  FLAGS
68836 influxd  HUP  --C
68836 influxd  INT  --C
68836 influxd  QUIT  --C
68836 influxd  ILL  --C
68836 influxd  TRAP  --C
68836 influxd  ABRT  --C
68836 influxd  EMT  --C
68836 influxd  FPE  --C
68836 influxd  BUS  --C
68836 influxd  SEGV  --C
68836 influxd  SYS  --C
68836 influxd  PIPE  --C
68836 influxd  ALRM  --C
68836 influxd  TERM  --C
68836 influxd  URG  --C
68836 influxd  CHLD  --C
68836 influxd  IO  --C
68836 influxd  XCPU  --C
68836 influxd  XFSZ  --C
68836 influxd  VTALRM  --C
68836 influxd  PROF  --C
68836 influxd  WINCH  --C
68836 influxd  INFO  --C
68836 influxd  USR1  --C
68836 influxd  USR2  --C
68836 influxd  32  --C

# procstat -j 68836  | grep -vE -- '--$'   
  PID  TID COMM  SIG  FLAGS

# procstat -r 68836
  PID COMM  RESOURCE  VALUE   
68836 influxd  user time  01:14:26.593797   
68836 influxd  system time  00:50:51.362590   
68836 influxd  maximum RSS  1368556 KB
68836 influxd  integral shared memory  3778386144 KB
68836 influxd  integral unshared data  238008576 KB
68836 influxd  integral unshared stack  119004288 KB
68836 influxd  page reclaims  277955   
68836 influxd  page faults  76186   
68836 influxd  swaps  0   
68836 influxd  block reads  7798866   
68836 influxd  block writes  2151601   
68836 influxd  messages sent  1747161   
68836 influxd  messages received  1608958   
68836 influxd  signals received  2   
68836 influxd  voluntary context switches  166066933   
68836 influxd  involuntary context switches  2970674
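To see at a glance where the threads in this second example are waiting, you can group the `procstat -t` table by its last two columns (STATE and WCHAN). In the paste above, all 14 threads are in state `stop`:

- 10 have WCHAN `uwait`, matching the `umtxq_sleep` frames in `procstat -kk`.
- 1 is in `kqread`, which is the thread sitting in `kern_kevent`.
- 3 have no wait channel.

Below is a minimal sh(1) sketch of that grouping. `thread_states` is a hypothetical helper name, and the sketch assumes the WCHAN column never contains spaces.

```shell
# thread_states (hypothetical helper): read FreeBSD "procstat -t" output on
# stdin and print "COUNT STATE WCHAN" for each distinct STATE/WCHAN pair,
# most common first. STATE and WCHAN are taken as the last two fields.
thread_states() {
    awk '$1 ~ /^[0-9]+$/ { n[$(NF-1) " " $NF]++ }
         END { for (k in n) print n[k], k }' | sort -rn
}

# Assumed usage on the affected host:
#   procstat -t 68836 | thread_states
```

For the second paste this should report `10 stop uwait`, `3 stop -` and `1 stop kqread`.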
 