Process hangs in RUN state and cannot be killed or debugged

I've run into some very strange behavior:

A process (tesseract) hangs and cannot even be killed.

kill -QUIT executes but neither terminates the process nor creates a core dump. A plain kill does nothing either.

gdb hangs indefinitely on "attach PID".

System is "FreeBSD 13.1-RELEASE-p3 GENERIC amd64"

What could be going so wrong?
 
Try kill -9 <pid>.

Code:
     9       KILL (non-catchable, non-ignorable kill)
 
This smells like the process is in uninterruptible sleep, at least if kill -9 doesn't work. Check with ps and look for the D state.
If that is the case, procstat kstack <pid> might give you an idea of where the process is stuck and what it is waiting for.
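The check for the D state can be scripted. Below is a minimal sketch, assuming ps is asked for the pid, state, wchan and comm columns in that order; the helper name filter_d_state is made up for illustration.

```shell
#!/bin/sh
# Print only the processes whose state starts with "D" (uninterruptible wait).
# Expects ps-style output on stdin: one header line, then
# PID, STATE, WCHAN and COMMAND columns separated by whitespace.
# Assumption: the command name contains no spaces (true for the comm column
# in most cases, not for full command lines).
filter_d_state() {
    awk 'NR > 1 && $2 ~ /^D/ { print $1, $2, $3, $4 }'
}

# Typical use on a live system:
#   ps -axo pid,state,wchan,comm | filter_d_state
```

An empty result means nothing is in uninterruptible wait at that moment; the WCHAN column of any hit names the wait channel the process is sleeping on.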
 
As a side note, don't use kill -9 lightly. If the process uses some kind of caching mechanism, for example, that cache won't get flushed to disk and you may end up losing data. Use it only as a last resort, since it terminates the process forcefully without giving it a chance to clean up.
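One way to follow that advice is to send SIGTERM first and only escalate to SIGKILL after a grace period. This is a rough sketch, not a complete tool; the function name terminate_gently and the default 5-second timeout are arbitrary choices.

```shell
#!/bin/sh
# Ask a process to exit with SIGTERM, poll for up to $2 seconds (default 5),
# and only then fall back to SIGKILL.
# Returns 0 if the process went away without KILL, 1 if KILL had to be sent.
terminate_gently() {
    pid=$1
    timeout=${2:-5}
    kill -TERM "$pid" 2>/dev/null || return 0    # no such process (or no permission)
    i=0
    while [ "$i" -lt "$timeout" ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # process is gone
        sleep 1
        i=$((i + 1))
    done
    kill -KILL "$pid" 2>/dev/null                # last resort
    return 1
}

# Example: terminate_gently 864 10
```

Note that a process stuck the way this thread describes would survive even the final SIGKILL, so the sketch only helps with the ordinary case.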
 
> kill -9 <pid>
Didn't work - the process is still hanging.

Code:
# procstat kstack 864
  PID    TID COMM                TDNAME              KSTACK
  864 100155 tesseract           -                   mi_switch ast fast_syscall_common

Code:
# procstat kstack -v 864
  PID    TID COMM                TDNAME              KSTACK
  864 100155 tesseract           -                   mi_switch+0xc2 ast+0x1e6 fast_syscall_common+0x1a5

Looks like it is hanging somewhere in a syscall... but why, and how can I avoid it?
 
I don't know what "tesseract" is; a quick port search showed it's some OCR software. Processes in an uninterruptible state are most likely waiting for I/O. Any chance you are using NFS/CIFS and it has hung? Are there any other processes in the "D" state (ps aux)?
Also, are you using any devices that scan for QR codes, etc.? For example, RS-232 attached devices.
 
Tesseract is expected to read a file from the system partition, process it, and print the result to the console. No slow or unreliable I/O is expected.

There are no processes in the "D" state. ps aux shows tesseract hanging in the "R+" state.
 
You may want to check your drive's health and its connections/cables. What does dmesg say? Are there any (maybe lots of) lines pointing to read errors from the disk? Or are you, by chance, reading from a network-connected drive? Hanging while reading usually means the drive is not responding in a meaningful way.
 
I didn't find any disk errors in dmesg. Just in case, I ran fsck -y in single-user mode.
The target file is accessible: it can be read and copied without any problems.
 
> the tesseract is hanging in "R+" state
Well, that's not hanging then (but I know what you mean). R+ means it is runnable and is in the foreground process group of its terminal (it can read input from the terminal).
Is it a physical box or a VM? While I probably know the answer, can you try sending signal 11 too (kill -11 <pid>)?

Also, can you share the output of procstat signal <pid of tesseract>?
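For readers unfamiliar with the ps state column, the R+ reading above can be decoded with a small helper. This is only a sketch: the function name describe_state is made up, and the letter meanings are given as I recall them from the FreeBSD ps(1) manual page, so check them against the man page on your release.

```shell
#!/bin/sh
# Translate a ps(1) state string such as "R+" or "D" into words.
# Only the first letter and the "+" modifier are handled; other
# modifier characters are ignored in this sketch.
describe_state() {
    state=$1
    case $state in
        D*) main="uninterruptible (disk or other short-term) wait" ;;
        I*) main="idle (sleeping longer than about 20 seconds)" ;;
        L*) main="waiting to acquire a lock" ;;
        R*) main="runnable" ;;
        S*) main="sleeping (less than about 20 seconds)" ;;
        T*) main="stopped" ;;
        W*) main="idle interrupt thread" ;;
        Z*) main="zombie" ;;
        *)  main="unknown" ;;
    esac
    case $state in
        *+*) echo "$main, foreground process group of its terminal" ;;
        *)   echo "$main" ;;
    esac
}

# Example: describe_state "$(ps -o state= -p 864)"
```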
 
Well, "running" without any CPU usage and without any results :)
Ctrl+C and any other input are completely ignored.

It is running inside a VM provided by https://www.hetzner.com/cloud
dmesg indicates:
Code:
CPU: AMD EPYC Processor (2445.48-MHz K8-class CPU)
...
Hypervisor: Origin = "KVMKVMKVM"
...
ACPI APIC Table: <BOCHS  BXPCAPIC>
...
da0: <QEMU QEMU HARDDISK 2.5+> Fixed Direct Access SPC-3 SCSI device


The procstat output is as follows:

Code:
# procstat signal 868
  PID COMM             SIG     FLAGS
  868 tesseract        HUP      P--
  868 tesseract        INT      ---
  868 tesseract        QUIT     ---
  868 tesseract        ILL      ---
  868 tesseract        TRAP     ---
  868 tesseract        ABRT     ---
  868 tesseract        EMT      ---
  868 tesseract        FPE      ---
  868 tesseract        KILL     ---
  868 tesseract        BUS      ---
  868 tesseract        SEGV     ---
  868 tesseract        SYS      ---
  868 tesseract        PIPE     ---
  868 tesseract        ALRM     ---
  868 tesseract        TERM     P--
  868 tesseract        URG      -I-
  868 tesseract        STOP     ---
  868 tesseract        TSTP     ---
  868 tesseract        CONT     ---
  868 tesseract        CHLD     -I-
  868 tesseract        TTIN     ---
  868 tesseract        TTOU     ---
  868 tesseract        IO       -I-
  868 tesseract        XCPU     ---
  868 tesseract        XFSZ     ---
  868 tesseract        VTALRM   ---
  868 tesseract        PROF     ---
  868 tesseract        WINCH    -I-
  868 tesseract        INFO     -I-
  868 tesseract        USR1     ---
  868 tesseract        USR2     ---
  868 tesseract        32       --C
  868 tesseract        33       ---
  868 tesseract        34       ---
  868 tesseract        35       ---
  868 tesseract        36       ---
  868 tesseract        37       ---
  868 tesseract        38       ---
  868 tesseract        39       ---
  868 tesseract        40       ---
  868 tesseract        41       ---
  868 tesseract        42       ---
  868 tesseract        43       ---
  868 tesseract        44       ---
  868 tesseract        45       ---
  868 tesseract        46       ---
  868 tesseract        47       ---
  868 tesseract        48       ---
  868 tesseract        49       ---
  868 tesseract        50       ---
  868 tesseract        51       ---
  868 tesseract        52       ---
  868 tesseract        53       ---
  868 tesseract        54       ---
  868 tesseract        55       ---
  868 tesseract        56       ---
  868 tesseract        57       ---
  868 tesseract        58       ---
  868 tesseract        59       ---
  868 tesseract        60       ---
  868 tesseract        61       ---
  868 tesseract        62       ---
  868 tesseract        63       ---
  868 tesseract        64       ---
  868 tesseract        65       ---
  868 tesseract        66       ---
  868 tesseract        67       ---
  868 tesseract        68       ---
  868 tesseract        69       ---
  868 tesseract        70       ---
  868 tesseract        71       ---
  868 tesseract        72       ---
  868 tesseract        73       ---
  868 tesseract        74       ---
  868 tesseract        75       ---
  868 tesseract        76       ---
  868 tesseract        77       ---
  868 tesseract        78       ---
  868 tesseract        79       ---
  868 tesseract        80       ---
  868 tesseract        81       ---
  868 tesseract        82       ---
  868 tesseract        83       ---
  868 tesseract        84       ---
  868 tesseract        85       ---
  868 tesseract        86       ---
  868 tesseract        87       ---
  868 tesseract        88       ---
  868 tesseract        89       ---
  868 tesseract        90       ---
  868 tesseract        91       ---
  868 tesseract        92       ---
  868 tesseract        93       ---
  868 tesseract        94       ---
  868 tesseract        95       ---
  868 tesseract        96       ---
  868 tesseract        97       ---
  868 tesseract        98       ---
  868 tesseract        99       ---
  868 tesseract        100      ---
  868 tesseract        101      ---
  868 tesseract        102      ---
  868 tesseract        103      ---
  868 tesseract        104      ---
  868 tesseract        105      ---
  868 tesseract        106      ---
  868 tesseract        107      ---
  868 tesseract        108      ---
  868 tesseract        109      ---
  868 tesseract        110      ---
  868 tesseract        111      ---
  868 tesseract        112      ---
  868 tesseract        113      ---
  868 tesseract        114      ---
  868 tesseract        115      ---
  868 tesseract        116      ---
  868 tesseract        117      ---
  868 tesseract        118      ---
  868 tesseract        119      ---
  868 tesseract        120      ---
  868 tesseract        121      ---
  868 tesseract        122      ---
  868 tesseract        123      ---
  868 tesseract        124      ---
  868 tesseract        125      ---
  868 tesseract        126      ---
  868 tesseract        127      ---
  868 tesseract        128      ---
 
With the behavior you described, it's weird that it's not in the D state. Actually, I'm not aware of any other state in which a process would not have SIGKILL delivered (it can't be ignored). Did you try sending it SEGV too, as I mentioned above? kill -11 <pid> (probably useless, but just to try).
From the signal list you pasted I see a few signals pending (HUP/TERM). Was this before you tried to send it KILL (-9)?
The only other option that comes to mind is to truss it to see what it does: truss -f -o tesseractlog.out <cmd_you_use_to_exec_tesseract>

Check that the log doesn't contain anything you are not comfortable sharing (IP addresses, etc.) and please post it here.
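The pending-signal check on the procstat signal output can be done with a one-line filter instead of reading all 128 rows. A minimal sketch, assuming the four-column PID/COMM/SIG/FLAGS layout shown earlier in the thread and a command name without spaces; pending_signals is a made-up helper name. In the FLAGS column, P marks a pending signal, I an ignored one and C a caught one.

```shell
#!/bin/sh
# Print the names of signals flagged as pending ("P" in the first FLAGS
# position) from `procstat signal <pid>` output supplied on stdin.
pending_signals() {
    awk 'NR > 1 && $4 ~ /^P/ { print $3 }'
}

# Typical use:
#   procstat signal 868 | pending_signals
```

Run against the listing posted above, this would report HUP and TERM. Repeating it right after kill -9 shows whether KILL is also sitting in the pending set.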
 
It looks like it uses OpenMP and Archer (a debugging tool for OpenMP applications). It then tries to use MemKind, which is some tool for NUMA++ machines (I don't know the details). That would fit with all the cpuset_setaffinity calls. Could it be that it is a highly parallel application that has simply deadlocked? But that would not explain why it doesn't react to signals and refuses to let a debugger attach. Maybe it has exposed some bug (in the kernel or in the basic userspace libraries) that only shows up with heavily parallel programs?
 
Can you please answer the questions I asked, too? I'd like to know whether you see KILL as pending in procstat signal once you have attempted to kill it with -9 (or -11).
The trace doesn't show anything that would help much. It does block a set of signals, but that's nothing unusual. KILL can't be blocked anyway.

But again, this would make sense if the process were in the D state, not in R. Looking at the kstack, the process is actually attempting a syscall but doesn't quite get there. At this point a system crash dump would be helpful to see more of what's going on.

So that we can try it ourselves: did you do anything special when setting up tesseract? Does it freeze only on a certain image?
 
It looks very strange, but tesseract has now started working.

This happened before I applied any software or OS updates.
So I can only suppose the provider updated its virtualization software, or something like that.
 