php-fpm capacity issue

I am running into an issue where php-fpm appears to lock up: it suddenly stops accepting new connections while the server's load average goes through the roof. The same machine also runs an NFS daemon (nfsd).

Code:
last pid: 69279;  load averages: 39.06, 13.79,  6.86                                                          up 216+23:55:05 00:01:05
1838 processes:21 running, 1814 sleeping, 3 zombie
CPU: 16.3% user,  0.0% nice, 62.6% system,  0.2% interrupt, 20.8% idle
Mem: 6167M Active, 3624M Inact, 47M Laundry, 3365M Wired, 1108M Buf, 2911M Free
Swap: 4096M Total, 46M Used, 4050M Free, 1% Inuse

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
78685 root         12  22    0    11M  1636K rpcsvc   1  32.2H  37.54% nfsd
67556 www           1 -16    0   544M    21M CPU5     5   0:01  17.74% php-fpm
68348 www           1   4    0   689M    68M RUN      1   0:01  17.32% php-fpm
68643 www           1  24    0   688M    44M RUN      5   0:00  14.41% php-fpm
68105 www           1 -16    0   544M    17M CPU2     2   0:00  13.09% php-fpm
67564 www           1  23    0   544M    26M RUN      6   0:00  12.53% php-fpm
68650 www           1  21    0   705M    45M biowr    6   0:00  10.97% php-fpm

Apache logs show this:

Code:
[Sun Apr 09 22:00:10.479230 2023] [proxy:error] [pid 54028:tid 34392547840] (54)Connection reset by peer: AH00957: FCGI: attempt to connect to 127.0.0.1:9000 (127.0.0.1) failed
[Sun Apr 09 22:00:10.501709 2023] [proxy:error] [pid 54028:tid 34392547840] (54)Connection reset by peer: AH00957: FCGI: attempt to connect to 127.0.0.1:9000 (127.0.0.1) failed
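
To line these errors up against the load spike, it may help to bucket them by minute. The following is only a sketch: the function name `fcgi_errors_per_minute` is made up, and it assumes the default Apache error-log timestamp layout shown above.

```sh
#!/bin/sh
# Count AH00957 (FCGI connect failure) lines per minute.
# Reads an Apache error log on stdin; assumes the timestamp layout
# "[Sun Apr 09 22:00:10.479230 2023]" seen in the excerpt above.
fcgi_errors_per_minute() {
    awk '/AH00957/ {
             # $2 = month, $3 = day, $4 = HH:MM:SS.usec -> keep HH:MM
             count[$2 " " $3 " " substr($4, 1, 5)]++
         }
         END { for (m in count) print m, count[m] }' | sort
}

# Usage (the log path is an assumption, adjust to your install):
#   fcgi_errors_per_minute < /var/log/httpd-error.log
```

Each output line is "month day HH:MM count", which can be compared against the time the load average started climbing.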

There is no error in the php-fpm log itself.

php-fpm is configured in dynamic mode with up to 3000 children:

Code:
pm = dynamic
pm.max_children = 3000

Around the time of the event, however, only about 1,800 children had been spawned:

Code:
ps auxww | grep fpm | wc -l
    1787
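
Note that `ps auxww | grep fpm | wc -l` also counts the `grep` process itself and the php-fpm master, so the figure is slightly high. A stricter count could look like the sketch below. The function name `count_fpm_workers` is hypothetical, and it assumes the workers run as user `www` (as in the top output) while the master runs as a different user such as root.

```sh
#!/bin/sh
# Count php-fpm worker processes from "ps -axo user,comm" style input.
# Assumption: workers run as the given user (default "www") and the
# master runs as another user, so the user filter excludes it.
count_fpm_workers() {
    awk -v user="${1:-www}" '
        $1 == user && $2 == "php-fpm" { n++ }
        END { print n + 0 }'
}

# Usage on the live system:
#   ps -axo user,comm | count_fpm_workers www
```

Matching on the exact command name rather than a substring keeps `grep` and anything else with "fpm" in its arguments out of the count.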

I checked the listen queues, but they look normal:

Code:
netstat -aL
Current listen queue sizes (qlen/incqlen/maxqlen)
Proto Listen                           Local Address
tcp6  0/0/128                          *.nfsd
tcp4  0/0/128                          *.nfsd
tcp4  0/0/128                          *.http
tcp6  0/0/128                          *.http
tcp4  0/0/128                          localhost.9000
tcp4  0/0/128                          *.smux
tcp4  0/0/128                          *.755
tcp6  0/0/128                          *.755
tcp4  0/0/128                          *.700
tcp6  0/0/128                          *.976
tcp4  0/0/128                          *.779
tcp6  0/0/128                          *.779
tcp4  0/0/128                          *.sunrpc
tcp6  0/0/128                          *.sunrpc
unix  0/0/5                            /var/agentx/master
unix  0/0/128                          /var/run/rpcbind.sock
unix  0/0/4                            /var/run/devd.pipe
unix  0/0/4                            /var/run/devd.seqpacket.pipe
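
This is a single snapshot, so a queue that filled up only briefly would not show here. A rough sketch for polling it during the next incident follows. `full_listen_queues` is a made-up helper name, and the 80% threshold is an arbitrary choice.

```sh
#!/bin/sh
# Print listeners whose accept queue (qlen) is at least PCT percent of
# maxqlen. Reads "netstat -aL" output on stdin, where field 2 has the
# form qlen/incqlen/maxqlen. Header lines are skipped because their
# second field does not split into three parts.
full_listen_queues() {
    awk -v pct="${1:-80}" '
        split($2, q, "/") == 3 && q[3] + 0 > 0 &&
        q[1] * 100 >= pct * q[3] { print $3, $2 }'
}

# Usage, e.g. once per second while the problem is happening:
#   while :; do netstat -aL | full_listen_queues 80; sleep 1; done
```

With the output above it prints nothing, since every qlen is 0. It may also be worth noting that maxqlen for localhost.9000 is only 128, which is small compared to a pool of up to 3000 children. If I understand correctly, the cumulative "listen queue overflows" counter in `netstat -s -p tcp` would show whether that limit was ever hit.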

What else should I check that might be causing this? And even if it were nfsd-related, why would that affect php-fpm?
 