I am running into an issue where php-fpm seems to lock up: it suddenly stops accepting new connections while the server's load average goes through the roof. I also run an NFS daemon (nfsd) on this machine.
Code:
last pid: 69279; load averages: 39.06, 13.79, 6.86 up 216+23:55:05 00:01:05
1838 processes: 21 running, 1814 sleeping, 3 zombie
CPU: 16.3% user, 0.0% nice, 62.6% system, 0.2% interrupt, 20.8% idle
Mem: 6167M Active, 3624M Inact, 47M Laundry, 3365M Wired, 1108M Buf, 2911M Free
Swap: 4096M Total, 46M Used, 4050M Free, 1% Inuse
  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
78685 root       12  22    0    11M  1636K rpcsvc  1  32.2H  37.54% nfsd
67556 www         1 -16    0   544M    21M CPU5    5   0:01  17.74% php-fpm
68348 www         1   4    0   689M    68M RUN     1   0:01  17.32% php-fpm
68643 www         1  24    0   688M    44M RUN     5   0:00  14.41% php-fpm
68105 www         1 -16    0   544M    17M CPU2    2   0:00  13.09% php-fpm
67564 www         1  23    0   544M    26M RUN     6   0:00  12.53% php-fpm
68650 www         1  21    0   705M    45M biowr   6   0:00  10.97% php-fpm
Apache logs show this:
Code:
[Sun Apr 09 22:00:10.479230 2023] [proxy:error] [pid 54028:tid 34392547840] (54)Connection reset by peer: AH00957: FCGI: attempt to connect to 127.0.0.1:9000 (127.0.0.1) failed
[Sun Apr 09 22:00:10.501709 2023] [proxy:error] [pid 54028:tid 34392547840] (54)Connection reset by peer: AH00957: FCGI: attempt to connect to 127.0.0.1:9000 (127.0.0.1) failed
There is no error in the php-fpm log itself.
php-fpm is configured in dynamic mode with up to 3000 children:
Code:
pm = dynamic
pm.max_children = 3000
Around the time of this event, however, only about 1800 children had been spawned:
Code:
ps auxww | grep fpm | wc -l
1787
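Note that this count also includes the php-fpm master process and the grep itself, so it slightly overstates the number of workers. A minimal sketch of a stricter count follows; the helper name count_fpm_workers is made up for illustration, and it assumes the workers carry the usual "php-fpm: pool <name>" process title:

```shell
#!/bin/sh
# Hypothetical helper: counts php-fpm workers from `ps -axo command` output on stdin.
# Assumption: workers show up with a title starting "php-fpm: pool <name>",
# while the master shows up as "php-fpm: master process ...".
count_fpm_workers() {
    # -c prints the number of matching lines instead of the lines themselves
    grep -c '^php-fpm: pool '
}

# Usage (not run here):
# ps -axo command | count_fpm_workers
```

Either way the number stays well below pm.max_children, so the pool limit itself does not look like the bottleneck.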
I looked at the listen queues, but they seem normal:
Code:
netstat -aL
Current listen queue sizes (qlen/incqlen/maxqlen)
Proto Listen Local Address
tcp6 0/0/128 *.nfsd
tcp4 0/0/128 *.nfsd
tcp4 0/0/128 *.http
tcp6 0/0/128 *.http
tcp4 0/0/128 localhost.9000
tcp4 0/0/128 *.smux
tcp4 0/0/128 *.755
tcp6 0/0/128 *.755
tcp4 0/0/128 *.700
tcp6 0/0/128 *.976
tcp4 0/0/128 *.779
tcp6 0/0/128 *.779
tcp4 0/0/128 *.sunrpc
tcp6 0/0/128 *.sunrpc
unix 0/0/5 /var/agentx/master
unix 0/0/128 /var/run/rpcbind.sock
unix 0/0/4 /var/run/devd.pipe
unix 0/0/4 /var/run/devd.seqpacket.pipe
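That is only a single snapshot, though, and possibly not taken at the exact moment of the stall, so a short overflow of the 128-entry queue on localhost.9000 could have been missed. A rough sketch of how the check could be repeated while the problem is happening; the helper name busy_queues and the 80% threshold are my own assumptions, not anything standard:

```shell
#!/bin/sh
# Hypothetical helper: reads `netstat -aL` output on stdin and prints every
# listener whose qlen has reached a given percentage of maxqlen.
# The name busy_queues and the default threshold of 80 are assumptions.
busy_queues() {
    awk -v pct="${1:-80}" '
        # Data rows look like: "tcp4 0/0/128 localhost.9000";
        # header lines do not have a qlen/incqlen/maxqlen triple in field 2.
        $2 ~ /^[0-9]+\/[0-9]+\/[0-9]+$/ {
            split($2, q, "/")    # q[1]=qlen, q[2]=incqlen, q[3]=maxqlen
            if (q[3] > 0 && q[1] * 100 >= q[3] * pct)
                print $3, $2
        }
    '
}

# Usage (not run here): sample once per second during the incident.
# while :; do date; netstat -aL | busy_queues 80; sleep 1; done
```

If localhost.9000 ever shows up in that output, the listen backlog would be worth a closer look.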
What else could I check that might be causing this? Even if it were nfsd-related, why would that affect php-fpm?