PHP-FPM unresponsive debugging

I have a bit of a weird situation with php-fpm. All of a sudden for the past few nights at around 3AM a particular web server that runs apache + php-fpm starts behaving oddly.

The server load goes up to 500/600 and I can't figure out why. Web service stops and in the php-fpm logs I see this:
Code:
[Wed Nov 30 03:02:50.264359 2022] [proxy:error] [pid 82157:tid 34395073792] (54)Connection reset by peer: AH00957: FCGI: attempt to con
nect to 127.0.0.1:9000 (127.0.0.1:9000) failed

In the php-fpm logs I see that it stops logging and the logs just jump in time:
Code:
[30-Nov-2022 03:02:27] NOTICE: [pool www] child 81068 exited with code 0 after 1118.758260 seconds from start
[30-Nov-2022 03:02:27] NOTICE: [pool www] child 82080 started
[30-Nov-2022 03:13:55] WARNING: [pool www] child 82053 exited on signal 9 (SIGKILL) after 746.754336 seconds from start
[30-Nov-2022 03:13:55] NOTICE: [pool www] child 82226 started
/var/log/messages just show connections piling up
Code:
Nov 30 03:02:50 gomn-2 kernel: sonewconn: pcb 0xfffff8002cf8b7c0 (127.0.0.1:9000 (proto 6)): Listen queue overflow: 193 already in queue awaiting acceptance (1 occurrences)
Nov 30 03:03:50 gomn-2 kernel: sonewconn: pcb 0xfffff8002cf8b7c0 (127.0.0.1:9000 (proto 6)): Listen queue overflow: 193 already in queue awaiting acceptance (454 occurrences)
Nov 30 03:04:50 gomn-2 kernel: sonewconn: pcb 0xfffff8002cf8b7c0 (127.0.0.1:9000 (proto 6)): Listen queue overflow: 193 already in queue awaiting acceptance (426 occurrences)
If I manually kill the php-fpm processes things are eventually fine again, but even doing killall -9 php-fpm takes a long time.

I'm on 13.1-RELEASE FreeBSD 13.1-RELEASE and php74-7.4.29

I also tried doing a truss -p on one of the php-fpm processes but it didn't print anything, maybe because of the high load (?)

How can I debug this further ? It will happen again for sure tonight.
 
PHP 7.4 is EOL but don't think that 8.x will necessarily fix any issue.

Bump up and check access/error logs to see what the traffic is.

You can also write a cron jon that runs every minute and dumps vital stats from top, netstat, etc. and see if any clues in that.

Any cron jobs at 3 a.m.?

Any changes to the server or what's installed on it?
 
I will stay up tonight to see live what's happening.

I ran netstat -m and it seemed fine, but I'll try netstat -a too

There's no cronjobs or other changes.
 
PHP 7.4 is EOL but don't think that 8.x will necessarily fix any issue.

Bump up and check access/error logs to see what the traffic is.

You can also write a cron jon that runs every minute and dumps vital stats from top, netstat, etc. and see if any clues in that.

Any cron jobs at 3 a.m.?

Any changes to the server or what's installed on it?

I do have a new script that I wrote which is supposed to kill hung php-fpm processes but I don't understand why it runs. Basically it connects to php-fpm and gets the status of all threads and kills those that have been running for longer than X seconds.

so I see this entry in the apache access logs
Code:
127.0.0.1 - - [30/Nov/2022:03:00:01 -0500] "GET /polstatus?full&json HTTP/1.1" 200 285696
but I can't understand why it runs at 03:00 when my cron entry is:
Code:
*/7     *       *       *       *       /usr/local/bin/perl /usr/local/gtech/kill_hung.pl
It should only run at 7,14,21,28,35,42,49,56 , right ?
 
At 3AM periodic(8) usually starts running. What might be happening is that the periodic(8) scripts tend to use a lot of I/O. This may have a detrimental effect on your PHP scripts if they also rely on a lot of I/O. They will start stalling and not handling connections fast enough, which in turn causes the listen queue to overflow.

Code:
# Perform daily/weekly/monthly maintenance.
1       3       *       *       *       root    periodic daily
15      4       *       *       6       root    periodic weekly
30      5       1       *       *       root    periodic monthly

Try shifting these an hour and see if the timing of your PHP issues also shift along with it. If you have a lot of jails (or VMs) running on the same machine, spread the time around a bit, so they don't ALL start running periodic(8) at the exact same time.

Also double check the webserver's access and error logs. You may be getting hammered by a bot or some other malware that just happens to hit your site at around the same time.
 
I've not used php-fpm in production but the times I've had Apache getting swamped the load has built up steadily before the actual meltdown.

So it may not actually be something at 3 a.m. itself - might be a steady build up of traffic in the hours beforehand - that makes it fall over at around 3.

You can also make changes to work around the "Listen queue overflow" sonewconn part e.g. make the queue bigger with this in /etc/sysctl.conf

kern.ipc.soacceptqueue=512

That's just the OS change so you have to tweak web server settings, too. e.g. ListenBackLog 512 in httpd-mpm.conf (or similar; depends on your web server settings).

But this may just hide an issue you need to fix, so not suggesting you rush in and do this.
 
The security scripts in periodic can do alot of IO.
I disabled them
/etc/periodic.conf
Code:
daily_queuerun_enable="NO"                # Run mail queue
daily_status_include_submit_mailq="NO"            # Also submit queue
daily_status_security_enable="NO"            # Security check
daily_submit_queuerun="NO"                # Also submit queue
monthly_status_security_enable="NO"            # Security check
security_status_logincheck_enable="NO"
weekly_status_security_enable="NO"            # Security check
 
At 3AM periodic(8) usually starts running. What might be happening is that the periodic(8) scripts tend to use a lot of I/O. This may have a detrimental effect on your PHP scripts if they also rely on a lot of I/O. They will start stalling and not handling connections fast enough, which in turn causes the listen queue to overflow.

Code:
# Perform daily/weekly/monthly maintenance.
1       3       *       *       *       root    periodic daily
15      4       *       *       6       root    periodic weekly
30      5       1       *       *       root    periodic monthly

Try shifting these an hour and see if the timing of your PHP issues also shift along with it. If you have a lot of jails (or VMs) running on the same machine, spread the time around a bit, so they don't ALL start running periodic(8) at the exact same time.

Also double check the webserver's access and error logs. You may be getting hammered by a bot or some other malware that just happens to hit your site at around the same time.

sweet, I moved them during the day 8,9,10 AM since this box is mostly active at night. Let's see if it changes anything.
Would you also happen to know why a cron entry at */7 runs at 03:00 ?
 
Back
Top