PHP fsockopen causes 100% CPU and GATEWAY TIMEOUT on all web sites

I thought I'd post here first because I've always seen great responses in this forum and I wanted to see if any of you had some insight.

I don't know if this is:
(1) a PHP bug,
(2) a vulnerability,
(3) something I haven't configured properly (very possible lol), or
(4) something else.

I am running:

FreeBSD 11.3-RELEASE-p5
PHP 7.2.33 (fpm-fcgi), Zend OPcache v7.2.33
Apache/2.4.46 (FreeBSD)

The first example given in the PHP documentation for fsockopen() is:
PHP:
<?php
$fp = fsockopen("www.example.com", 80, $errno, $errstr, 30);
if (!$fp) {
    echo "$errstr ($errno)<br />\n";
} else {
    $out = "GET / HTTP/1.1\r\n";
    $out .= "Host: www.example.com\r\n";
    $out .= "Connection: Close\r\n\r\n";
    fwrite($fp, $out);
    while (!feof($fp)) {
        echo fgets($fp, 128);
    }
    fclose($fp);
}
?>

I'm building a much more complicated script, but this simple example is all I need to illustrate the problem:

With certain VALID URLs, the above script will cause one of the CPU cores to go to 100% for 120 seconds until the PHP script times out. Then, the following error will appear in my PHP error log:

Code:
[06-Oct-2020 23:07:38 America/New_York] PHP Fatal error:  Maximum execution time of 120 seconds exceeded in /... path-to-my-script.php on line 710

During those 2 minutes, none of the sites on my server are accessible! I'm hosting about 20 client sites at the moment. This is a backup server with all the same specs and same software versions as the production server. It's roughly, but not exactly, a mirror, and all of the sites on the backup are fully functional.

Now, I realize the code example in the PHP docs is simplified, but I'm concerned that such a simple script can essentially cripple the Apache server.

Digging into it, I **think** the "lock up" happens here:

PHP:
    while (!feof($fp)) {
        echo fgets($fp, 128);
    }

I've found that some servers don't send standard CRLF line endings, some send mangled headers, and some don't like the "128"; simply using "fgets($fp);" will SOMETIMES avoid the lock-up for that particular case.

But I'm not here for advice on coding. I have the function working well. My issue is that anyone playing with this PHP function can lock up the whole server for 2 minutes every time they run it.

So my thoughts/questions are:
1) Why would a socket connection cause a core to go to 100%?
2) I've got 8 cores. Htop is showing only one going to 100%. Why is this locking up all my other web sites for 2 minutes?

Thanks in advance for any ideas. Let me know if additional system info is needed.
 
1) Likely (as you suspect) because the while part turns into a busy loop. This doesn't necessarily have to be a PHP bug, either. Reading small chunks of unknown data, especially with fgets, is not really a good idea. I am not saying that's the exact thing happening in your case, but the remote server could easily feed you a stream of 0-byte lines at full speed (and IIRC fgets doesn't always do what one would expect, either). PHP, or rather the CPU utilization, won't like that at all. A large stream of 128-byte chunks won't be great either. Even if you just want to read the header, I'd rather go for fread with large chunks, combining the data until the end of the header is found, obviously with checks to abort if the header grows suspiciously large, either because the remote side is sending garbage or because the end-of-header marker was missed for some reason. Either that, or just use cURL and avoid all the details. (There's a rough sketch of what I mean after point 2.)

2) That's a good question. That only 1 core sees the utilization should be no surprise, as PHP is single-threaded, but obviously your server shouldn't lock up. Narrowing this down remotely isn't easy, though. Given there is a gateway timeout, I figure Apache is fine but PHP is not responding. I am pretty much just randomly guessing here, but: how many workers does your FPM pool have? Have you tried disabling Zend OPcache to see if the problem persists?
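
Something along these lines is what I have in mind for point 1. This is only a sketch: the 8 KiB chunk size, the 16 KiB header cap, and the 10-second stream timeout are arbitrary values I picked for illustration.

PHP:
<?php
// Sketch: read the response header with large fread() chunks instead of
// tiny fgets() lines, and bail out instead of spinning on bad input.
$fp = fsockopen("www.example.com", 80, $errno, $errstr, 30);
if (!$fp) {
    die("$errstr ($errno)\n");
}
stream_set_timeout($fp, 10); // don't block forever on a silent peer
fwrite($fp, "GET / HTTP/1.1\r\nHost: www.example.com\r\nConnection: Close\r\n\r\n");

$header = '';
while (($end = strpos($header, "\r\n\r\n")) === false) {
    $chunk = fread($fp, 8192); // large chunks instead of 128-byte lines
    if ($chunk === false || $chunk === '') {
        break; // EOF, read error, or timeout: stop instead of spinning
    }
    $header .= $chunk;
    if (strlen($header) > 16384) {
        break; // abort: header suspiciously large, or the marker was missed
    }
}
fclose($fp);
echo ($end === false) ? "no end-of-header marker found\n" : substr($header, 0, $end);
?>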
 
1) Likely (as you suspect) because the while part turns into a busy loop. This doesn't necessarily have to be a PHP bug, either. Reading small chunks of unknown data, especially with fgets, is not really a good idea. I am not saying that's the exact thing happening in your case, but the remote server could easily feed you a stream of 0-byte lines at full speed (and IIRC fgets doesn't always do what one would expect, either). PHP, or rather the CPU utilization, won't like that at all. A large stream of 128-byte chunks won't be great either. Even if you just want to read the header, I'd rather go for fread with large chunks, combining the data until the end of the header is found, obviously with checks to abort if the header grows suspiciously large, either because the remote side is sending garbage or because the end-of-header marker was missed for some reason. Either that, or just use cURL and avoid all the details.

Thank you, that all makes sense. With my custom function I'm already using fread, checking for 0-byte lines, and other things. I still have quite a few things to refine. I started with cURL, but it doesn't do everything I need (or I haven't learned how to use all of its many, many parameters). But I'm more concerned with this next part...

2) That's a good question. That only 1 core sees the utilization should be no surprise, as PHP is single-threaded, but obviously your server shouldn't lock up. Narrowing this down remotely isn't easy, though. Given there is a gateway timeout, I figure Apache is fine but PHP is not responding. I am pretty much just randomly guessing here, but: how many workers does your FPM pool have? Have you tried disabling Zend OPcache to see if the problem persists?

I will be doing some tests after I wake up (it's 5 am here). I'm also wondering if I need to be using the mpm_event_module instead of the mpm_prefork_module that I currently have Apache configured to use.

I recently upgraded to php-fpm (a few months ago), so I may not have it configured properly. I have no pool limits set in php-fpm.conf, but I'll be researching that as well. The server has been working fine, and I have several clients with pretty busy sites, so I would definitely hear about it if something wasn't working. But I want it to work the best it can.
 
Thank you, that all makes sense. With my custom function I'm already using fread, checking for 0-byte lines, and other things. I still have quite a few things to refine. I started with cURL, but it doesn't do everything I need (or I haven't learned how to use all of its many, many parameters).

I see.



I'm also wondering if I need to be using the mpm_event_module instead of the mpm_prefork_module that I currently have Apache configured to use.

Might be worth trying. While the gateway timeout is pointing towards PHP not responding, I haven't used Apache in ages, so I don't really know the specifics of the different server modules it offers.
 
Looking in php-fpm.d/www.conf, I'm just using the default settings after installing from Ports (same settings on production server and test server). They are:
Code:
pm = dynamic

pm.max_children = 5
pm.start_servers = 2
pm.min_spare_servers = 1
pm.max_spare_servers = 3

I'm currently reading up on how to set these. On the production server, I have some fairly busy sites... okay, wait, I know "fairly busy" is super meaningless, so how about:

- 3 domains are getting 10,000 to 15,000 visits/day
- 2 domains are getting 2,000 to 4,000 visits/day
- the rest are getting 200 to 1,000 visits/day

So, not a super busy server compared to some, but still some traffic? I haven't seen any issues with the default settings.
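
From what I've read so far, the usual rule of thumb is to size pm.max_children from the RAM you can spare for PHP divided by the average worker size, and to let FPM recycle its workers. Something like the following; the numbers are placeholders I made up while reading, not settings I've tested:

Code:
; hypothetical www.conf sizing: e.g. 2 GB for PHP / ~64 MB per worker ~= 30
pm = dynamic
pm.max_children = 30
pm.start_servers = 8
pm.min_spare_servers = 4
pm.max_spare_servers = 12
; recycle each worker after N requests so a stuck/leaking one can't live forever
pm.max_requests = 500
; have FPM itself kill requests that outlive PHP's own 120 s limit
request_terminate_timeout = 130s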

I'll post what I find but if anyone has suggestions, they are greatly appreciated.
 
What happens if you try the script outside of Apache, php-fpm, and the network, i.e. run it locally from the PHP CLI?

If you run that script on the machine itself (or as close to that as you can get), can you still use the machine, or is the machine itself almost unusable? If the latter, then your DoS script is a great success, and it's not Apache and not (?) php-fpm etc. But if the machine still seems fairly usable from another terminal or SSH session, then you can carry on looking "upwards".

You might want to put a timeout in there, though, in case it does lock up the machine.
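
Something like this, run from a shell with plain "php test.php", would keep a lid on it. It's a sketch only; both 10-second limits are arbitrary, and note that on Unix set_time_limit() counts only the script's own execution time, so it catches a busy loop but not a blocked read (the stream timeout covers that case):

PHP:
<?php
// Hypothetical CLI test harness; both 10-second limits are arbitrary.
set_time_limit(10);  // caps execution time, which a busy loop burns through
$fp = fsockopen("www.example.com", 80, $errno, $errstr, 10);
if (!$fp) {
    die("$errstr ($errno)\n");
}
stream_set_timeout($fp, 10); // caps each blocking read as well
fwrite($fp, "GET / HTTP/1.1\r\nHost: www.example.com\r\nConnection: Close\r\n\r\n");
while (!feof($fp)) {
    echo fgets($fp, 128);
    $meta = stream_get_meta_data($fp);
    if ($meta['timed_out']) {
        fwrite(STDERR, "read timed out\n");
        break;
    }
}
fclose($fp);
?>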

(And 11.3 is EOL, so someone will mention that!)

It will be interesting to see if changing the Apache MPM makes a difference, but I can't see that it would - I think prefork just means Apache starts up processes ready to take connections, and this doesn't seem (?) to be a question of there not being enough processes to handle incoming requests.

What is the CPU? Is it 8 "real" cores or are some of those threads?
 
UPDATE:
I rebooted the server yesterday and had not tried the "problem" script until just a few minutes ago. I made a special version of my script with a known url that causes the failure every time.

Now, my script STILL causes one of the CPUs to go to 100% but I CAN load pages on other web sites. I didn't change anything else. Here's a snippet of what I see in htop after running that script:
[htop screenshot: one core pinned at 100% while the script runs]


Now I'm at a loss because rebooting the server seems to have "fixed" it. But I'm still wondering what's going on and I'm still learning how to configure PHP-FPM.

Earlier today, I figured out how to enable PHP-FPM's own status output, which uses an HTML page that comes with the port. I found some instructions here: https://gist.github.com/Jiab77/a9428050ab9bb3f17c5e33343da94fd8
(His instructions are not exactly right for my FreeBSD installation, but I just had to change some paths.)
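
The gist boils down to one directive in the pool config, plus telling Apache to route that URL to FPM instead of to a PHP file. Roughly this; the socket path below is a guess, use whatever your pool's "listen" is actually set to:

Code:
; in php-fpm.d/www.conf
pm.status_path = /fpm-status

# in Apache (mod_proxy_fcgi), route that URL to the FPM socket:
# <LocationMatch "^/fpm-status$">
#     SetHandler "proxy:unix:/var/run/php-fpm.sock|fcgi://localhost/"
#     Require local
# </LocationMatch>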

When I run the problem script, the PHP-FPM status page shows it active for 2 minutes (until PHP finally gives up and writes that "PHP Fatal error" to the log).

I wonder if www processes are getting "stuck" and, over time, using up resources to the point where a busy process will keep others from starting?
 
I wonder if www processes are getting "stuck" and, over time, using up resources to the point where a busy process will keep others from starting?

It indeed kind of seems like "something" wasn't quite right with the FPM pool, and that "something" was fixed when the box got rebooted. I really don't know what that could be, but it sounds like the logical explanation. Maybe something like the workers not cleanly terminating, up until the point where there is only a single functioning one left (even if it would be somewhat strange why exactly this one would survive while the others didn't...), and once it gets occupied by the busy loop there is nothing left to service requests, so the server times out waiting for PHP to respond. How to narrow this down or even test it is sadly something I don't really have an answer for right now.
 

First, I want to thank you for pointing this out. I had NOT seen the warnings on this page, but they describe EXACTLY the problems I was having with earlier versions of my script, which are now fixed in the latest version. I believe EXAMPLE #2 is what was happening, resulting in an infinite loop. I'm not sure why I didn't see this page, as I studied all of the documentation and comments on the related pages for fsockopen(), fgets(), and fread().
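
For anyone who finds this thread later, my understanding of that failure mode: on PHP 7, if the handle is invalid (e.g. fsockopen() failed and returned false), feof() just raises a warning and never returns true, so the guard loop spins a core at 100%:

PHP:
<?php
// fsockopen() returns false on failure, not a stream resource.
// On PHP 7, feof(false) raises a warning and does NOT return true,
// so this loop never exits and pins one core at 100%.
$fp = @fsockopen("no-such-host.invalid", 80, $errno, $errstr, 5);
while (!feof($fp)) { // infinite busy loop whenever $fp === false
    // (each iteration also spams a warning into the log)
}
?>

The if (!$fp) check from the docs example avoids this, and on PHP 8 the same feof() call throws a TypeError instead of looping.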

But now I'm trying to figure out why my server wouldn't load any other web sites during the 2 minutes it takes PHP to give up on the infinite loop and kill the process. Since the reboot, other pages DO load while the feof() infinite loop is happening. I'm still going through the PHP-FPM documentation.

Thanks again!
 
Maybe something like the workers not cleanly terminating, up until the point where there is only a single functioning one left...

This is what I'm thinking as well, and as VladiBG pointed out, early versions of my script were probably causing an infinite loop for 2 minutes, until PHP finally gave up. I'm wondering if those kinds of "PHP Fatal error" situations are not getting terminated properly?

Also, during my Google exploration of PHP-FPM configuration tips, I'm seeing people say you can assign different pools to different sites. If this is true, it would be great, because it would let me separate the high-traffic sites and add another layer to keep one site from bringing the rest down.
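
From what I've read, each pool is just its own [section] with its own socket and limits. A sketch, with made-up names and paths (on FreeBSD the pool files live under /usr/local/etc):

Code:
; /usr/local/etc/php-fpm.d/busysite.conf (hypothetical)
[busysite]
user = www
group = www
listen = /var/run/php-fpm-busysite.sock
pm = dynamic
pm.max_children = 20
pm.start_servers = 4
pm.min_spare_servers = 2
pm.max_spare_servers = 6

; /usr/local/etc/php-fpm.d/smallsites.conf (hypothetical)
[smallsites]
user = www
group = www
listen = /var/run/php-fpm-smallsites.sock
pm = dynamic
pm.max_children = 8
pm.start_servers = 2
pm.min_spare_servers = 1
pm.max_spare_servers = 3

Each vhost would then point its PHP handler at the matching socket, so a runaway loop in one pool can't eat the workers the other sites depend on.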
 