Apache MaxRequestWorkers vs netstat

Mage · May 24, 2018

My sites became slow today behind CloudFlare. The response time in the browser was between 30 and 60 s, or the request timed out.
The application itself responded quickly, the load on the servers was between 1 and 2.5, and CPU usage was about 30%.

I thought the problem was with the connections. Restarting www/apache24 helped temporarily. I found nothing in the logs except for a few friendly

sonewconn: pcb 0xfffff8017833c740: Listen queue overflow: 767 already in queue awaiting acceptance (1026 occurrences)

dmesg lines.

However, netstat -an | grep '.443' | wc showed between 7,500 and 9,000 connections. The vast majority were established. I don’t know much about networking, and I don’t get it.

The KeepAliveTimeout was 600.

The MPM config was:

<IfModule mpm_event_module>
StartServers 3
ServerLimit 50
MinSpareThreads 50
MaxSpareThreads 500
ThreadsPerChild 50
MaxRequestWorkers 2500
MaxConnectionsPerChild 0
</IfModule>

This config worked for more than a year. I don’t get how the server could have 7,500 HTTPS connections. However, I had never checked it until today.

Then I changed the values, and it seems to have helped. The new config is KeepAliveTimeout 300 and:

<IfModule mpm_event_module>
StartServers 3
ServerLimit 200
MinSpareThreads 50
MaxSpareThreads 500
ThreadsPerChild 64
MaxRequestWorkers 12160
MaxConnectionsPerChild 0
</IfModule>

Considering this config, how can I have more than 5,000 connections with 30 processes?

# netstat -an | grep '.443' | wc && netstat -an | grep '.443' | grep ESTABLISHED | wc && ps auwx | grep httpd | wc
5449 32694 425008
5445 32670 424710
31 342 2872

Maybe I have a misconception about what ESTABLISHED means or how www/apache24 works. Although the config change seems to have helped, I think I’m looking at the wrong numbers, as 31 processes with 64 threads per child should not be able to hold 5,445 connections, unless I got the whole concept wrong.
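A note on the counting itself: in `grep '.443'` the unescaped `.` matches any character, so the pattern also catches lines where "443" appears inside a foreign port (such as 54431) or an address, and `wc` without `-l` prints lines, words and bytes, so only its first number is a connection count. Likewise, `ps auwx | grep httpd` also counts the grep process itself and the parent httpd, which does not serve requests. A minimal sketch of a stricter count, assuming FreeBSD-style `netstat -an` output where the port follows the last dot of the address (for example `192.0.2.10.443`) and the TCP state is the sixth column; `count_states` and `naive_count` are hypothetical helper names and the sample lines are invented:

```python
import re
from collections import Counter


def local_port(address: str) -> str:
    # Assumed FreeBSD format: the port follows the last dot, e.g. "192.0.2.10.443" or "*.443".
    return address.rsplit(".", 1)[-1]


def count_states(netstat_output: str, port: str = "443") -> Counter:
    """Count TCP states of sockets whose LOCAL port equals `port`."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: Proto Recv-Q Send-Q Local-Address Foreign-Address (state)
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skips headers and non-TCP lines
        if local_port(fields[3]) == port:
            states[fields[5]] += 1
    return states


def naive_count(netstat_output: str) -> int:
    # What `grep '.443' | wc -l` counts: "443" preceded by ANY character, anywhere on the line.
    return sum(1 for line in netstat_output.splitlines() if re.search(r".443", line))


sample = """\
Proto Recv-Q Send-Q Local Address          Foreign Address        (state)
tcp4       0      0 192.0.2.10.443         198.51.100.7.50211     ESTABLISHED
tcp4       0      0 192.0.2.10.443         198.51.100.8.60112     TIME_WAIT
tcp4       0      0 192.0.2.10.22          203.0.113.5.54431      ESTABLISHED
tcp4       0      0 *.443                  *.*                    LISTEN
"""

print(count_states(sample)["ESTABLISHED"], naive_count(sample))
```

In the invented sample, the SSH connection on port 22 matches the naive pattern only because its foreign port contains "5443", and the TIME_WAIT and LISTEN lines are counted too; the stricter version reports one ESTABLISHED HTTPS connection.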

SirDice · May 25, 2018

Code:

sonewconn: pcb 0xfffff8017833c740: Listen queue overflow: 767 already in queue awaiting acceptance (1026 occurrences)

This typically means that you're getting more connections than the application can handle and it starts putting them in a queue. These are connections that are being initiated, so technically they're not connected yet and thus won't show up as ESTABLISHED.
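The 767 in that log line can be related to the configured backlog. A minimal sketch, assuming FreeBSD's sonewconn() reports an overflow once the number of completed-but-unaccepted connections exceeds 3 × backlog / 2 (integer division), and assuming Apache's default ListenBacklog of 511 was in effect and not capped lower by the kernel's accept-queue limit (the kern.ipc.soacceptqueue sysctl, default 128). Under those assumptions 767 is exactly the first queue length that overflows a 511 backlog. Also note that on FreeBSD the connections "awaiting acceptance" have usually already completed the TCP handshake and are only waiting for accept(), so netstat may list them as ESTABLISHED; `netstat -Lan` shows the current listen queue lengths.

```python
def overflow_threshold(backlog: int) -> int:
    # Assumed FreeBSD rule: overflow once queue length > 3 * backlog / 2 (integer division).
    return 3 * backlog // 2


def overflows(queue_len: int, backlog: int) -> bool:
    return queue_len > overflow_threshold(backlog)


# 511 is Apache's default ListenBacklog; 128 is the assumed default kernel cap.
for backlog in (511, 128):
    print(backlog, overflow_threshold(backlog))

print(overflows(766, 511), overflows(767, 511))
```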

In general, you can get these when the web application responds too slowly to keep up with the connection rate. Instead of trying to handle everything on one server, you should look into load-balancing the site across 2 or more web servers. This is good practice anyway, as it allows you to take a server offline (for updates, for example) without interrupting the service.

max21 · May 28, 2018

Mage said:
.....

This config worked for more than a year. I don’t get how the server could have 7,500 HTTPS connections. However, I had never checked it until today.

Then I changed the values, and it seems to have helped. The new config is KeepAliveTimeout 300 and:

Considering this config, how can I have more than 5,000 connections with 30 processes?

Maybe I have a misconception about what ESTABLISHED means or how www/apache24 works. Although the config change seems to have helped, I think I’m looking at the wrong numbers, as 31 processes with 64 threads per child should not be able to hold 5,445 connections, unless I got the whole concept wrong.

Just thought you guys would like to see this, if you haven't already. I'd like to see at least 30% of that at work. I would expect at least 10,000+ connections per server if I were there.

https://medium.freecodecamp.org/how...0-000-concurrent-ssl-connections-d017e61a4d27

SirDice · May 28, 2018

I also highly recommend net/haproxy for load-balancing web servers. It works like a dream and is able to handle an insane amount of concurrent connections.

Mage · May 28, 2018

SirDice said:
I also highly recommend net/haproxy for load-balancing web servers. It works like a dream and is able to handle an insane amount of concurrent connections.

net/haproxy is unstable; it caused me downtime when I had done nothing on the server. At times it took ages to restart. It was the worst software I’ve ever put into production. The only other software that caused me downtime through no mistake of my own, not counting configuration logic changes, was Phusion Passenger, because of a bug. It was the single downtime-causing bug I hit in it over a decade, and the devs fixed it with me in no time after I reported it.

I will never touch net/haproxy again.

Even if I ignore the above, it’s still one more software layer that can bring additional bugs. I don’t see how it would help with the KeepAlive connections. I think I would have the same number of them. Maybe that is not true. But I already don’t understand how I can have 6,000 established connections with 2,500 MaxRequestWorkers. When I don’t understand something, I prefer another solution to putting one more layer on top of it in production, one that I also don’t understand 100%.

One could think the net/haproxy problems were my mistake, and one could be right. All I know is that over many years of server administration, which is not my primary profession, I have experienced similar instability only with databases/mongodb and an early Cassandra release. Those two caused no downtime, as I did not put them into production.

databases/mongodb has its hype, as does net/haproxy.

SirDice · May 28, 2018

Mage said:
net/haproxy is unstable; it caused me downtime when I had done nothing on the server.

Then you must have done something wrong. It's been working just fine for several of my clients. I've also managed a huge HAProxy installation for a rather large free porn hoster. I can assure you it can handle a lot more connections than you will ever need and the application itself is extremely stable.

Mage · May 28, 2018

SirDice said:
Then you must have done something wrong. It's been working just fine for several of my clients. I've also managed a huge HAProxy installation for a rather large free porn hoster. I can assure you it can handle a lot more connections than you will ever need and the application itself is extremely stable.

You don’t know how many connections I need.

The issue with HAProxy was not the number of connections. At times, even a simple restart took too long. I don’t see how this can be my mistake.

I don’t see from your answer how it would help with the KeepAlive connections. I’m almost sure it would make the situation worse in terms of the number of connections, unless I configured it as a single point of failure.

The current issue is not that Apache can’t handle enough connections. The issue is that it handles more than it should (based on my understanding). Of course, the primary problem would be if it could not handle the connections. But if it can handle more than I think it should, I can’t create a proper configuration except by luck, because I don’t know where the ceiling is. If I think it should be 2,500, and I see it handle 6,500 connections, it’s difficult to tell whether the limit is 8,000 or 10,000.

Or maybe I have the concept wrong. I might be looking for the issue in the wrong place as well.

The answer is most likely the event MPM. https://httpd.apache.org/docs/2.4/mod/event.html

"The event MPM handles some connections in an asynchronous way, where request worker threads are only allocated for short periods of time as needed, and other connections with one request worker thread reserved per connection. This can lead to situations where all workers are tied up and no worker thread is available to handle new work on established async connections."

I guess I need to read the whole documentation and do some testing to understand how long a thread can be tied to a connection, what the connection limit is, how to tell how many active threads I have, and so on.

I think I read something about two years ago in an Apache vs. NGINX argument saying that "the event-based server is wonderful unless you have traffic". Maybe now I understand it. The issue is not the limit but that I don’t know where the limit is.

VladiBG · May 28, 2018

There are plenty of examples on the internet of how to configure Apache to serve more than 10k concurrent connections and how to handle DDoS and slow-response HTTP attacks.

Mage · May 30, 2018

VladiBG said:
There are plenty of examples on the internet of how to configure Apache to serve more than 10k concurrent connections and how to handle DDoS and slow-response HTTP attacks.

The primary question was how I could have 7,500+ established connections with 2,500 MaxRequestWorkers. I prefer to understand what I am doing before I copy config examples from the internet into production.

I tried to Google it. It’s difficult to search for "more established connections than MaxRequestWorkers" and find the answer. What I found was what you wrote: configuration examples.

It was unlikely to be a DDoS attack; more likely it was a traffic spike combined with a high KeepAliveTimeout. The issue was likely related to the event MPM. I double-checked again yesterday, and there was nothing relevant in the Apache error logs, /var/log/all.log, or dmesg.

Installing a single HAProxy node would create a single point of failure. Installing two nodes would double the number of keep-alive connections.

I can only repeat that the goal was not to set up a server that can handle 10k+ connections but first to understand how many connections it can handle, and why.

SirDice · May 30, 2018

Does your website have a database backend? Have you looked at that? If every worker process takes just a fraction longer to complete because the database is slow (for whatever reason), this effect can ripple through the entire chain, causing workers to stick around longer than usual, which in turn causes more workers to be spawned, and if the worker limit is reached, connections will start queuing. It's similar to one car on the motorway braking a little too hard: the car behind it has to brake harder, the car behind that even harder, and before you know it there's a slow-moving traffic jam. All because of that one car braking.
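The ripple effect described above can be put into rough numbers with Little's law: the average number of busy workers equals the request rate times the average time each request holds a worker. A minimal sketch; the request rate and latencies below are made-up illustration values, not measurements from this thread, and `busy_workers` is a hypothetical helper name:

```python
def busy_workers(requests_per_second: float, seconds_per_request: float) -> float:
    """Little's law: average concurrency = arrival rate x average time in the system."""
    return requests_per_second * seconds_per_request


MAX_REQUEST_WORKERS = 2500  # the first config in this thread
RATE = 2000                 # hypothetical requests per second, for illustration only

for latency in (0.2, 0.5, 1.0, 2.0):  # hypothetical average request durations in seconds
    need = busy_workers(RATE, latency)
    verdict = "fits" if need <= MAX_REQUEST_WORKERS else "exceeds MaxRequestWorkers, requests queue"
    print(f"{latency:>4} s/request -> {need:>5.0f} busy workers ({verdict})")
```

At a constant request rate, doubling the average request time doubles the number of workers in use, which is how a slightly slower backend can push a server that normally has headroom past its worker limit.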

VladiBG · May 30, 2018

mpm_event module handles some connections in an asynchronous way, where request worker threads are only allocated for short periods of time as needed, and other connections with one request worker thread reserved per connection.

You can use mpm_info module or apachectl -V.
In your configuration, you are limited to a maximum of 24320 connections processed simultaneously and a total of 38400 connections.

Edit:
For your first config, you have a total of 5000 simultaneous connections and a total of 7500 connections.
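These figures can be checked against the event MPM documentation's simplified ceiling for when all workers are idle, max_connections = (AsyncRequestWorkerFactor + 1) × MaxRequestWorkers, assuming AsyncRequestWorkerFactor is left at its default of 2 (neither config in the thread sets it). The first config gives 3 × 2,500 = 7,500. For the second, 3 × 12,160 = 36,480, while 38,400 is what you get from ServerLimit × ThreadsPerChild = 12,800 worker slots instead of MaxRequestWorkers. The "simultaneous" figures (5,000 and 24,320) correspond arithmetically to AsyncRequestWorkerFactor × MaxRequestWorkers. A minimal sketch of the arithmetic; `max_connections_all_idle` is a hypothetical helper name:

```python
ASYNC_REQUEST_WORKER_FACTOR = 2  # assumed: httpd's default; the thread's configs don't set it

CONFIGS = {
    "first":  {"ServerLimit": 50,  "ThreadsPerChild": 50, "MaxRequestWorkers": 2500},
    "second": {"ServerLimit": 200, "ThreadsPerChild": 64, "MaxRequestWorkers": 12160},
}


def max_connections_all_idle(max_request_workers: int,
                             factor: int = ASYNC_REQUEST_WORKER_FACTOR) -> int:
    """Documented ceiling when every worker is idle: (factor + 1) * MaxRequestWorkers."""
    return (factor + 1) * max_request_workers


for name, c in CONFIGS.items():
    slots = c["ServerLimit"] * c["ThreadsPerChild"]  # upper bound on worker threads
    assert c["MaxRequestWorkers"] <= slots, "MaxRequestWorkers exceeds ServerLimit * ThreadsPerChild"
    print(name,
          "ceiling:", max_connections_all_idle(c["MaxRequestWorkers"]),
          "ceiling if all slots were workers:", max_connections_all_idle(slots))
```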

Mage · Jun 1, 2018

SirDice said:
Does your website have a database backend? Have you looked at that? If every worker process takes just a fraction longer to complete because the database is slow (for whatever reason), this effect can ripple through the entire chain, causing workers to stick around longer than usual, which in turn causes more workers to be spawned, and if the worker limit is reached, connections will start queuing. It's similar to one car on the motorway braking a little too hard: the car behind it has to brake harder, the car behind that even harder, and before you know it there's a slow-moving traffic jam. All because of that one car braking.

I know what you mean. Yes, this can happen.

It wasn’t the case this time. The number of Passenger processes was below the limit. Passenger would have started more processes if the application had responded slowly.

The load was between 1 and 2, which is too low. It is usually between 2 and 4 on these servers, and it goes higher under load (no surprise). The low load and low CPU usage showed that it was not the application. It was also not the database, because I would have seen that in the logs.

An Apache restart and a reboot helped temporarily. Increasing the MPM limits and decreasing the KeepAliveTimeout solved the issue with the server. The cause was likely the number of connections.
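A rough illustration of why lowering KeepAliveTimeout reduces the connection count: if each client connection stays open idle for the full timeout after its last request, the number of idle keep-alive connections is roughly new connections per second times KeepAliveTimeout. Under the event MPM these connections do not hold a worker thread, but, as far as the documented formula suggests, they still count toward each process's connection limit. The connection rate below is a hypothetical value, not a figure from this thread, and real clients (including CloudFlare's edge) may close connections sooner or keep reusing them longer than this simple model assumes:

```python
def idle_keepalive_connections(new_conns_per_second: float, keepalive_timeout_s: float) -> float:
    """Rough estimate: each connection lingers idle for the full timeout after its last request."""
    return new_conns_per_second * keepalive_timeout_s


RATE = 10  # hypothetical new connections per second, not measured in this thread
for timeout in (600, 300):  # old and new KeepAliveTimeout values from this thread
    print(f"KeepAliveTimeout {timeout}: ~{idle_keepalive_connections(RATE, timeout):.0f} idle connections")
```

Whatever the real rate is, the estimate scales linearly with the timeout, so halving KeepAliveTimeout from 600 to 300 roughly halves the idle connections in this model.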

Mage · Jun 1, 2018

VladiBG said:
You can use mpm_info module or apachectl -V.
In your configuration, you are limited to a maximum of 24320 connections processed simultaneously and a total of 38400 connections.

Edit:
For your first config, you have a total of 5000 simultaneous connections and a total of 7500 connections.

Yes, I think this is the answer I have been looking for. As far as I could see, the issue started at around 7,500 connections, though I didn’t have the exact number. After the restart with the first config, I was watching the number of connections. At 6,000, it was still fine.

Where does the 7,500 come from?

# apachectl -V
Server version: Apache/2.4.33 (FreeBSD)
Server built: unknown
Server's Module Magic Number: 20120211:76
Server loaded: APR 1.6.3, APR-UTIL 1.6.1
Compiled using: APR 1.6.3, APR-UTIL 1.6.1
Architecture: 64-bit
Server MPM: event
threaded: yes (fixed thread count)
forked: yes (variable process count)
Server compiled with....
-D APR_HAS_SENDFILE
-D APR_HAS_MMAP
-D APR_HAVE_IPV6 (IPv4-mapped addresses disabled)
-D APR_USE_FLOCK_SERIALIZE
-D APR_USE_PTHREAD_SERIALIZE
-D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
-D APR_HAS_OTHER_CHILD
-D AP_HAVE_RELIABLE_PIPED_LOGS
-D DYNAMIC_MODULE_LIMIT=256
-D HTTPD_ROOT="/usr/local"
-D SUEXEC_BIN="/usr/local/bin/suexec"
-D DEFAULT_PIDLOG="/var/run/httpd.pid"
-D DEFAULT_SCOREBOARD="/var/run/apache_runtime_status"
-D DEFAULT_ERRORLOG="/var/log/httpd-error.log"
-D AP_TYPES_CONFIG_FILE="etc/apache24/mime.types"
-D SERVER_CONFIG_FILE="etc/apache24/httpd.conf"

I didn’t find mpm_info either with Google or in httpd.conf.

VladiBG · Jun 2, 2018

MaxRequestWorkers is the maximum number of connections that will be processed simultaneously. Connections above that number will be put in the queue and won't be processed, even if your server has free workers.
The numbers are from https://httpd.apache.org/docs/2.4/mod/event.html

max_connections = (ThreadsPerChild + (AsyncRequestWorkerFactor * idle_workers)) * ServerLimit
max_connections = (AsyncRequestWorkerFactor + 1) * MaxRequestWorkers
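The first formula depends on how many workers are idle at the moment, which is why the observed connection count can move well past MaxRequestWorkers: the event MPM parks idle keep-alive connections in its listener thread instead of tying up a worker. A minimal sketch applying that formula to the roughly 29 child processes the earlier `ps` output suggests (31 lines, minus the grep process itself and the parent httpd, assuming no other matching processes), with the default AsyncRequestWorkerFactor of 2 and idle workers spread evenly across processes, as in the documentation's own example; `max_connections` is a hypothetical helper name:

```python
def max_connections(threads_per_child: int, idle_workers: int, processes: int,
                    factor: int = 2) -> int:
    """Documented estimate: (ThreadsPerChild + factor * idle_workers) * processes."""
    return (threads_per_child + factor * idle_workers) * processes


# The documentation's example: ThreadsPerChild=10, ServerLimit=4, 4 idle workers.
print(max_connections(10, 4, 4))

# Second config (ThreadsPerChild 64), assuming ~29 children with various idle counts.
for idle in (64, 48, 32, 0):
    print(f"{idle:>2} idle workers per process -> {max_connections(64, idle, 29)} connections")
```

With every worker idle, 29 processes × (64 + 2 × 64) gives 5,568, which is above the 5,445 ESTABLISHED connections in the earlier output; the ceiling falls to 1,856 (29 × 64) if no worker is idle.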

mod_info and mod_status will give you better overview of the number of the workers.
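With mod_status enabled, the machine-readable page (commonly `/server-status?auto`) prints one `Key: value` pair per line, which is easy to poll while watching netstat. A minimal sketch of a parser; the URL is a hypothetical example that assumes server-status is exposed on localhost, the sample text is invented, and which keys appear (for example the Conns* counters on the event MPM) depends on the httpd version, so check your own output. `parse_status` and `fetch_status` are hypothetical helper names:

```python
from urllib.request import urlopen


def parse_status(text: str) -> dict:
    """Parse mod_status '?auto' output ("Key: value" per line) into a dict of strings."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            result[key.strip()] = value.strip()
    return result


def fetch_status(url: str = "http://127.0.0.1/server-status?auto") -> dict:
    # Hypothetical URL: adjust host and path to wherever server-status is exposed.
    with urlopen(url, timeout=5) as resp:
        return parse_status(resp.read().decode("utf-8", "replace"))


sample = """\
CurrentTime: Friday, 01-Jun-2018 12:00:00 UTC
BusyWorkers: 12
IdleWorkers: 1844
ConnsTotal: 5400
ConnsAsyncKeepAlive: 5300
Scoreboard: __W___
"""

status = parse_status(sample)  # invented values, for illustration only
print(int(status["BusyWorkers"]), int(status["IdleWorkers"]), int(status["ConnsTotal"]))
```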

Mage · Jun 2, 2018

VladiBG said:
MaxRequestWorkers is the maximum number of connections that will be processed simultaneously. Connections above that number will be put in the queue and won't be processed, even if your server has free workers.
The numbers are from https://httpd.apache.org/docs/2.4/mod/event.html

max_connections = (ThreadsPerChild + (AsyncRequestWorkerFactor * idle_workers)) * ServerLimit
max_connections = (AsyncRequestWorkerFactor + 1) * MaxRequestWorkers

mod_info and mod_status will give you better overview of the number of the workers.

Yes, this is it. I still haven't had time to read the page, but now I know what to look for, and it’s the right place. Thank you.
