[ Apache + Nginx ] Issue with availability on high traffic

Hello,
I'm hoping to get some help on this subject.
For context: I've already spent quite some time and resources on this problem, but I couldn't find the cause.

Our architecture might be "overkill", but we mostly inherited it and are building on it.
The "working" situation:
  • 5 FreeBSD servers, installed with 13.2-RELEASE, and Apache 2.4.58 for web contents
    • Server arch : amd64, 6 cores multithreaded CPU, 64GB RAM, 2 SATA disks for systems (ZFS mirror), 2 Samsung SSD 870 disks for fast access on disk cache data (ZFS mirror)
    • Each server has the exact same configuration and serves the same types of content
    • IPFW is configured on the host to allow access only from specific IP addresses
    • Each server has a jail with the web server and data inside a ZFS dataset, so we can move it quickly if we need a new server.
    • The service is mostly served through CGI because of internal technologies
    • Apache is configured with the worker MPM module
      • Apache config:
        <IfModule mpm_worker_module>
            ServerLimit             32
            StartServers            8
            ThreadLimit             512
            MaxRequestWorkers       16384
            ThreadsPerChild         512
            MinSpareThreads         512
            MaxSpareThreads         1024
            MaxConnectionsPerChild  10000
        </IfModule>
    • Each server is configured with these sysctls
      • Code:
        security.jail.allow_raw_sockets=1
        security.jail.mount_allowed=1
        
        kern.ipc.somaxconn=32768
        net.inet.tcp.maxtcptw=200000
        net.inet.icmp.icmplim=50
        net.inet.icmp.drop_redirect=1
        net.inet.tcp.icmp_may_rst=0
        net.inet.tcp.blackhole=2
        net.inet.udp.blackhole=1
        net.link.ether.inet.log_arp_wrong_iface=0
        net.inet.tcp.msl=2500
        net.inet.tcp.sendspace=262144
        net.inet.tcp.recvspace=262144
        net.inet.tcp.sendbuf_max=16777216
        net.inet.tcp.recvbuf_max=16777216
        net.inet.tcp.sendbuf_inc=32768
        net.inet.tcp.finwait2_timeout=500
        net.inet.tcp.fast_finwait2_recycle=1
        net.inet.ip.intr_queue_maxlen=4096
  • 4 FreeBSD servers, installed with 14.3-RELEASE, nginx-full 1.28.2 installed
    • server arch: amd64, 4 cores multithreaded CPU, 64GB RAM, 2 SATA disks for system (ZFS mirror)
    • each server has the exact same configuration
    • nginx is configured to load-balance across the 5 Apache servers
      • the upstream is composed of the 5 servers' jail IP addresses with the SSL port: a.b.c.d:443
      • some pieces of configuration
        NGINX:
        worker_processes 8;
        
        events {
            worker_connections 8192;
            accept_mutex off;
            use kqueue;
        }
        # [...]
        http {
            # [...]
            client_header_buffer_size 16k;
            large_client_header_buffers 4 16k;
        
            sendfile on;
            keepalive_timeout 65;
        
            gzip_proxied any;
            # [...]
            server {
                listen 443 ssl;
                http2 on;
                
                # [using https://ssl-config.mozilla.org/ intermediate configuration for SSL]
                
                location / {
                    access_log /path/to/access.log combined;
                    proxy_hide_header Upgrade;
                    proxy_hide_header X-Powered-By;
                    proxy_connect_timeout 120s;
                    proxy_read_timeout 120s;
                    proxy_send_timeout 120s;
                    proxy_redirect off;
                    proxy_ssl_verify off;
                    # [fronts is a 5 lines upstream block only with `server a.b.c.d:443;`]
                    proxy_pass https://fronts;
                    proxy_set_header Host $host;
                    proxy_set_header X-Real-IP $remote_addr;
                    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
                    proxy_set_header X-Forwarded-Proto $scheme;
                }
                # [...]
            }
            # [...]
        }
    • The sysctls are mostly the same
    • PF is configured
  • All servers are geographically close to each other, so the ping delay is well under 1ms, and the link is 1Gbit/s
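As a side note on the Apache MPM figures above: with the worker MPM, MaxRequestWorkers cannot exceed ServerLimit × ThreadsPerChild (Apache lowers it with a warning otherwise), so it is worth double-checking that the numbers line up; here they do (32 × 512 = 16384). A minimal shell check, with the values copied from the config above:

```shell
# Sanity-check the worker MPM limits quoted above: MaxRequestWorkers
# may not exceed ServerLimit * ThreadsPerChild.
SERVER_LIMIT=32
THREADS_PER_CHILD=512
MAX_REQUEST_WORKERS=16384

CAPACITY=$((SERVER_LIMIT * THREADS_PER_CHILD))
if [ "$CAPACITY" -ge "$MAX_REQUEST_WORKERS" ]; then
    echo "OK: capacity $CAPACITY covers MaxRequestWorkers $MAX_REQUEST_WORKERS"
else
    echo "WARN: MaxRequestWorkers $MAX_REQUEST_WORKERS exceeds capacity $CAPACITY"
fi
```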
What is not working:
Since the setup might be overkill, I'm trying to reduce the number of Apache servers from 5 to 2 (the nginx upstream block configured with 2 lines), but this causes problems.
What I can see in the nginx error and access logs:
  • error log: "no live upstreams", meaning that nginx has marked the 2 remote backends as down
  • access logs: a lot of HTTP 502 Bad Gateway responses
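A quick way to see whether those 502s arrive in bursts (which would point at backends being marked down for a fail window, rather than steady overload) is to bucket them per second. A sketch, using a few fabricated sample lines in place of the real access log (the path and log lines below are placeholders):

```shell
# Fabricated sample standing in for the real nginx access log (combined format).
cat > /tmp/access.sample <<'EOF'
10.0.0.1 - - [10/Jan/2025:12:00:01 +0000] "GET / HTTP/1.1" 502 157 "-" "curl/8.0"
10.0.0.1 - - [10/Jan/2025:12:00:01 +0000] "GET / HTTP/1.1" 502 157 "-" "curl/8.0"
10.0.0.2 - - [10/Jan/2025:12:00:02 +0000] "GET / HTTP/1.1" 200 512 "-" "curl/8.0"
EOF

# Count 502 responses per second; bursts aligned on ~10s boundaries suggest
# upstreams being marked down for a fail window rather than real overload.
awk '$9 == 502 { sub(/^\[/, "", $4); print $4 }' /tmp/access.sample \
    | sort | uniq -c
```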
On the Apache server:
  • I can see that they are serving requests, and there are no errors in the logs (host or jail) indicating that we've reached any specific limit.
  • There is still memory available on the system (of course), and the CPU is about 60% idle.
  • The netstat -4Lan command does not show any queueing.
  • An apache_exporter + Prometheus + Grafana dashboard doesn't show anything indicating Apache is suffering.
  • Direct access to an Apache server (via an `/etc/hosts` entry, or curl with specific options) shows that the Apache web server responds in time.
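For reference, the kind of direct-to-backend probe mentioned above can be scripted with curl's --resolve option, which pins the vhost name to one jail IP without touching /etc/hosts. The IP and hostname below are placeholders, not the real setup:

```shell
BACKEND_IP="192.0.2.10"    # placeholder: one Apache jail IP
VHOST="www.example.org"    # placeholder: the public vhost name

# Printed rather than executed here; drop the leading `echo` to actually probe.
echo curl -sk -o /dev/null \
    --resolve "${VHOST}:443:${BACKEND_IP}" \
    -w 'code=%{http_code} connect=%{time_connect}s total=%{time_total}s\n' \
    "https://${VHOST}/"
```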

I've asked some friends, and also an AI chat (someone suggested I try), but none of what I tried worked: sysctls, Apache config, nginx config.

Would you have any recommendations?
 
This does not look like an Apache capacity issue; rather, nginx seems to be incorrectly marking your backends as "down".

Indicators:
  • Apache responds fine when accessed directly (curl, /etc/hosts)
  • CPU/RAM are not exhausted
  • no listen queue issues
  • yet nginx reports "no live upstreams" and returns 502

Likely causes:

1. Upstream failure handling (critical with only 2 backends)
By default nginx marks backends down quite aggressively: with the defaults max_fails=1 and fail_timeout=10s, a single failed or timed-out request takes a server out for 10 seconds. With only 2 servers, that quickly leads to a total outage.

Code:
upstream fronts {
    least_conn;

    server a.b.c.d:443 max_fails=5 fail_timeout=10s;
    server e.f.g.h:443 max_fails=5 fail_timeout=10s;

    keepalive 100;
}

Code:
proxy_next_upstream error timeout http_502 http_503 http_504;
proxy_next_upstream_tries 3;
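One caveat with retries: since nginx 1.9.13, requests with a non-idempotent method (POST, LOCK, PATCH) are not retried on the next upstream unless that is explicitly allowed. Only enable this if duplicate submissions are safe for the application:

```nginx
# Also retry POST/LOCK/PATCH on another backend — only if replays are harmless.
proxy_next_upstream error timeout http_502 http_503 http_504 non_idempotent;
```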

2. No keepalive → high connection overhead
Without upstream keepalive, every proxied request opens a fresh TCP + TLS connection to Apache. For the upstream `keepalive` directive to take effect, proxied requests must use HTTP/1.1 with an empty Connection header:

Code:
proxy_http_version 1.1;
proxy_set_header Connection "";

3. CGI as bottleneck
CGI (a new process forked per request) causes latency spikes under load.
With 5 backends this is hidden; with 2 it is enough for nginx to mark them as failed.

Why it works with 5 servers but not with 2:

With 5 backends, occasional slow responses are absorbed: one server marked down still leaves four others.
With 2, a few slow or failed requests within one fail_timeout window are enough to mark both down, and nginx answers 502 with "no live upstreams" until the window expires.


Fixing the upstream config and enabling keepalive should already help.
Long-term: replace CGI (e.g. with FastCGI).
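As a hedged sketch of that long-term direction: if the CGI programs can run persistently behind a FastCGI process manager, Apache 2.4 can route them through mod_proxy_fcgi instead of forking per request. The socket path and file pattern below are assumptions, not your actual setup:

```apache
# Requires mod_proxy and mod_proxy_fcgi to be loaded.
# Route what used to be forked CGI to a persistent FastCGI daemon
# listening on a Unix socket (path is a placeholder).
<FilesMatch "\.cgi$">
    SetHandler "proxy:unix:/var/run/app.fcgi.sock|fcgi://localhost/"
</FilesMatch>
```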

If you can share your full upstream block or relevant error logs, this can be confirmed more precisely.
 