Hello,
I'm hoping to get some help on this subject.
Just so you understand, I've already spent quite some time and resources on this problem, but I couldn't figure out why it's happening.
Our architecture might be "overkill", but we mostly inherited it and are building on it.
The "working" situation:
- 5 FreeBSD servers, installed with 13.2-RELEASE, running Apache 2.4.58 for web content
- Server arch: amd64, 6-core multithreaded CPU, 64GB RAM, 2 SATA disks for the system (ZFS mirror), 2 Samsung SSD 870 disks for fast access to on-disk cache data (ZFS mirror)
- Each server has the exact same configuration and serves the same types of content
- IPFW is configured on the host to only allow specific IP addresses in
- Each server has a jail with the web server and data inside a ZFS dataset, so we can move it quickly if we need a new server
- The service runs mostly through CGI because of internal technologies
- Apache is configured with the worker MPM module
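As an aside, the jail-in-a-dataset setup described above can typically be moved to a new host with a snapshot-and-send, along these lines (pool and dataset names are hypothetical):

```
# Hypothetical pool/dataset names -- adjust to the real layout.
# Snapshot the jail's dataset recursively, then replicate it (with
# its properties and child datasets) to the new host over SSH.
zfs snapshot -r zroot/jails/web@migrate
zfs send -R zroot/jails/web@migrate | ssh newhost zfs receive -u zroot/jails/web
```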
Apache config:
Code:
<IfModule mpm_worker_module>
    ServerLimit              32
    StartServers             8
    ThreadLimit              512
    MaxRequestWorkers        16384
    ThreadsPerChild          512
    MinSpareThreads          512
    MaxSpareThreads          1024
    MaxConnectionsPerChild   10000
</IfModule>
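For reference, the thread arithmetic in this worker config is internally consistent — the process/thread product matches the request-worker ceiling:

```shell
# Worker MPM ceiling: ServerLimit * ThreadsPerChild must cover MaxRequestWorkers
echo $((32 * 512))   # -> 16384, which matches MaxRequestWorkers
```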
- The server is configured with these sysctls:
Code:
security.jail.allow_raw_sockets=1
security.jail.mount_allowed=1
kern.ipc.somaxconn=32768
net.inet.tcp.maxtcptw=200000
net.inet.icmp.icmplim=50
net.inet.icmp.drop_redirect=1
net.inet.tcp.icmp_may_rst=0
net.inet.tcp.blackhole=2
net.inet.udp.blackhole=1
net.link.ether.inet.log_arp_wrong_iface=0
net.inet.tcp.msl=2500
net.inet.tcp.sendspace=262144
net.inet.tcp.recvspace=262144
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216
net.inet.tcp.sendbuf_inc=32768
net.inet.tcp.finwait2_timeout=500
net.inet.tcp.fast_finwait2_recycle=1
net.inet.ip.intr_queue_maxlen=4096
In front of these, the load balancers:
- 4 FreeBSD servers, installed with 14.3-RELEASE, with nginx-full 1.28.2 installed
- Server arch: amd64, 4-core multithreaded CPU, 64GB RAM, 2 SATA disks for the system (ZFS mirror)
- Each server has the exact same configuration
- nginx is configured to load-balance across the 5 Apache servers
- The upstream is composed of the 5 servers' jail IP addresses with the SSL port: a.b.c.d:443
- Some pieces of the configuration:
nginx config:
Code:
worker_processes 8;

events {
    worker_connections 8192;
    accept_mutex off;
    use kqueue;
}

# [...]

http {
    # [...]
    client_header_buffer_size 16k;
    large_client_header_buffers 4 16k;
    sendfile on;
    keepalive_timeout 65;
    gzip_proxied any;
    # [...]

    server {
        listen 443 ssl;
        http2 on;
        # [using https://ssl-config.mozilla.org/ intermediate configuration for SSL]

        location / {
            access_log /path/to/access.log combined;
            proxy_hide_header Upgrade;
            proxy_hide_header X-Powered-By;
            proxy_connect_timeout 120s;
            proxy_read_timeout 120s;
            proxy_send_timeout 120s;
            proxy_redirect off;
            proxy_ssl_verify off;
            # [fronts is a 5 lines upstream block only with `server a.b.c.d:443;`]
            proxy_pass https://fronts;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }
        # [...]
    }
    # [...]
}
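For completeness, the elided `fronts` upstream block presumably looks something like this (addresses hypothetical). Note that with plain `server a.b.c.d:443;` lines, nginx's defaults `max_fails=1` and `fail_timeout=10s` are implied, and those two parameters are exactly what governs when a backend gets marked down and "no live upstreams" is logged:

```
# Hypothetical reconstruction of the elided upstream block; the real
# one has 5 (or 2) `server` lines with the jails' IP addresses.
upstream fronts {
    # nginx defaults made explicit: one failed attempt within 10s
    # marks the server down for the next 10s. With only 2 backends,
    # two overlapping failures are enough to produce "no live
    # upstreams" for every request during that window.
    server 192.0.2.1:443 max_fails=1 fail_timeout=10s;
    server 192.0.2.2:443 max_fails=1 fail_timeout=10s;
}
```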
- The sysctls are mostly the same
- PF is configured
- All servers are geographically close to each other, so the ping delay is well under 1 ms, and the link is 1Gbit/s
Since the setup might be overkill, I'm trying to reduce the number of Apache servers to 2 (nginx upstream block configured with 2 lines), but this causes some problems.
What I can see in the nginx error log and access logs:
- error log: "no live upstreams", meaning that nginx has marked both remote backends as down
- access logs: a lot of HTTP 502 Bad Gateway responses
- I can see that they are serving requests, and there are no errors in the logs (host and jail) indicating we've reached any specific limit.
- There is still memory available on the system (of course), and the CPU is about 60% idle.
- The `netstat -4Lan` command does not show any queueing.
- An apache_exporter + Prometheus + Grafana dashboard does not really show anything about Apache suffering.
- Direct access to an Apache server, either via an `/etc/hosts` entry or with curl commands using specific options, shows that the Apache web server responds in time.
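For the record, the direct curl check was along these lines (hostname and backend IP are hypothetical here), forcing curl to bypass nginx and resolve the name straight to one jail:

```
# Hypothetical names/IPs -- resolve www.example.com directly to one
# Apache jail, time the response, and skip certificate-name checks.
curl --resolve www.example.com:443:192.0.2.10 -k -s -o /dev/null \
     -w 'HTTP %{http_code} in %{time_total}s\n' https://www.example.com/
```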
I've tried asking some friends, and also an AI chat (someone suggested I try), but nothing I tried worked: sysctl tuning, Apache config, nginx config.
Would you have any recommendations?