My server keeps running out of RAM

After installing nginx on the jailed webserver, it does not appear to run any faster than Apache. :(

I'll try to tweak it, but at least it does not crash when I run 20 threads. :)


Apache test
ab -n 100 -c 10 MY.VPN.IP:80
Code:
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 192.168.254.2 (be patient).....done


Server Software:        Apache
Server Hostname:        192.168.254.2
Server Port:            80

Document Path:          /
Document Length:        6332 bytes

Concurrency Level:      10
Time taken for tests:   62.338 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Total transferred:      676400 bytes
HTML transferred:       633200 bytes
Requests per second:    1.60 [#/sec] (mean)
Time per request:       6233.757 [ms] (mean)
Time per request:       623.376 [ms] (mean, across all concurrent requests)
Transfer rate:          10.60 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      277  284  27.7    280     549
Processing:  1621 5660 3657.7   4952   19707
Waiting:      867 4835 3643.7   4234   18866
Total:       1899 5944 3656.3   5230   19987

Percentage of the requests served within a certain time (ms)
  50%   5230
  66%   6324
  75%   7561
  80%   7907
  90%  10376
  95%  14547
  98%  19144
  99%  19987
 100%  19987 (longest request)

Nginx test
ab -n 100 -c 10 MY.VPN.IP:90
Code:
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 192.168.254.2 (be patient).....done


Server Software:        nginx/1.0.14
Server Hostname:        192.168.254.2
Server Port:            90

Document Path:          /
Document Length:        6867 bytes

Concurrency Level:      10
Time taken for tests:   63.760 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Total transferred:      724300 bytes
HTML transferred:       686700 bytes
Requests per second:    1.57 [#/sec] (mean)
Time per request:       6375.965 [ms] (mean)
Time per request:       637.596 [ms] (mean, across all concurrent requests)
Transfer rate:          11.09 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      274  281   4.9    280     314
Processing:  2383 5659 1188.1   5664    9638
Waiting:     2012 5362 1146.0   5303    9172
Total:       2662 5939 1187.3   5943    9918

Percentage of the requests served within a certain time (ms)
  50%   5943
  66%   6411
  75%   6653
  80%   6753
  90%   7585
  95%   8034
  98%   8296
  99%   9918
 100%   9918 (longest request)
 
aa: It is enough reason, and I am currently working through the migration. If all else fails, nginx will still be a step towards more stability. But this result also suggests that the bottleneck brought to light by the crashes is not caused by the webserver. If I can find what it is, then I expect the site will run much faster and be more stable regardless of which webserver I use.

Then I can do a real comparison of the two without them being shackled by whatever is causing the low requests-per-second count.
 
ghostcorps said:
aa: It is enough reason, and I am currently working through the migration. If all else fails, nginx will still be a step towards more stability. But this result also suggests that the bottleneck brought to light by the crashes is not caused by the webserver. If I can find what it is, then I expect the site will run much faster and be more stable regardless of which webserver I use.

Then I can do a real comparison of the two without them being shackled by whatever is causing the low requests-per-second count.

Look at the time taken to complete 90% of the requests. The problem may be that the computer running ab can't handle the test load; maybe you should try testing from a different client. Also, I doubt any setting on your computer is causing the super low req/s. Have you tried testing with no concurrency? If the numbers come out much higher, then it's your client PC that is not able to handle concurrent tests.
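Something like this, for example, just to take concurrency out of the picture (same target as above; -c 1 sends one request at a time):
Code:
ab -n 100 -c 1 http://MY.VPN.IP:80/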
 
The tests above were performed from a very high-end gaming PC on an ADSL+ connection. I am pretty sure it would have been able to handle the task.

If I run the tests from the server hosting the jailed webserver, the results are 998 requests per second for Apache vs 2231 for nginx. This made me suspect it may simply be because I am on the other side of the world from the server, so I used another VPS I manage that is also in the US, but the results were the same as above.

Even with the concurrency turned off the results are the same.

Could it be a networking issue between the webserver > jail host > public?
 
I know the issue has already been approached, but I had a client who had a lot of problems with memory when using WordPress. The easiest solution in that situation was to put a caching nginx in front of Apache/WordPress (a rough sketch is below).

If you can be bothered with fiddling, Varnish is pretty amazing (and by phk <3). It does use quite a bit of memory by design, though. For that reason alone it is not my primary recommendation here.
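A rough sketch of what the caching nginx front end could look like, assuming Apache/WordPress gets moved to port 8080 and with a made-up cache path, just to show the shape of it:

/usr/local/etc/nginx/nginx.conf (fragment)
Code:
http {
    # on-disk cache for proxied responses (path and zone name are only examples)
    proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=wpcache:10m max_size=100m;

    server {
        listen       80;
        server_name  MY.SITE.URL.COM;

        location / {
            proxy_pass        http://127.0.0.1:8080;   # Apache/WordPress listening here
            proxy_cache       wpcache;
            proxy_cache_valid 200 10m;                 # keep good responses for 10 minutes
            proxy_set_header  Host $host;
            proxy_set_header  X-Real-IP $remote_addr;
        }
    }
}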
 
Run an iperf test. It is very simple to do: install it on both client and server. This will test your network card's capability; maybe your network card is not working correctly? Just a guess. I have never used jails, so I don't know, but I don't think they should be a problem. I am in the process of moving my web server from Ubuntu back to FreeBSD, so I'll run some tests to see how it looks on my machine.
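In its simplest form it is just a server on one end and a client on the other, something along these lines (install from ports/packages, e.g. benchmarks/iperf; the port and duration are arbitrary):
Code:
# on the server end
iperf -s -p 5001
# on the client end, pointing at the server
iperf -c SERVER.IP -p 5001 -t 60 -i 5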
 
anon12b said:
I know the issue has already been approached, but I had a client who had a lot of problems with memory when using WordPress. The easiest solution in that situation was to put a caching nginx in front of Apache/WordPress.

If you can be bothered with fiddling, Varnish is pretty amazing (and by phk <3). It does use quite a bit of memory by design, though. For that reason alone it is not my primary recommendation here.

Thanks anon, I have been running some of the ab tests against flat index.html files to avoid any influence from WordPress, PHP or MySQL.

I'll try to keep the solution simple if I can :)


einthusan: I'll play with iperf tonight and post some results later on.
 
Just finished the first set of iperf tests and thankfully the bottleneck is not the Jail :)

Is this as slow/fast as you would expect for a connection from Australia to the US? I can't use the other US-based server I manage to do a comparative test.



Listener on Jail Host in the US
# iperf -s -P 2 -i 5 -p 88 -f k
Server listening on TCP port 88
TCP window size: 64.0 KByte (default)

Code:
Interval (sec)	Transfer (KB)	Bandwidth (Kbit/s)
0.0-5	        358		586
5.0-10.0	491		804
10.0-15.0	457		748
15.0-20.0	464		761
20.0-25.0	457		749
25.0-30.0	414		678
30.0-35.0	421		690
35.0-40.0	475		778
40.0-45.0	444		727
45.0-50.0	345		566
50.0-55.0	389		637

Client in Aust
# iperf -c xxx.xxx.xxx.xx2 -P 1 -i 5 -p 88 -f B -t 60 -T 1
Client connecting to xxx.xxx.xxx.xx2, TCP port 88
TCP window size: 33396 Byte (default)

Code:
Interval (sec)	Transfer (KB)	Bandwidth (KB/s)
0.0-5	        393		78
5.0-10.0	393		78
10.0-15.0	524		104
15.0-20.0	524		104
20.0-25.0	393		78
25.0-30.0	524		104
30.0-35.0	393		78
35.0-40.0	524		104
40.0-45.0	393		78
45.0-50.0	393		78
50.0-55.0	393		78

Listener on Jailed Webserver in US
# iperf -s -P 2 -i 5 -p 88 -f k
Server listening on TCP port 88
TCP window size: 64.0 KByte (default)

Code:
Interval (sec)	Transfer (KB)	Bandwidth (Kbit/s)
0.0-5	        317		519
5.0-10.0	417		683
10.0-15.0	487		797
15.0-20.0	478		784
20.0-25.0	453		742
25.0-30.0	462		757
30.0-35.0	461		755
35.0-40.0	461		755
40.0-45.0	461		755
45.0-50.0	462		757
50.0-55.0	461		755


Client in Aust
# iperf -c xxx.xxx.xxx.xx3 -P 1 -i 5 -p 88 -f B -t 60 -T 1
Client connecting to xxx.xxx.xxx.xx3, TCP port 88
TCP window size: 33396 Byte (default)

Code:
Interval (sec)	Transfer (KB)	Bandwidth (KB/s)
0.0-5	        262		52
5.0-10.0	524		104
10.0-15.0	393		78
15.0-20.0	524		104
20.0-25.0	524		104
25.0-30.0	524		104
30.0-35.0	393		78
35.0-40.0	524		104
40.0-45.0	393		78
45.0-50.0	524		104
50.0-55.0	524		104
 
To me that looks bad! My results are below; however, I have a 1 GbE connection.

[CMD="iperf"]-P 10 -c IP.ADD.SAN.ITIZED -i 5 -f k[/CMD]
Code:
------------------------------------------------------------
Client connecting to IP.ADD.SAN.ITIZED, TCP port 5001
TCP window size: 47.8 KByte (default)
------------------------------------------------------------
[  3] local IP.ADD.SAN.ITIZED port 26689 connected with IP.ADD.SAN.ITIZED port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 5.0 sec  231040 KBytes  378536 Kbits/sec
[  3]  5.0-10.0 sec  232960 KBytes  381682 Kbits/sec
[  3]  0.0-10.0 sec  464128 KBytes  380186 Kbits/sec
 
einthusan said:
To me that looks bad! My results are below; however, I have a 1 GbE connection.

To give an idea of the degradation: from my other server in the US that has a max 5Mbps download I get about 1.9Mbps from the VPS. From Australia on a 100Mbps line I get about 200Kbps!

The above was tested with a 100MB zip file and wget.

Unfortunately I cannot install iperf on the other US server. Would you mind if I PM'd you with my URL to run iperf against?
 
ghostcorps said:
To give an idea of the degradation: from my other server in the US that has a max 5Mbps download I get about 1.9Mbps from the VPS. From Australia on a 100Mbps line I get about 200Kbps!

The above was tested with a 100MB zip file and wget.

Unfortunately I cannot install iperf on the other US server. Would you mind if I PM'd you with my URL to run iperf against?

After running the test, your throughput was about 6 MB/s. Keep in mind that Mbps and MB per second are not the same. In your comment above, you were probably talking about MB per second but you stated it as Mbps.

1 MB/s (Megabytes per second) = 8 Mbps (Megabits per second).
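Applied to the numbers in this thread, as a quick sanity check (treating the earlier figures as bytes per second, as discussed):
Code:
6 MB/s   x 8 = 48 Mbps
200 KB/s x 8 = 1600 Kbps ~ 1.6 Mbps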

Anyhow, 6 MB/s is low, extremely low; an old computer can do better :p However, that should still be enough to do more than 1 req/sec from both web servers. I suggest looking into your network card configuration first. iperf only exercises the network, so the low figure can't be down to any other issue such as hard drive I/O or a server setting.
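For a first pass, a couple of quick things to look at (em0 is only an example interface name; use whatever ifconfig shows):
Code:
# negotiated media type / speed / duplex for the interface
ifconfig em0 | grep media
# per-interface error and collision counters
netstat -i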
 
einthusan said:
So yea, after running the test, your throughput was about 6 MB/s. Keep in mind that Mbps and MB per second are not the same. In your comment above, you were probably talking about MB per second but you stated it as Mbps.

1 MB/s (Megabytes per second) = 8 Mbps (Megabits per second).

Anyhow, 6 MB/s is low, extremely low; an old computer can do better :p However, that should still be enough to do more than 1 req/sec from both web servers. I suggest looking into your network card configuration first. iperf only exercises the network, so the low figure can't be down to any other issue such as hard drive I/O or a server setting.

Well, actually, maybe it's not worth checking your network card. I know it's very low throughput, but it's not like you're doing file serving or video streaming. So keep debugging :p
 
Sorry, you are right, it should have been KB/s and MB/s.

I am speaking with the host now to get an idea of what the max throughput should be. It is a pretty cheap package I think.
 
Something I should have done ages ago:

I turned off the jails to rule out the virtual NICs and ran the benchmark against a minimal Apache install on the host server. The results were no different. I am still waiting on the host to give an approximation of the minimum bandwidth I should expect.
 
I am still stumped lol

I have asked to have a vanilla FreeBSD VPS set up so I can get an idea of the expected throughput. But I have definitely done something screwy: traceroute starts timing out as soon as it touches anything on my system, and I cannot trace out at all. I assumed it was ipfw, but I stopped it and still have the same issue.
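For reference, these are the sort of quick checks that show whether ipfw is really out of the picture (just a sketch using the stock ipfw command and sysctl):
Code:
# list whatever rules are currently loaded
ipfw list
# 1 = ipfw enabled, 0 = disabled
sysctl net.inet.ip.fw.enable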

It has to be a basic networking issue, which makes sense because I am pretty hit-and-miss when it comes to that.

I installed webmin and munin and had a far more experienced admin have a look, and we still couldn't find anything wrong. Below are the basic system configs; is there anything that stands out?

/etc/inetd.conf is completely commented out.

This is what is left of rc.conf after I remove OpenVPN, Apache, the jails, and a bunch of other services that are stopped during the failed traceroute:

/etc/rc.conf
Code:
hostname="MY.URL.COM"
ifconfig_em0="inet XXX.XXX.XXX.XX2 netmask 255.255.255.248"
defaultrouter="XXX.XXX.XX2.XXX"
gateway_enable="YES"
inetd_enable="YES"
inetd_flags="-wW -a XXX.XXX.XXX.XX2"
rpcbind_enable="NO"


/boot/defaults/loader.conf
Code:
##############################################################
###  Networking modules  #####################################
##############################################################
if_disc_load="NO"               # Discard device
if_ef_load="NO"                 # pseudo-device providing support for multiple
                                # ethernet frame types
if_epair_load="NO"              # Virtual b-t-b Ethernet-like interface pair
if_faith_load="NO"              # IPv6-to-IPv4 TCP relay capturing interface
if_gif_load="NO"                # generic tunnel interface
if_gre_load="NO"                # encapsulating network device
if_stf_load="NO"                # 6to4 tunnel interface
if_tap_load="NO"                # Ethernet tunnel software network interface
if_tun_load="NO"                # Tunnel driver (user process ppp)
if_vlan_load="NO"               # IEEE 802.1Q VLAN network interface
ipfw_load="NO"                  # Firewall
pf_load="NO"                    # packet filter


I set up the security ages ago following a guide that I can't find right now. Looking back at it, I can see some things that sound very much like what I am experiencing... a blackhole.

/etc/sysctl.conf
Code:
net.inet.tcp.blackhole=2
net.inet.udp.blackhole=1
net.inet.ip.random_id=1

I commented out the blackholes and reloaded it, but this had no effect either.
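For anyone following along: commenting the lines out only stops them being set at boot, so the live values also need to be cleared explicitly, something like:
Code:
# reset the running values to the defaults (0 = blackhole off)
sysctl net.inet.tcp.blackhole=0
sysctl net.inet.udp.blackhole=0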
 
Just found another stupid thing I did :(

I had somehow managed to have two rules in my firewall for ICMP type 8: one allowing and then one denying! After removing the second one, I can traceroute out from any location on the system, but whether tracerouting in or out, it still times out when it gets to the VPS's hop. I'm not concerned with this, though, because I now observe the following behaviour:

Running ab from remote location whilst on VPN:

Against host server

ab -n 20 -c 10 192.168.254.1 <= VPN network
Code:
3.05 [#/sec]
2726ms Longest Request

ab -n 20 -c 10 xxx.xxx.xxx.1 <= External IP
Code:
3.21 [#/sec]
2726ms Longest Request

ab -n 20 -c 10 MY.HOST.URL.COM
Code:
3.23 [#/sec]
2742ms Longest Request


Against jailed webserver

ab -n 20 -c 10 192.168.254.2 <= VPN network
Code:
1.28 [#/sec]
13754ms Longest Request

ab -n 20 -c 10 xxx.xxx.xxx.2 <= External IP
Code:
1.43 [#/sec]
12599ms Longest Request

ab -n 20 -c 10 MY.SITE.URL.COM
Code:
1.42 [#/sec]
12761ms Longest Request

ab -n 20 -c 10 VHOST.SITE.URL.COM <= Test site
Code:
3.23 [#/sec]
2766ms Longest Request

Any reasonable person would look at this and say it is clearly something to do with the default site config in the jailed webserver's Apache, and would expect the same test against nginx to work correctly, and yet:

ab -n 20 -c 10 xxx.xxx.xxx.xx2:90 <= nginx site is on port 90
Code:
1.01 [#/sec]
10677ms Longest Request

It is kind of odd that the test site should have the same throughput as the host, while the other two sites on the jailed webserver are so slow. Is there anything that an nginx site and the default apache site have in common that is not shared by an apache vhost?
 
I am not surprised if you are all sick of my dumb moves, but I have one more for you...

My vhost has an auth file, so when I was testing ab against it, it was failing the auth but still registering as a completed request. Conversely, the main site was able to load all of its content on each request, so each response was about 10x larger, hence why it took 10x longer.
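For the record, ab can send Basic auth credentials with -A, so a protected vhost can still be benchmarked properly; something like this (the credentials are placeholders):
Code:
ab -n 20 -c 10 -A testuser:testpass http://VHOST.SITE.URL.COM/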

*dumb*

Anyway, back on track now. I am about ready to move to nginx, just doing some final checks.


Thanks again for your persistence everyone, I am sorry to have wasted your time.
 
ghostcorps said:
I am not surprised if you are all sick of my dumb moves, but I have one more for you...

My vhost has an auth file, so when I was testing ab against it, it was failing the auth but still registering as a completed request. Conversely, the main site was able to load all of its content on each request, so each response was about 10x larger, hence why it took 10x longer.

*dumb*

Anyway, back on track now. I am about ready to move to nginx, just doing some final checks.


Thanks again for your persistence everyone, I am sorry to have wasted your time.

It's just that I have been very busy myself. I am testing out Varnish in front of nginx but can't seem to get it working. Varnish is an HTTP accelerator that will cache HTTP requests.
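In its most minimal form varnishd only needs a listen address and a backend, something like this (assuming nginx gets moved to port 8080 so Varnish can take over port 80):
Code:
# Varnish answers on port 80 and caches for the nginx backend on 8080
varnishd -a :80 -b 127.0.0.1:8080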
 
Thanks for getting back to me :)

Today I did an ab test with 100 concurrency on nginx and it didn't flinch. Provided things don't lock up again, I guess this is resolved.


As for the latency, I won't post any more logs, but I have been using nytimes.com as a benchmark and mine is more or less comparable. So that will have to do :)

I'll mark this solved once nginx is live and properly stress tested.
 
ghostcorps said:
Thanks for getting back to me :)

Today I did an ab test with 100 concurrency on nginx and it didn't flinch. Provided things don't lock up again, I guess this is resolved.


As for the latency, I won't post any more logs, but I have been using nytimes.com as a benchmark and mine is more or less comparable. So that will have to do :)

I'll mark this solved once nginx is live and properly stress tested.

Awesome man! I know it was a huge pain and a lot of learning, but it's good to know you have progressed so much. By the way, Varnish isn't that good and is not worth the time and effort to set up.
 