My server keeps running out of RAM

ghostcorps · Apr 6, 2012

Hi guys,

As of a day or two ago my server has been shutting down due to low RAM, according to my host, but I cannot pin down the cause. I have reproduced it by running a find command on /usr/local/etc/apache22/extras, which is no huge task, yet it has happened both times I ran the command today. It is a media streaming server, so it should not have any trouble with such a small request.

I have started looking through /var but so far I can not find out what is freaking it out.

Here is top at idle:

Code:

last pid:  2750;  load averages:  0.00,  0.00,  0.00
62 processes:  2 running, 60 sleeping
CPU:  0.3% user,  0.0% nice,  0.2% system,  0.0% interrupt, 99.5% idle
Mem: 76M Active, 83M Inact, 51M Wired, 444K Cache, 85M Buf, 521M Free
Swap: 988M Total, 988M Free
Order to sort: res
  PID USERNAME     THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
 2609     88        18  44    0 86896K 45168K ucond    0:01  0.00% mysqld
 2694 www            1  45    0   124M 27496K lockf    0:01  0.00% httpd
 2698 www            1  50    0   122M 25460K lockf    0:01  0.00% httpd
 2695 www            1  50    0   122M 25420K lockf    0:01  0.00% httpd
 2697 www            1  50    0   122M 25420K lockf    0:01  0.00% httpd
 2696 www            1  44    0   112M 14808K lockf    0:00  0.00% httpd
 2706 www            1  44    0   112M 14752K lockf    0:00  0.00% httpd
 2705 www            1  44    0   112M 14752K kqread   0:00  0.00% httpd
 2693 root           1  44    0   112M 14744K select   0:00  0.00% httpd
 1450 www            1  44    0 71420K  7248K accept   0:00  0.00% httpd
 1451 www            1  63    0 71420K  7248K accept   0:00  0.00% httpd
 1452 www            1  63    0 71420K  7248K accept   0:00  0.00% httpd
 1453 www            1  63    0 71420K  7248K accept   0:00  0.00% httpd
 1454 www            1  63    0 71420K  7248K accept   0:00  0.00% httpd
 1376 root           1  44    0 71420K  7244K select   0:00  0.00% httpd
 2667 admin          1  44    0 38104K  5176K RUN      0:00  0.00% sshd
 2664 root           1  45    0 38104K  5168K sbwait   0:00  0.00% sshd
 1405 root           1  44    0 26172K  4500K select   0:00  0.00% sshd
 1242 root           1  44    0 11092K  4184K select   0:00  0.00% openvpn
 1411 root           1  44    0 12096K  4080K select   0:00  0.00% sendmail
 1417 smmsp          1  76    0 12096K  4012K pause    0:00  0.00% sendmail
 1780 smmsp          1  76    0 12004K  3864K pause    0:00  0.00% sendmail
 1774 root           1  44    0 12004K  3616K select   0:00  0.00% sendmail
 2670 root           1  44    0 10216K  2800K wait     0:00  0.00% bash
 2668 admin          1  44    0 10216K  2796K wait     0:00  0.00% bash
 2750 root           1  44    0  9336K  2288K RUN      0:00  0.00% top
 2669 admin          1  44    0 21668K  2008K wait     0:00  0.00% su
 2040     88         1  76    0  8264K  1860K wait     0:00  0.00% sh
 2077 root           1  44    0  8080K  1636K nanslp   0:00  0.00% cron
 1787 root           1  44    0  7952K  1612K nanslp   0:00  0.00% cron
 1424 root           1  44    0  7952K  1612K nanslp   0:00  0.00% cron
 1913 root           1  44    0  7024K  1584K select   0:00  0.00% syslogd
 1089 root           1  44    0  7024K  1564K select   0:00  0.00% syslogd
 1611 root           1  44    0  6896K  1560K select   0:00  0.00% syslogd
 2287 root           1  76    0  9008K  1396K select   0:00  0.00% inetd
 2440 root           1  76    0  6892K  1288K ttyin    0:00  0.00% getty
 2445 root           1  76    0  6892K  1288K ttyin    0:00  0.00% getty
 2441 root           1  76    0  6892K  1288K ttyin    0:00  0.00% getty
 2446 root           1  76    0  6892K  1288K ttyin    0:00  0.00% getty
 2447 root           1  76    0  6892K  1288K ttyin    0:00  0.00% getty
 2442 root           1  76    0  6892K  1288K ttyin    0:00  0.00% getty
 2443 root           1  76    0  6892K  1288K ttyin    0:00  0.00% getty
 2444 root           1  76    0  6892K  1288K ttyin    0:00  0.00% getty
  115 root           1  76    0  2744K  1024K pause    0:00  0.00% adjkerntz
  852 root           1  44    0  3204K   724K select   0:00  0.00% devd

Can anyone please suggest a way to find the culprit? This is a production server and I am getting my arse kicked every time it goes down.

gkontos · Apr 6, 2012

1) How exactly does it shut down?

2) What do your logs say when this happens? (/var/log/messages)

3) Is this a dedicated or a VPS?

blakjak · Apr 6, 2012

Your server is running out of RAM.

You need to have a swap partition, created during your installation of FreeBSD. The swap partition is used when your computer runs out of RAM. I hope you have a swap partition?

ghostcorps · Apr 6, 2012

gkontos said:
1) How exactly does it shut down?

2) What do your logs say when this happens? (/var/log/messages)

3) Is this a dedicated or a VPS?

Thanks for the questions.

It is a VPS

It is not entirely clear how it stalls, but we are forced to reboot it through the VM to get it back. It was down this morning, we restarted it and it ran for a few hours. I logged in via ssh to make some changes to the apache config (unrelated). I ran a find search looking for a string in the /extras folder and after opening the file the session stalled. At that point the website which runs off a jailed webserver went offline, but the fail over page on the host was still live, albeit very slow to load.

After a short time there is nothing on either page and I am forced to reboot.

A few rules from /etc/ipfw.rules that are mentioned in messages:

Code:

$IPF 801 deny log all from any to HOST.SERVER 22-25
$IPF 900 deny log all from any to WEBSERVER.JAIL 1-79          
$IPF 910 allow log all from any to WEBSERVER.JAIL 80
$IPF 920 allow log all from any to WEBSERVER.JAIL 443
$IPF 930 deny log all from any to WEBSERVER.JAIL 81-442
$IPF 940 deny log all from any to WEBSERVER.JAIL 444-1934

/var/log/messages, starting from a flood of SYSERRs before the first crash and running up to roughly now. I have cut out a bunch of stuff that I didn't think was necessary.

Code:

Mar 27 13:32:12 DOMAIN sm-mta[48309]: q2RHW6bp048307: SYSERR(root): database.URL.com. config error: mail loops back to me (MX problem?)
Mar 27 14:02:12 DOMAIN sm-mta[48509]: q2RI27mM048507: SYSERR(root): database.URL.com. config error: mail loops back to me (MX problem?)
Mar 27 14:32:11 DOMAIN sm-mta[48630]: q2RIW6Fm048628: SYSERR(root): database.URL.com. config error: mail loops back to me (MX problem?)
Mar 27 23:01:11 DOMAIN sm-mta[51704]: q2S316ug051625: SYSERR(root): webserver.URL.com. config error: mail loops back to me (MX problem?)
Mar 27 23:02:52 DOMAIN sm-mta[51860]: q2S32kYN051810: SYSERR(root): webserver.URL.com. config error: mail loops back to me (MX problem?)
Mar 27 23:02:52 DOMAIN sm-mta[51863]: q2S32lId051856: SYSERR(root): webserver.URL.com. config error: mail loops back to me (MX problem?)
Mar 27 23:47:10 DOMAIN sm-mta[56507]: q2S3l5fM056505: SYSERR(root): webserver.URL.com. config error: mail loops back to me (MX problem?)
Mar 28 09:02:12 DOMAIN sm-mta[64836]: q2SD27Tx064834: SYSERR(root): database.URL.com. config error: mail loops back to me (MX problem?)
Mar 28 09:32:11 DOMAIN sm-mta[64957]: q2SDW6uO064955: SYSERR(root): database.URL.com. config error: mail loops back to me (MX problem?)
Mar 28 10:02:12 DOMAIN sm-mta[65156]: q2SE26gm065154: SYSERR(root): database.URL.com. config error: mail loops back to me (MX problem?)
Mar 28 10:32:12 DOMAIN sm-mta[65280]: q2SEW7Xw065278: SYSERR(root): database.URL.com. config error: mail loops back to me (MX problem?)
Mar 28 11:02:11 DOMAIN sm-mta[65482]: q2SF26lJ065480: SYSERR(root): database.URL.com. config error: mail loops back to me (MX problem?)
Mar 28 11:32:12 DOMAIN sm-mta[65608]: q2SFW6H2065606: SYSERR(root): database.URL.com. config error: mail loops back to me (MX problem?)
Mar 28 12:02:12 DOMAIN sm-mta[66145]: q2SG27eb066091: SYSERR(root): database.URL.com. config error: mail loops back to me (MX problem?)
Mar 28 12:02:12 DOMAIN sm-mta[66147]: q2SG27ss066092: SYSERR(root): database.URL.com. config error: mail loops back to me (MX problem?)
Mar 28 12:05:29 DOMAIN sm-mta[66335]: q2SG5OXj066330: SYSERR(root): database.URL.com. config error: mail loops back to me (MX problem?)
Mar 28 12:05:29 DOMAIN sm-mta[66338]: q2SG5Oqt066331: SYSERR(root): database.URL.com. config error: mail loops back to me (MX problem?)
Mar 28 12:10:01 DOMAIN sm-mta[70765]: q2SG9uMm070763: SYSERR(root): database.URL.com. config error: mail loops back to me (MX problem?)
Mar 28 12:32:12 DOMAIN sm-mta[70880]: q2SGW7n6070878: SYSERR(root): database.URL.com. config error: mail loops back to me (MX problem?)
Mar 28 13:02:12 DOMAIN sm-mta[71081]: q2SH27qc071079: SYSERR(root): database.URL.com. config error: mail loops back to me (MX problem?)
Mar 28 13:32:12 DOMAIN sm-mta[71203]: q2SHW6EY071201: SYSERR(root): database.URL.com. config error: mail loops back to me (MX problem?)
Mar 28 14:02:12 DOMAIN sm-mta[71407]: q2SI27SK071405: SYSERR(root): database.URL.com. config error: mail loops back to me (MX problem?)
Mar 28 14:32:11 DOMAIN sm-mta[71531]: q2SIW6iW071529: SYSERR(root): database.URL.com. config error: mail loops back to me (MX problem?)
Mar 28 23:01:11 DOMAIN sm-mta[74646]: q2T316R2074503: SYSERR(root): webserver.URL.com. config error: mail loops back to me (MX problem?)
Mar 28 23:02:50 DOMAIN sm-mta[74804]: q2T32jvM074753: SYSERR(root): webserver.URL.com. config error: mail loops back to me (MX problem?)
Mar 28 23:02:50 DOMAIN sm-mta[74807]: q2T32j9p074802: SYSERR(root): webserver.URL.com. config error: mail loops back to me (MX problem?)
Mar 28 23:50:41 DOMAIN sm-mta[79450]: q2T3oeYv079448: SYSERR(root): webserver.URL.com. config error: mail loops back to me (MX problem?)

***dmesg***

Apr  2 15:48:45 DOMAIN su: URLadmin to toor on /dev/pts/0
Apr  3 23:05:06 DOMAIN su: admin to root on /dev/pts/0
Apr  5 20:05:19 DOMAIN kernel: arp: XXX.XXX.XXX.3 moved from 00:ff:2d:81:3b:3c to 00:ff:03:09:cd:79 on tap0
Apr  5 20:05:35 DOMAIN sshd[xxx86]: error: PAM: authentication error for root from YYY.YYY.YYY
Apr  5 20:05:58 DOMAIN sshd[25888]: error: PAM: authentication error for toor from YYY.YYY.YYY
Apr  5 20:06:01 DOMAIN sshd[25888]: error: PAM: authentication error for toor from YYY.YYY.YYY
Apr  5 20:07:47 DOMAIN su: URLadmin to toor on /dev/pts/0
Apr  5 20:29:58 DOMAIN sshd[1401]: error: accept: Software caused connection abort
Apr  5 20:35:41 DOMAIN su: admin to root on /dev/pts/1
Apr  5 20:37:53 DOMAIN su: URLadmin to toor on /dev/pts/0
Apr  5 20:39:32 DOMAIN su: admin to root on /dev/pts/1
Apr  6 01:17:05 DOMAIN su: admin to root on /dev/pts/0
Apr  6 01:52:54 DOMAIN su: admin to root on /dev/pts/1


Apr  6 02:09:14 DOMAIN kernel: ipfw: limit 5 reached on entry 900
Apr  6 02:09:14 DOMAIN kernel: ipfw: limit 5 reached on entry 930
Apr  6 02:09:14 DOMAIN kernel: ipfw: limit 5 reached on entry 940
Apr  6 02:09:51 DOMAIN kernel: ipfw: limit 5 reached on entry 920
Apr  6 02:09:51 DOMAIN kernel: ipfw: limit 5 reached on entry 910
Apr  6 02:10:43 DOMAIN kernel: ipfw: limit 5 reached on entry 910
Apr  6 02:10:47 DOMAIN kernel: ipfw: limit 5 reached on entry 920
Apr  6 02:24:46 DOMAIN kernel: ipfw: limit 5 reached on entry 801

***dmesg***

Apr  6 03:58:42 DOMAIN kernel: ipfw: limit 5 reached on entry 801
Apr  6 03:59:38 DOMAIN fsck: /dev/da0s1e: 38 files, 145 used, 253670 free (30 frags, 31705 blocks, 0.0% fragmentation)
Apr  6 04:00:24 DOMAIN fsck: /dev/da0s1f: PARTIALLY TRUNCATED INODE I=711230
Apr  6 04:00:24 DOMAIN fsck: /dev/da0s1f: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY.

Apr  6 04:03:01 DOMAIN kernel: ipfw: limit 5 reached on entry 910
Apr  6 04:05:47 DOMAIN kernel: ipfw: limit 5 reached on entry 920
Apr  6 04:19:33 DOMAIN kernel: ipfw: limit 5 reached on entry 900
Apr  6 04:19:33 DOMAIN kernel: ipfw: limit 5 reached on entry 930
Apr  6 04:19:33 DOMAIN kernel: ipfw: limit 5 reached on entry 940

***dmesg***

Apr  6 04:52:01 DOMAIN kernel: ipfw: limit 5 reached on entry 910

Apr  6 04:52:54 DOMAIN fsck: /dev/da0s1e: 39 files, 145 used, 253670 free (30 frags, 31705 blocks, 0.0% fragmentation)
Apr  6 04:53:05 DOMAIN fsck: /dev/da0s1d: LINK COUNT FILE I=70669  OWNER=operator MODE=100400
Apr  6 04:53:05 DOMAIN fsck: /dev/da0s1d: SIZE=2048 MTIME=Apr  6 02:22 2012  COUNT 2 SHOULD BE 1 (ADJUSTED)
Apr  6 04:53:05 DOMAIN fsck: /dev/da0s1d: LINK COUNT FILE I=70680  OWNER=operator MODE=100400
Apr  6 04:53:05 DOMAIN fsck: /dev/da0s1d: SIZE=2048 MTIME=Apr  6 04:00 2012  COUNT 2 SHOULD BE 1 (ADJUSTED)
Apr  6 04:53:05 DOMAIN fsck: /dev/da0s1d: LINK COUNT FILE I=70688  OWNER=operator MODE=100400
Apr  6 04:53:05 DOMAIN fsck: /dev/da0s1d: SIZE=2048 MTIME=Apr  6 04:22 2012  COUNT 2 SHOULD BE 1 (ADJUSTED)
Apr  6 04:53:05 DOMAIN fsck: /dev/da0s1d: LINK COUNT FILE I=70689  OWNER=operator MODE=100400
Apr  6 04:53:05 DOMAIN fsck: /dev/da0s1d: SIZE=2048 MTIME=Apr  6 03:55 2012  COUNT 2 SHOULD BE 1 (ADJUSTED)
Apr  6 04:53:05 DOMAIN fsck: /dev/da0s1d: LINK COUNT FILE I=70692  OWNER=operator MODE=100400
Apr  6 04:53:05 DOMAIN fsck: /dev/da0s1d: SIZE=2048 MTIME=Apr  6 04:11 2012  COUNT 2 SHOULD BE 1 (ADJUSTED)
Apr  6 04:53:05 DOMAIN fsck: /dev/da0s1d: LINK COUNT FILE I=70694  OWNER=operator MODE=100400
Apr  6 04:53:05 DOMAIN fsck: /dev/da0s1d: SIZE=2048 MTIME=Apr  6 04:33 2012  COUNT 2 SHOULD BE 1 (ADJUSTED)
Apr  6 04:53:05 DOMAIN fsck: /dev/da0s1d: LINK COUNT FILE I=70705  OWNER=operator MODE=100400
Apr  6 04:53:05 DOMAIN fsck: /dev/da0s1d: SIZE=2048 MTIME=Apr  6 03:44 2012  COUNT 2 SHOULD BE 1 (ADJUSTED)
Apr  6 04:53:05 DOMAIN fsck: /dev/da0s1d: Reclaimed: 0 directories, 1 files, 1 fragments
Apr  6 04:53:05 DOMAIN fsck: /dev/da0s1d: 25334 files, 126338 used, 623684 free (7228 frags, 77057 blocks, 1.0% fragmentation)

***dmesg***

gkontos · Apr 6, 2012

I don't think it is a memory-related issue. It looks more like filesystem corruption to me.
I would suggest a full backup of your data, sites and dbs, and then an fsck from single-user mode.

Also, try fixing sendmail in your database jail by either disabling it or making the proper aliases.
I don't have much experience with IPFW syntax but I would find a way to keep only 1 rule there also.

ghostcorps · Apr 6, 2012

blakjak said:
You need to have a swap partition, created during your installation of FreeBSD. The swap partition is used when your computer runs out of RAM. I hope you have a swap partition?

Thanks Blakjak, yes, I do

#df -h

Code:

Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/da0s1a    496M    331M    125M    72%    /
devfs          1.0K    1.0K      0B   100%    /dev
/dev/da0s1e    496M    290K    456M     0%    /tmp
/dev/da0s1f     16G     12G    2.6G    82%    /usr
/dev/da0s1d    1.4G    247M    1.1G    18%    /var
devfs          1.0K    1.0K      0B   100%    /usr/gaols/webserver/dev
procfs         4.0K    4.0K      0B   100%    /usr/gaols/webserver/proc
devfs          1.0K    1.0K      0B   100%    /usr/gaols/database/dev
procfs         4.0K    4.0K      0B   100%    /usr/gaols/database/proc

#pstat -T

Code:

324/12072 files
0M/987M swap space

gkontos:

It looks like one of the guys at the host has run an fsck on it already, but I will check with them and give it a go otherwise. We have gone a night without a crash but who knows what the new day will bring.

I don't need sendmail on the db so I'll turn that off too. Thanks for pointing that out, I didn't even realise it was on.

Not sure what you mean about keeping 1 rule in ipfw. The rules above are only a small handful of the hundred or so rules I use to keep the ports blocked. If I remove any of them it will expose me.

Thanks again for your help.

gkontos · Apr 6, 2012

ghostcorps said:
Not sure what you mean about keeping 1 rule in ipfw. The rules above are only a small handful of the hundred or so rules I use to keep the ports blocked. If I remove any of them it will expose me.

When dealing with firewall rules, you try to write them in such a way that they don't put extra burden on the filtering engine.
Like I said before, I have absolutely no idea how to script IPFW rules. But you can use this as a general rule of thumb:

1) Have your most frequent rules processed first.
2) Explicitly deny all other ports using a more general statement.

Pseudocode example:

Code:

permit any to <webserver> <webservice_tcp_ports>
deny any  any

This pretty much works with any type of firewall.
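To make the rule of thumb concrete, here is a minimal Python sketch of first-match filtering. It is only a toy model, not how ipfw itself works: the `Rule` class and `evaluate` function are made up for illustration, and the port ranges simply mirror the handful of rules quoted earlier in the thread. It counts how many rules are checked before a packet's fate is decided.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str             # "allow" or "deny"
    ports: Optional[range]  # None matches every port


def evaluate(rules, port):
    """Return (action, rules_checked) for the first rule matching port."""
    for checked, rule in enumerate(rules, start=1):
        if rule.ports is None or port in rule.ports:
            return rule.action, checked
    return "deny", len(rules)  # toy fallback when nothing matched


# Range-by-range list, in the spirit of the rules quoted earlier.
ranged = [
    Rule("deny", range(1, 80)),
    Rule("allow", range(80, 81)),
    Rule("allow", range(443, 444)),
    Rule("deny", range(81, 443)),
    Rule("deny", range(444, 1935)),
]

# Frequent allows first, then one general deny.
compact = [
    Rule("allow", range(80, 81)),
    Rule("allow", range(443, 444)),
    Rule("deny", None),
]

for port in (80, 443, 1000):
    print(port, evaluate(ranged, port), evaluate(compact, port))
```

In this model web traffic on port 80 is decided after one check instead of two, and a blocked port after three checks instead of five. Whether that matters on a real firewall depends on the traffic mix and the rule count.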

ghostcorps · Apr 7, 2012

gkontos said:
When dealing with firewall rules, you try to write them in such a way that they don't put extra burden on the filtering engine.

1) Have your most frequent rules processed first.
2) Explicitly deny all other ports using a more general statement.

Thanks,

I have made the rule list loosely adhering to that idea. I will see what I can do to optimise it.

olav · Apr 7, 2012

I had the same problem with FreeBSD 8.2-RELEASE; an upgrade to FreeBSD 8.2-STABLE solved it. The STABLE branch is a good branch, and can be used on production systems.

ghostcorps · Apr 7, 2012

Thanks olav,

I should have mentioned that this box is running FreeBSD 8.1-RELEASE-p2. There are a few patches outstanding because I have modified the kernel, and rolling it back will take the site offline for a day. We have just gotten some articles out in the news, so we don't want to take it down just yet.

So far it hasn't crashed again though. *fingers crossed*

debguy · Apr 12, 2012

sshd is 3mb? Talk about using ash not sh to reserve memory and ssh blows through it

112M is wrong. Apache would use 4MB for a process having a small web page open.

(1) I would check httpd config files to see what setting would *allow* apache to cache that much data: apache is designed not to break memory limits by any kind of web hits.

(2) Look at your web content. Is apache loading a corrupt webpage that is in fact 100M to load?

Please say if the top you show is a httpd process waiting for a web hit or already having loaded the home page. (i.e., in the setting it may load 5 waiting - which get recycled)

Use netstat -a to see what's LISTENING v. CONNECTED.
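If the netstat output is long, a small script can tally the states for you. This is a rough Python sketch that assumes the six-column TCP layout FreeBSD's `netstat -a` prints (proto, recv-q, send-q, local, foreign, state). The function name `tally_states` and the sample addresses are made up for illustration.

```python
from collections import Counter


def tally_states(netstat_text):
    """Count TCP sockets per state in `netstat -a` style text."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # TCP rows: proto recv-q send-q local foreign state
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            counts[fields[5]] += 1
    return counts


# Made-up sample in the same layout; the addresses are documentation ranges.
sample = """\
Proto Recv-Q Send-Q  Local Address          Foreign Address       (state)
tcp4       0    104 192.0.2.10.ssh         198.51.100.7.35008     ESTABLISHED
tcp46      0      0 *.http                 *.*                    LISTEN
tcp4       0      0 *.https                *.*                    LISTEN
tcp4       0      0 *.smtp                 *.*                    LISTEN
"""

for state, count in sorted(tally_states(sample).items()):
    print(state, count)
```

A pile of ESTABLISHED connections on the http or https ports during a stall would point at busy workers, whereas only LISTEN rows would suggest the httpd processes in top were idle.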

BTW, are those multi-processor / threaded apache processes or regular ones? I think the mp version may be a little wild on memory; it may say so in the docs. Use the right apache2 install pkg.

By what is allowed I mean "apache mods" that your config says apache should / can load - there are so many I don't know if you're loading all of perl, python, php, and all else and the kitchen sink per process for no reason.

If you are migrating, you might not run the new apache with an old website - maybe use the apache the website had been working fine with.

ghostcorps · Apr 13, 2012

Thanks Debguy.

Did you mean bash instead of sh?

I will look into everything you have mentioned; there looks to be some fine-tuning to be done. I should say that top was run on the host, which holds two jailed servers. Both the host and one of the virtual servers host an Apache installation. The webserver is a video streaming server. I would expect that Apache would run pretty heavy in this situation, but I will still see what I can do about lightening the load.

I am not sure which modules are safe to disable and which are not; whenever I try to thin them out I always end up breaking something that is not obvious.

It looks to be the multiprocess version, which is the version portmaster chose to install.

/usr/ports/www/apache22/Makefile

Code:

PORTNAME=       apache
PORTVERSION=    2.2.22
PORTREVISION=   5
CATEGORIES=     www
MASTER_SITES=   ${MASTER_SITE_APACHE_HTTPD}
DISTNAME=       httpd-${PORTVERSION}
DIST_SUBDIR=    apache22

MAINTAINER?=    apache@FreeBSD.org
COMMENT?=       Version 2.2.x of Apache web server with ${WITH_MPM:L} MPM.

netstat -a

Code:

Active Internet connections (including servers)
Proto Recv-Q Send-Q  Local Address          Foreign Address       (state)
tcp4       0    104 XXX.ssh              ME.35008       ESTABLISHED
tcp4       0      0 *.*                    *.*                    CLOSED
tcp46      0      0 *.http                 *.*                    LISTEN
tcp4       0      0 *.https                *.*                    LISTEN
tcp4       0      0 *.http                 *.*                    LISTEN
tcp4       0      0 *.8080                 *.*                    LISTEN
tcp4       0      0 SITENAME.com..smtp  *.*                    LISTEN
tcp4       0      0 *.ftp                  *.*                    LISTEN
tcp4       0      0 *.submission           *.*                    LISTEN
tcp6       0      0 *.smtp                 *.*                    LISTEN
tcp4       0      0 *.smtp                 *.*                    LISTEN
tcp4       0      0 XXX.ssh              *.*                    LISTEN
tcp4       0      0 XXX.ssh              *.*                    LISTEN

Thankfully we have not had any trouble since posting this thread, but that doesn't mean it can not happen again.

User23 · Apr 13, 2012

If PHP is used as an Apache module, 112MB per process is nothing special. Running low on RAM could happen if too many processes are running at the same time. Monitor your services and the count of processes and you may find the problem easily.
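One way to do that monitoring is to tally top's process rows per command. The Python sketch below assumes the 11-column layout shown in the top output earlier in the thread (PID, USERNAME, THR, PRI, NICE, SIZE, RES, STATE, TIME, WCPU, COMMAND) and sizes that end in K, M or G. The helper names are made up, and the sample rows are copied from that output.

```python
from collections import defaultdict


def to_kib(size):
    """Convert a top(1) size such as '27496K' or '124M' to KiB."""
    units = {"K": 1, "M": 1024, "G": 1024 * 1024}
    return int(float(size[:-1]) * units[size[-1]])


def tally_by_command(top_rows):
    """Return {command: (process_count, total_res_kib)} from top process rows."""
    totals = defaultdict(lambda: [0, 0])
    for line in top_rows.splitlines():
        fields = line.split()
        # PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND
        if len(fields) == 11 and fields[0].isdigit():
            totals[fields[10]][0] += 1
            totals[fields[10]][1] += to_kib(fields[6])
    return {cmd: tuple(v) for cmd, v in totals.items()}


sample = """\
 2609     88        18  44    0 86896K 45168K ucond    0:01  0.00% mysqld
 2694 www            1  45    0   124M 27496K lockf    0:01  0.00% httpd
 2698 www            1  50    0   122M 25460K lockf    0:01  0.00% httpd
 1450 www            1  44    0 71420K  7248K accept   0:00  0.00% httpd
"""

for cmd, (count, res_kib) in tally_by_command(sample).items():
    print(cmd, count, res_kib)
```

Run periodically against real top output, a growing httpd count or RES total just before a stall would support the too-many-processes theory. RES counts shared pages once per process, so the totals overstate real usage somewhat.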

I had similar problems with a wordpress + statistics plugin. The plugin stored the statistics in a mysql db so slowly that the whole server could process only 2 requests per second ... so the number of running processes sometimes rose to 200 or more and the server began to swap.
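A back-of-the-envelope check shows why a pile-up like that ends in swapping. This Python sketch takes the 521M free figure and the roughly 25 MB RES of the larger httpd processes from the idle top output earlier in the thread. Both are rough inputs, and the function names are made up for illustration.

```python
def children_that_fit(available_mb, per_child_mb):
    """How many worker processes of a given resident size fit in RAM."""
    return available_mb // per_child_mb


def ram_needed_mb(children, per_child_mb):
    """Resident memory required if that many workers run at once."""
    return children * per_child_mb


FREE_MB = 521          # "521M Free" in the idle top output
RES_PER_CHILD_MB = 25  # approximate RES of the larger httpd processes there

print(children_that_fit(FREE_MB, RES_PER_CHILD_MB))  # 20
print(ram_needed_mb(200, RES_PER_CHILD_MB))          # 5000
```

By this estimate about 20 such workers fit in free memory, while 200 would want around 5 GB, far beyond the RAM and the 988M of swap on this box. The estimate ignores shared pages and reclaimable inactive memory, so treat it as an order of magnitude only. On Apache 2.2 with the prefork MPM, MaxClients is the directive that caps the number of workers.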

Use apachebench (ab) for a stress test.

ghostcorps · Apr 13, 2012

I am told we are using a statistics plugin on wordpress. But I ran ab and it held up fine I think.

ab -n 1000 -c 5 https://URL.com/

Code:

Server Software:        Apache
Server Hostname:        URL.com.au
Server Port:            443
SSL/TLS Protocol:       TLSv1/SSLv3,DHE-RSA-AES256-SHA,2048,256

Document Path:          /
Document Length:        7756 bytes

Concurrency Level:      5
Time taken for tests:   848.079 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      8188432 bytes
HTML transferred:       7756000 bytes
Requests per second:    1.18 [#/sec] (mean)
Time per request:       4240.396 [ms] (mean)
Time per request:       848.079 [ms] (mean, across all concurrent requests)
Transfer rate:          9.43 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:     1194 1278 235.6   1203    3841
Processing:   803 2950 1326.2   2775   14652
Waiting:      802 2593 1299.9   2402   14282
Total:       2004 4227 1337.0   4042   16455

Percentage of the requests served within a certain time (ms)
  50%   4042
  66%   4337
  75%   4558
  80%   4722
  90%   5376
  95%   6395
  98%   7668
  99%  10180
 100%  16455 (longest request)

NB: I ran ab from Australia and the server is in the US.
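For reading the ab figures, the headline numbers can be recomputed from the raw counts. A small Python sketch, using the request count, concurrency and total time from the output above (the function name `ab_summary` is made up for illustration):

```python
def ab_summary(requests, concurrency, total_seconds):
    """Recompute ab's headline numbers from its raw counts."""
    rps = requests / total_seconds
    per_request_ms = concurrency * total_seconds / requests * 1000
    per_request_all_ms = total_seconds / requests * 1000
    return rps, per_request_ms, per_request_all_ms


rps, per_req, per_req_all = ab_summary(1000, 5, 848.079)
print(f"{rps:.2f} req/s")
print(f"{per_req:.1f} ms per request (mean)")
print(f"{per_req_all:.1f} ms per request (across all concurrent requests)")
```

These match the reported 1.18 requests per second and roughly 4240 ms and 848 ms per request. The mean connect time of 1278 ms is close to a third of the 4227 ms mean total, so part of the slowness in this run is probably network and TLS setup over the long distance rather than the server itself.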

Looking at the info.php it says I am using mod_php5, which if I have googled correctly means that I am not using PHP as a module, is this right?

User23 · Apr 13, 2012

The main problem is not PHP but the number of processes, running at the same time, due to the slow statistic plugin.
Make a stress test with and without the wordpress statistics plugin enabled to verify the problem.

PHP as apache module should be the fastest option, so stay with it.

User23 · Apr 13, 2012

ghostcorps said:
I am told we are using a statistics plugin on wordpress.

Looking at the info.php it says I am using mod_php5, which if I have googled correctly means that I am not using PHP as a module, is this right?

"mod_php5" is the PHP5 apache module. So, everything is ok.

ghostcorps · Apr 13, 2012

User23 said:
The main problem is not PHP but the number of processes, running at the same time, due to the slow statistic plugin.
Make a stress test with and without the wordpress statistics plugin enabled to verify the problem.

PHP as apache module should be the fastest option, so stay with it.

I have been running a 50000 pass test overnight; it is almost done. I'll try without the plugin when it finishes. However, during the test I see no more than 14 threads at any time. Is this good or bad?

ghostcorps · Apr 14, 2012

Could it have something to do with the time being out of sync by a day between the webserver and the database? I found my servers were not using the same timezone for some reason. But I have corrected it now.

Will see if this has any effect on the number of threads.

ghostcorps · Apr 15, 2012

Hello again,

By taking some time to research the mods and testing each one in turn, I have cut the httpd threads down to about 95-120 MB. Disabling the stats plugin WassUp did not reduce the thread size noticeably.

This is an improvement of about 20 MB, and all the services I can think of are working, but I get the feeling I could go further. Would you mind having a look at my mod list below and letting me know if I have blocked anything subtly crucial, or if there is anything more I could block?

/usr/local/etc/apache22/httpd.conf

Code:

LoadModule authn_file_module libexec/apache22/mod_authn_file.so
#LoadModule authn_dbm_module libexec/apache22/mod_authn_dbm.so
#LoadModule authn_anon_module libexec/apache22/mod_authn_anon.so
#LoadModule authn_default_module libexec/apache22/mod_authn_default.so
#LoadModule authn_alias_module libexec/apache22/mod_authn_alias.so
LoadModule authz_host_module libexec/apache22/mod_authz_host.so
LoadModule authz_groupfile_module libexec/apache22/mod_authz_groupfile.so
LoadModule authz_user_module libexec/apache22/mod_authz_user.so
#LoadModule authz_dbm_module libexec/apache22/mod_authz_dbm.so
#LoadModule authz_owner_module libexec/apache22/mod_authz_owner.so
#LoadModule authz_default_module libexec/apache22/mod_authz_default.so
LoadModule auth_basic_module libexec/apache22/mod_auth_basic.so
#LoadModule auth_digest_module libexec/apache22/mod_auth_digest.so
#LoadModule file_cache_module libexec/apache22/mod_file_cache.so
#LoadModule cache_module libexec/apache22/mod_cache.so
#LoadModule disk_cache_module libexec/apache22/mod_disk_cache.so
#LoadModule dumpio_module libexec/apache22/mod_dumpio.so
LoadModule reqtimeout_module libexec/apache22/mod_reqtimeout.so
LoadModule include_module libexec/apache22/mod_include.so
#LoadModule filter_module libexec/apache22/mod_filter.so
#LoadModule charset_lite_module libexec/apache22/mod_charset_lite.so
LoadModule deflate_module libexec/apache22/mod_deflate.so
LoadModule log_config_module libexec/apache22/mod_log_config.so
#LoadModule log_forensic_module libexec/apache22/mod_log_forensic.so
#LoadModule logio_module libexec/apache22/mod_logio.so
LoadModule env_module libexec/apache22/mod_env.so
#LoadModule mime_magic_module libexec/apache22/mod_mime_magic.so
#LoadModule cern_meta_module libexec/apache22/mod_cern_meta.so
LoadModule expires_module libexec/apache22/mod_expires.so
LoadModule headers_module libexec/apache22/mod_headers.so
LoadModule usertrack_module libexec/apache22/mod_usertrack.so
LoadModule unique_id_module libexec/apache22/mod_unique_id.so
LoadModule setenvif_module libexec/apache22/mod_setenvif.so
#LoadModule version_module libexec/apache22/mod_version.so
LoadModule ssl_module libexec/apache22/mod_ssl.so
LoadModule mime_module libexec/apache22/mod_mime.so
#LoadModule dav_module libexec/apache22/mod_dav.so
#LoadModule status_module libexec/apache22/mod_status.so
#LoadModule autoindex_module libexec/apache22/mod_autoindex.so
#LoadModule asis_module libexec/apache22/mod_asis.so
#LoadModule info_module libexec/apache22/mod_info.so
LoadModule cgi_module libexec/apache22/mod_cgi.so
#LoadModule dav_fs_module libexec/apache22/mod_dav_fs.so
LoadModule vhost_alias_module libexec/apache22/mod_vhost_alias.so
#LoadModule negotiation_module libexec/apache22/mod_negotiation.so
LoadModule dir_module libexec/apache22/mod_dir.so
#LoadModule imagemap_module libexec/apache22/mod_imagemap.so
LoadModule actions_module libexec/apache22/mod_actions.so
LoadModule speling_module libexec/apache22/mod_speling.so
#LoadModule userdir_module libexec/apache22/mod_userdir.so
LoadModule alias_module libexec/apache22/mod_alias.so
LoadModule rewrite_module libexec/apache22/mod_rewrite.so
LoadModule unique_id_module libexec/apache22/mod_unique_id.so
LoadModule security2_module libexec/apache22/mod_security2.so
LoadModule php5_module        libexec/apache22/libphp5.so

ghostcorps · Apr 15, 2012

Just when I thought I was ready to mark this as solved... It crashed again!!

I have worked through all the suggestions and still crashing!

I am lost now ...

User23 · Apr 16, 2012

ghostcorps said:
I have been running a 50000 pass test overnight; it is almost done. I'll try without the plugin when it finishes. However, during the test I see no more than 14 threads at any time. Is this good or bad?

Depends on how many simultaneous queries you used to test and on the server hardware.

Try

Code:

ab -c 20 -n 1000 http://yourdomain.tld

for example.

ghostcorps · Apr 16, 2012

Thanks,

User23 said:
Try

Code:

ab -c 20 -n 1000 http://yourdomain.tld

for example.

I ran the test but it timed out after 4 completed requests.

Code:

Benchmarking URL.com (be patient)
apr_poll: The timeout specified has expired (70007)
Total of 4 requests completed

I couldn't browse to the site and my ssh session crashed out too, but I was able to log in with some patience. I found a lot of threads were still open; I guess the threads created by ab had not closed.

After restarting apache the website came back up and access is normal again. Is it the RAM, or is it more likely that the interface between the site and SQL is too slow? Apache runs on one jailed server and the database is on another.

ghostcorps · Apr 17, 2012

It is strange: if I run a 20 thread test with a concurrency of 20, it pulls through and the threads clear. But when I run a 40 thread test with the same concurrency, the test times out and the threads lock up.

I have added this to /usr/local/etc/apache22/httpd.conf

Code:

RequestReadTimeout header=1-3,MinRate=500

But it has not had any noticeable effect.

ghostcorps · Apr 17, 2012

By turning off the KeepAlive entry in the config, I no longer lock up the server when ab times out, which is excellent news.

But I still need to work out how to stop the server grinding to a halt when I run 20 concurrent threads, for example:

ab -c 20 -n 20 https://some.site.cd/

User23 · Apr 17, 2012

ghostcorps said:
Is it the RAM, or is it more likely that the interface between the site and SQL is too slow? Apache runs on one jailed server and the database is on another.

You could run the ab test on a static html page. This will show how Apache performs without MySQL.

As I said, I guess it is the Wordpress statistic plugin. Keep an eye on the mysql slow queries log and use

Code:

show full processlist;

on the mysql console, while stress testing. If the statistics inserts are the bottleneck it should be easy to identify them in the processlist.
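If the processlist is too long to read by eye, it can be filtered once the rows are fetched. This Python sketch assumes the rows arrive as dictionaries keyed by the standard processlist columns (Id, Command, Time, State, Info), as a dict-style database cursor would return them. The sample rows, the `stats_table` name and the function name are all made up for illustration.

```python
def busiest_queries(rows, min_seconds=1):
    """Return non-idle processlist rows running at least min_seconds, slowest first."""
    active = [
        r for r in rows
        if r["Command"] != "Sleep" and r["Time"] >= min_seconds
    ]
    return sorted(active, key=lambda r: r["Time"], reverse=True)


# Hypothetical rows shaped like SHOW FULL PROCESSLIST output.
rows = [
    {"Id": 11, "Command": "Sleep", "Time": 40, "State": "", "Info": None},
    {"Id": 12, "Command": "Query", "Time": 7, "State": "update",
     "Info": "INSERT INTO stats_table ..."},
    {"Id": 13, "Command": "Query", "Time": 0, "State": "",
     "Info": "SHOW FULL PROCESSLIST"},
    {"Id": 14, "Command": "Query", "Time": 3, "State": "Locked",
     "Info": "INSERT INTO stats_table ..."},
]

for r in busiest_queries(rows):
    print(r["Id"], r["Time"], r["State"], r["Info"])
```

If the same INSERT keeps appearing at the top of that list while ab is running, the statistics writes are a likely bottleneck. If the list stays empty, the slowness is more likely elsewhere.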