php-fpm (5.4.13) takes up all CPU resources

Hi All,



I have a strange problem with php-fpm, I checked lots of documents, but I can't fix it. I'm using a FreeBSD 9.1 server to host my website. Nginx and php-fpm 5.4.13 and MySQL being used.

The hardware of my server is 12Core CPUs, 32G memory

6 HDD's used as a raidz2 array to host the website.
2 SSD's used as a mirror array to host database.

After I upgraded from 5.4.11 to 5.4.13, I got below problem. Usually, my system is in below status: (top)

Code:
last pid: 52977;  load averages:  4.70,  4.66,  5.55                                                                    
301 processes: 2 running, 299 sleeping
 CPU:  6.5% user,  0.0% nice,  6.3% system,  0.9% interrupt, 86.3% idle
Mem: 7029M Active, 11G Inact, 12G Wired, 1076M Cache, 399M Free
ARC: 8412M Total, 3190M MFU, 1763M MRU, 624K Anon, 209M Header, 3249M Other
Swap: 36G Total, 145M Used, 36G Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
52728 www           1  25    0   150M 35564K accept 13   0:01  5.86% php-fpm
52735 www           1  25    0   150M 35416K accept 16   0:01  5.76% php-fpm
52740 www           1  25    0   150M 37668K accept 14   0:01  5.66% php-fpm
52739 www           1  24    0   150M 36424K accept 12   0:01  5.66% php-fpm
52708 www           1  24    0   150M 36324K accept 10   0:01  5.66% php-fpm
52724 www           1  24    0   150M 37740K select 16   0:01  5.57% php-fpm
52714 www           1  25    0   150M 36952K accept 21   0:01  5.57% php-fpm
52719 www           1  25    0   150M 37252K accept 23   0:01  5.57% php-fpm
52717 www           1  25    0   150M 39004K accept 12   0:01  5.57% php-fpm
52734 www           1  25    0   150M 37720K accept 18   0:01  5.57% php-fpm
52726 www           1  24    0   150M 37308K accept 23   0:01  5.47% php-fpm
52737 www           1  25    0   150M 37832K accept 14   0:01  5.37% php-fpm
52721 www           1  25    0   150M 36548K accept 22   0:01  5.37% php-fpm
52710 www           1  25    0   154M 38792K accept 16   0:01  5.37% php-fpm
52723 www           1  25    0   150M 35768K accept 10   0:01  5.37% php-fpm

But sometimes, I found that a php-fpm will take up about 25%+ CPU resource, then the CPU usage of php-fpm will get increased one by one. Then php-fpm will take up all CPU resources.

Like below:
Code:
last pid: 54212;  load averages:  [B]75.83,  75.55,  75.73[/B]                                                                                 up 3+10:47:39  10:27:53
301 processes: 6 running, 295 sleeping
 CPU:  6.8% user,  0.0% nice,  95.2% system,  0.7% interrupt, [B]0.0% idle[/B]
Mem: 7021M Active, 11G Inact, 12G Wired, 1358M Cache, 561M Free
ARC: 8525M Total, 3438M MFU, 1619M MRU, 1866K Anon, 213M Header, 3253M Other
 Swap: 36G Total, 145M Used, 36G Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
53939 www           1  31    0   150M 33096K accept 22   0:01  28.59% php-fpm
53941 www           1  31    0   154M 38144K accept 20   0:01  28.40% php-fpm
53938 www           1  30    0   150M 32816K accept  3   0:01  27.57% php-fpm
53677 www           1  52    0   150M 28440K accept  2   0:04  26.49% php-fpm
53948 www           1  27    0   150M 28964K accept 19   0:01  24.69% php-fpm
53947 www           1  26    0   150M 29272K accept  8   0:01  24.05% php-fpm
53951 www           1  25    0   150M 31208K accept 17   0:01  23.86% php-fpm
53953 www           1  24    0   150M 35300K accept 15   0:00  23.86% php-fpm
53950 www           1  25    0   150M 28268K accept  2   0:00  23.76% php-fpm
53954 www           1  24    0   150M 35476K accept 19   0:00  23.76% php-fpm
53955 www           1  24    0   150M 31292K accept 21   0:00  23.76% php-fpm
53956 www           1  24    0   150M 34064K accept 19   0:00  23.66% php-fpm
53962 www           1  24    0   150M 33224K CPU9    9   0:00  23.66% php-fpm
53960 www           1  24    0   150M 33548K CPU17  17   0:00  23.66% php-fpm

Below is the php-fpm.slow.
Code:
[01-Apr-2013 16:46:52] WARNING: [pool www] child 96623, script '/web/www/forum.php' (request: "GET /forum.php") executing too slow (30.661218 sec), logging 
 [01-Apr-2013 16:46:52] WARNING: [pool www] child 96620, script '/web/www/forum.php' (request: "GET /forum.php") executing too slow (31.778921 sec), logging 
 [01-Apr-2013 16:46:52] NOTICE: child 96764 stopped for tracing 
 [01-Apr-2013 16:46:52] NOTICE: about to trace 96764 
 [01-Apr-2013 16:46:52] NOTICE: finished trace of 96764 
 [01-Apr-2013 16:46:52] NOTICE: child 96763 stopped for tracing 
[B] [01-Apr-2013 16:46:52] NOTICE: about to trace 96763 
 [01-Apr-2013 16:46:52] ERROR: failed to ptrace(PT_IO) pid 96763: Bad address (14) [/B]
 [01-Apr-2013 16:46:52] NOTICE: finished trace of 96763 
 [01-Apr-2013 16:46:52] NOTICE: child 96762 stopped for tracing 
 [01-Apr-2013 16:46:52] NOTICE: about to trace 96762 
 [01-Apr-2013 16:46:52] NOTICE: finished trace of 96762 
 [01-Apr-2013 16:46:52] NOTICE: child 96761 stopped for tracing 
 [01-Apr-2013 16:46:52] NOTICE: about to trace 96761

The only way to fix this issue is to restart php-fpm. Below is the connection information about Nginx
Code:
Active connections: 2449 
server accepts handled requests
 36733039 36733039 147670906 
Reading: 86 Writing: 86 Waiting: 2277

At first, I though it's a MySQL error, but I monitored the MySQL process, and I found that there is no slowlog. I tried to reinstall PHP, but this issue still exists. Can anyone help on this issue?
 
It hung again just now.

Code:
last pid: 60485;  load averages: 153.26, 60.20, 27.69                                                                                up 3+11:28:52  11:09:06
307 processes: 243 running, 50 sleeping, 14 lock
CPU:  9.3% user,  0.0% nice, 89.1% system,  1.4% interrupt,  0.2% idle
Mem: 8327M Active, 11G Inact, 10G Wired, 972M Cache, 365M Free
ARC: 6768M Total, 3057M MFU, 430M MRU, 940K Anon, 166M Header, 3113M Other
Swap: 36G Total, 152M Used, 36G Free, 1828K Out

  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
60407 www           1  75    0   150M 37944K RUN     9   0:07 10.69% php-fpm
60278 www           1  75    0   150M 39872K RUN     7   0:09 10.50% php-fpm
60403 www           1  75    0   150M 35944K CPU14   4   0:07 10.16% php-fpm
60399 www           1  75    0   150M 37192K CPU5    6   0:07 10.06% php-fpm
60402 www           1  75    0   150M 34336K RUN     9   0:07  9.96% php-fpm
60396 www           1  75    0   154M 37784K RUN    14   0:07  9.96% php-fpm
60285 www           1  75    0   150M 38236K RUN    18   0:08  9.86% php-fpm
60304 root          1  75    0 14036K  4288K RUN     2   0:23  9.77% find
60404 www           1  75    0   150M 37472K RUN    14   0:07  9.67% php-fpm
60347 www           1  75    0   150M 35896K CPU18  18   0:07  9.57% php-fpm
60365 www           1  75    0   150M 36488K RUN    11   0:08  9.47% php-fpm
60397 www           1  52    0   150M 36988K RUN     6   0:07  9.47% php-fpm
60405 www           1  52    0   150M 36196K RUN     7   0:07  9.38% php-fpm
60398 www           1  52    0   150M 35072K RUN     1   0:07  9.18% php-fpm
60282 www           1  74    0   150M 38272K RUN     0   0:07  8.98% php-fpm
60406 www           1  52    0   150M 33772K *Name   1   0:07  8.98% php-fpm
60453 www           1  74    0   150M 29424K RUN    20   0:02  8.89% php-fpm
60225 www           1  52    0   154M 42484K RUN    12   0:09  8.79% php-fpm
60277 www           1  52    0   150M 38752K RUN    20   0:08  8.79% php-fpm
60276 www           1  52    0   150M 37208K RUN    13   0:08  8.79% php-fpm
60408 www           1  74    0   150M 34884K RUN    16   0:06  8.79% php-fpm
60272 www           1  52    0   150M 39844K RUN    15   0:08  8.69% php-fpm
60258 www           1  74    0   150M 34776K RUN    12   0:07  8.69% php-fpm
60388 www           1  52    0   150M 35232K RUN     6   0:06  8.69% php-fpm
60172 www           1  74    0   150M 43512K RUN     3   0:12  8.59% php-fpm
60265 www           1  52    0   150M 41216K RUN    19   0:10  8.59% php-fpm
60275 www           1  74    0   150M 38556K RUN    10   0:08  8.59% php-fpm
60297 www           1  74    0   154M 35920K RUN    23   0:07  8.59% php-fpm
60328 www           1  52    0   150M 36180K RUN     4   0:07  8.59% php-fpm
60361 www           1  52    0   150M 33980K RUN     9   0:07  8.59% php-fpm
60234 www           1  74    0   154M 43608K RUN    10   0:11  8.50% php-fpm
60140 www           1  52    0   150M 37944K RUN    20   0:08  8.50% php-fpm
60292 www           1  52    0   150M 39384K RUN    14   0:08  8.50% php-fpm
60320 www           1  52    0   150M 36640K RUN    16   0:07  8.50% php-fpm
60199 www           1  74    0   154M 37092K CPU15  15   0:07  8.50% php-fpm
60344 www           1  74    0   150M 37008K CPU2    2   0:07  8.50% php-fpm
60208 www           1  74    0   150M 40792K CPU20   9   0:09  8.40% php-fpm
 
At a first glance, this sounds like an infinite loop in /web/www/forum.php.

However, I notice there is a large amount of active php-fpm processes. Therefore, I think it might be helpful to know more about the overall setup. How is nginx and php-fpm configured? Is this a single-site server? Anything else which may be relevant?
 
Savagedlight said:
At a first glance, this sounds like an infinite loop in /web/www/forum.php.

However, I notice there is a large amount of active php-fpm processes. Therefore, I think it might be helpful to know more about the overall setup. How is nginx and php-fpm configured? Is this a single-site server? Anything else which may be relevant?

Hi, this is a single site server. Below is the configuration file of php-fpm.

Code:
pid = /tmp/php-fpm.pid
error_log = /log/php-fpm.log
log_level = notice
emergency_restart_threshold = 10
emergency_restart_interval = 1m
process_control_timeout = 0
listen = /tmp/nginx.socket
listen.allowed_clients = 127.0.0.1
pm = static
pm.max_requests = 100
slowlog = /log/php-fpm.slow
request_slowlog_timeout = 30s 
request_terminate_timeout = 180s
pm.max_children = 256

Below is the configuration for Nginx:

Code:
user	www;
worker_processes  18;
worker_cpu_affinity  000000000000000001000000 000000000000000010000000 000000000000000100000000 000000000000001000000000 000000000000010000000000 000000000000100000000000 000000000001000000000000 000000000010000000000000 000000000100000000000000 000000001000000000000000 000000010000000000000000 000000100000000000000000 000001000000000000000000 000010000000000000000000 000100000000000000000000 001000000000000000000000 010000000000000000000000 100000000000000000000000;
error_log  /log/error.log;
google_perftools_profiles /tmp/tcmalloc;
worker_rlimit_nofile 65535;
events 
{
	worker_connections 20480;
	multi_accept 	on;
}

http {
	include       mime.types;
	include blockips.conf;
	default_type  application/octet-stream;
	
	
	log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

       	server_tokens off;
	access_log /log/access.log  main;

  	client_body_buffer_size   8k;
 	client_header_buffer_size 16k;
	client_max_body_size 20m;
  	large_client_header_buffers 4 32k;

	client_body_timeout   10;
	client_header_timeout 10;
	keepalive_timeout     15;
	send_timeout          10;

	keepalive_requests 100000;

	sendfile        on;
	tcp_nopush     on;
	tcp_nodelay on;

	fastcgi_connect_timeout 100;
	fastcgi_send_timeout 100;
	fastcgi_read_timeout 100;
	fastcgi_buffer_size 64k;
	fastcgi_buffers 4 64k;
	fastcgi_busy_buffers_size 128k;
	fastcgi_temp_file_write_size 128k;
	fastcgi_intercept_errors on;

	gzip on;
	gzip_min_length  1k;
	gzip_buffers     16 8k;
	gzip_http_version 1.0;
	gzip_comp_level 9;
	gzip_types       text/plain application/x-javascript text/css application/xml;
	gzip_vary on;

	open_file_cache max=200000 inactive=60s;
	open_file_cache_valid 30s;
	open_file_cache_min_uses 2;
	open_file_cache_errors on;

	map $arg_mod $forum_limit {
		default  $binary_remote_addr;
		image  '';
	}

	limit_conn_zone  $forum_limit  zone=forum_conn:128m;
	limit_req_zone  $forum_limit  zone=forum_req:128m   rate=3r/s;

server {
	listen       80;
	server_name  www..myweb.com myweb.com;

	location / {
		root   /web/www;
		index  index.html index.htm index.php; 
		rewrite ^([^\.]*)/topic-(.+)\.html$ $1/portal.php?mod=topic&topic=$2 last;
		rewrite ^([^\.]*)/forum-(\w+)-([0-9]+)\.html$ $1/forum.php?mod=forumdisplay&fid=$2&page=$3 last;
		rewrite ^([^\.]*)/thread-([0-9]+)-([0-9]+)-([0-9]+)\.html$ $1/forum.php?mod=viewthread&tid=$2&extra=page%3D$4&page=$3 last;
		rewrite ^([^\.]*)/group-([0-9]+)-([0-9]+)\.html$ $1/forum.php?mod=group&fid=$2&page=$3 last;
		rewrite ^([^\.]*)/space-(username|uid)-(.+)\.html$ $1/home.php?mod=space&$2=$3 last;
		rewrite ^([^\.]*)/blog-([0-9]+)-([0-9]+)\.html$ $1/home.php?mod=space&uid=$2&do=blog&id=$3 last;
		rewrite ^([^\.]*)/(fid|tid)-([0-9]+)\.html$ $1/index.php?action=$2&value=$3 last;
		rewrite ^([^\.]*)/article-([0-9]+)-([0-9]+)\.html$ $1/portal.php?mod=view&aid=$2&page=$3 last;
		rewrite ^([^\.]*)/portal-view-aid-([0-9]+)\.html$ $1/portal.php?mod=view&aid=$2&page=$3 last;
			if (!-e $request_filename) {
				return 404;
			}


	location ~ \.php$ {
		root           /www;
		fastcgi_pass   unix:/tmp/nginx.socket;
		fastcgi_index  index.php;
		fastcgi_param  SCRIPT_FILENAME  /scripts$fastcgi_script_name;
		include        fastcgi_params;
		}

	}
 
The issue happened again, and I got some log about that.
Code:
[02-Apr-2013 22:08:43] WARNING: [pool www] child 27641, script '/web/forum.php' (request: "GET /forum.php") executing too slow (31.150793 sec), logging
[02-Apr-2013 22:08:43] WARNING: [pool www] child 27607, script '/web/forum.php' (request: "GET /forum.php") executing too slow (30.034651 sec), logging
[02-Apr-2013 22:08:43] WARNING: [pool www] child 27606, script '/web/forum.php' (request: "GET /forum.php") executing too slow (30.042857 sec), logging
[02-Apr-2013 22:08:43] WARNING: [pool www] child 27604, script '/web/forum.php' (request: "GET /forum.php") executing too slow (30.027134 sec), logging
[02-Apr-2013 22:08:43] WARNING: [pool www] child 27603, script '/web/forum.php' (request: "GET /forum.php") executing too slow (30.361675 sec), logging
[02-Apr-2013 22:08:43] WARNING: [pool www] child 27602, script '/web/forum.php' (request: "GET /forum.php") executing too slow (30.380385 sec), logging
[02-Apr-2013 22:08:43] WARNING: [pool www] child 27601, script '/web/forum.php' (request: "GET /forum.php") executing too slow (30.747122 sec), logging
[02-Apr-2013 22:08:43] WARNING: [pool www] child 27600, script '/web/forum.php' (request: "GET /forum.php") executing too slow (30.790746 sec), logging
[02-Apr-2013 22:08:43] WARNING: [pool www] child 27599, script '/web/forum.php' (request: "GET /forum.php") executing too slow (30.350703 sec), logging
[02-Apr-2013 22:08:43] WARNING: [pool www] child 27589, script '/web/forum.php' (request: "GET /forum.php") executing too slow (30.126477 sec), logging
[02-Apr-2013 22:08:43] WARNING: [pool www] child 27577, script '/web/forum.php' (request: "GET /forum.php") executing too slow (31.826374 sec), logging
[02-Apr-2013 22:08:43] NOTICE: child 27641 stopped for tracing
[02-Apr-2013 22:08:43] NOTICE: about to trace 27641
[02-Apr-2013 22:08:43] NOTICE: finished trace of 27641
[02-Apr-2013 22:08:43] NOTICE: child 27607 stopped for tracing
[02-Apr-2013 22:08:43] NOTICE: about to trace 27607
[02-Apr-2013 22:08:43] NOTICE: finished trace of 27607
[02-Apr-2013 22:08:43] NOTICE: child 27606 stopped for tracing
[02-Apr-2013 22:08:43] NOTICE: about to trace 27606
[02-Apr-2013 22:08:43] NOTICE: finished trace of 27606
[02-Apr-2013 22:08:43] NOTICE: child 27604 stopped for tracing
[02-Apr-2013 22:08:43] NOTICE: about to trace 27604
[02-Apr-2013 22:08:43] NOTICE: finished trace of 27604
[02-Apr-2013 22:08:43] NOTICE: child 27603 stopped for tracing
[02-Apr-2013 22:08:43] NOTICE: about to trace 27603
[02-Apr-2013 22:08:43] ERROR: failed to ptrace(PT_IO) pid 27603: Bad address (14)
[02-Apr-2013 22:08:43] NOTICE: finished trace of 27603
[02-Apr-2013 22:08:43] NOTICE: child 27602 stopped for tracing
[02-Apr-2013 22:08:43] NOTICE: about to trace 27602
[02-Apr-2013 22:08:43] ERROR: failed to ptrace(PT_IO) pid 27602: Bad address (14)

Slow log

Code:
[02-Apr-2013 22:08:43]  [pool www] pid 27641
script_filename = /web/forum.php
[0x0000000802447ee0] mysql_query() /web/source/class/class_core.php:823
[0x0000000802447db0] query() /web/source/class/class_core.php:809
[0x0000000802447bd8] fetch_first() /web/source/class/class_core.php:1011
[0x0000000802447ae0] _execute() /web/source/class/class_core.php:971
[0x0000000802447310] fetch_first() /web/source/function/function_forum.php:591
[0x0000000802443b58] get_thread_by_tid() /web/source/function/function_forum.php:450
[0x0000000802443388] loadforum() /web/forum.php:53

[02-Apr-2013 22:08:43]  [pool www] pid 27607
script_filename = /web/forum.php
[0x0000000802448158] mysql_query() /web/source/class/class_core.php:823
[0x0000000802448028] query() /web/source/class/class_core.php:809
[0x0000000802447e50] fetch_first() /web/source/class/class_core.php:1011
[0x0000000802447d58] _execute() /web/source/class/class_core.php:971
[0x0000000802447588] fetch_first() /web/source/function/function_forum.php:591
[0x0000000802443dd0] get_thread_by_tid() /web/source/function/function_forum.php:450
[0x0000000802443600] loadforum() /web/forum.php:53

[02-Apr-2013 22:08:43]  [pool www] pid 27606
script_filename = /web/forum.php
[0x0000000802448348] mysql_query() /web/source/class/class_core.php:823
[0x0000000802448218] query() /web/source/class/class_core.php:809
[0x0000000802448040] fetch_first() /web/source/class/class_core.php:1011
[0x0000000802447f48] _execute() /web/source/class/class_core.php:971
[0x0000000802447778] fetch_first() /web/source/function/function_forum.php:591
[0x0000000802443fc0] get_thread_by_tid() /web/source/function/function_forum.php:450
[0x00000008024437f0] loadforum() /web/forum.php:53

[02-Apr-2013 22:08:43]  [pool www] pid 27604
script_filename = /web/forum.php
[0x0000000802447e10] mysql_query() /web/source/class/class_core.php:823
[0x0000000802447ce0] query() /web/source/class/class_core.php:809
[0x0000000802447b08] fetch_first() /web/source/class/class_core.php:1011
[0x0000000802447a10] _execute() /web/source/class/class_core.php:971
[0x0000000802447240] fetch_first() /web/source/function/function_forum.php:591
[0x0000000802443a88] get_thread_by_tid() /web/source/function/function_forum.php:450
[0x00000008024432b8] loadforum() /web/forum.php:53

[02-Apr-2013 22:08:43]  [pool www] pid 27603
script_filename = /web/forum.php
[0x00000008024572f0] mysql_query() /web/source/class/class_core.php:823
[0x0000000802457118] query() /web/source/class/class_core.php:1011
[0x0000000802457008] _execute() /web/source/class/class_core.php:985
[0x0000000802443d20] query() /web/source/module/forum/forum_forumdisplay.php:564
[0x0000000802443558] +++ dump failed

[02-Apr-2013 22:08:43]  [pool www] pid 27602
script_filename = /web/forum.php
[0x0000000802457138] mysql_query() /web/source/class/class_core.php:823
[0x0000000802456f60] query() /web/source/class/class_core.php:1011
[0x0000000802456e50] _execute() /web/source/class/class_core.php:985
[0x0000000802443b68] query() /web/source/module/forum/forum_forumdisplay.php:557
[0x00000008024433a0] +++ dump failed

[02-Apr-2013 22:08:43]  [pool www] pid 27601
script_filename = /web/forum.php
[0x0000000802447fe8] mysql_query() /web/source/class/class_core.php:823
[0x0000000802447eb8] query() /web/source/class/class_core.php:809
[0x0000000802447ce0] fetch_first() /web/source/class/class_core.php:1011
[0x0000000802447be8] _execute() /web/source/class/class_core.php:971
[0x0000000802447418] fetch_first() /web/source/function/function_forum.php:591
[0x0000000802443c60] get_thread_by_tid() /web/source/function/function_forum.php:450
[0x0000000802443490] loadforum() /web/forum.php:53

[02-Apr-2013 22:08:43]  [pool www] pid 27600
script_filename = /web/forum.php
[0x00000008024576f0] mysql_query() /web/source/class/class_core.php:823
[0x00000008024575c0] query() /web/source/class/class_core.php:813
[0x00000008024573e8] result_first() /web/source/class/class_core.php:1011
[0x00000008024572f0] _execute() /web/source/class/class_core.php:980
[0x0000000802444008] result_first() /web/source/module/forum/forum_forumdisplay.php:490
[0x0000000802443840] +++ dump failed

[02-Apr-2013 22:08:43]  [pool www] pid 27599
script_filename = /web/forum.php
[0x0000000802448078] mysql_query() /web/source/class/class_core.php:823
[0x0000000802447f48] query() /web/source/class/class_core.php:809
[0x0000000802447d70] fetch_first() /web/source/class/class_core.php:1011
[0x0000000802447c78] _execute() /web/source/class/class_core.php:971
[0x00000008024474a8] fetch_first() /web/source/function/function_forum.php:591
[0x0000000802443cf0] get_thread_by_tid() /web/source/function/function_forum.php:450
[0x0000000802443520] loadforum() /web/forum.php:53

[02-Apr-2013 22:08:43]  [pool www] pid 27589
script_filename = /web/forum.php
[0x0000000802448120] mysql_query() /web/source/class/class_core.php:823
[0x0000000802447ff0] query() /web/source/class/class_core.php:809
[0x0000000802447e18] fetch_first() /web/source/class/class_core.php:1011
[0x0000000802447d20] _execute() /web/source/class/class_core.php:971
[0x0000000802447550] fetch_first() /web/source/function/function_forum.php:591
[0x0000000802443d98] get_thread_by_tid() /web/source/function/function_forum.php:450
[0x00000008024435c8] loadforum() /web/forum.php:53

[02-Apr-2013 22:08:43]  [pool www] pid 27577
script_filename = /web/forum.php
[0x0000000802448110] mysql_query() /web/source/class/class_core.php:823
[0x0000000802447fe0] query() /web/source/class/class_core.php:809
[0x0000000802447e08] fetch_first() /web/source/class/class_core.php:1011
[0x0000000802447d10] _execute() /web/source/class/class_core.php:971
[0x0000000802447540] fetch_first() /web/source/function/function_forum.php:591
[0x0000000802443d88] get_thread_by_tid() /web/source/function/function_forum.php:450
[0x00000008024435b8] loadforum() /web/forum.php:53

[02-Apr-2013 22:08:43]  [pool www] pid 27548
script_filename = /web/forum.php
[0x0000000802447f38] mysql_query() /web/source/class/class_core.php:823
[0x0000000802447e08] query() /web/source/class/class_core.php:809
[0x0000000802447c30] fetch_first() /web/source/class/class_core.php:1011
[0x0000000802447b38] _execute() /web/source/class/class_core.php:971
[0x0000000802447368] fetch_first() /web/source/function/function_forum.php:591
[0x0000000802443bb0] get_thread_by_tid() /web/source/function/function_forum.php:450
[0x00000008024433e0] loadforum() /web/forum.php:53

[02-Apr-2013 22:08:43]  [pool www] pid 27546
script_filename = /web/forum.php
[0x0000000802447c00] mysql_query() /web/source/class/class_core.php:823
[0x0000000802447ad0] query() /web/source/class/class_core.php:809
[0x00000008024478f8] fetch_first() /web/source/class/class_core.php:1011
[0x0000000802447800] _execute() /web/source/class/class_core.php:971
[0x0000000802447030] fetch_first() /web/source/function/function_forum.php:591
[0x0000000802443878] get_thread_by_tid() /web/source/function/function_forum.php:450
[0x00000008024430a8] loadforum() /web/forum.php:53
 
Code:
last pid: 35097;  load averages: 211.66, 120.79, 73.15                                                                                                                       up 3+22:56:50  22:37:04
315 processes: 150 running, 59 sleeping, 106 lock
CPU: 12.1% user,  0.0% nice, 86.5% system,  1.3% interrupt,  0.1% idle
Mem: 8400M Active, 12G Inact, 9612M Wired, 1244M Cache, 348M Free
ARC: 5879M Total, 2605M MFU, 365M MRU, 2768K Anon, 143M Header, 2764M Other
Swap: 36G Total, 79M Used, 36G Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
34988 root          1  52    0 14040K  3788K RUN     1   0:07 25.39% tail
56667 mysql       514  20    0  1748M  1172M sbwait 11  23.9H 11.38% mysqld
34883 www           1  75    0   169M 32832K *vnode 19   0:04 11.28% php-fpm
34809 www           1  75    0   169M 37908K RUN    18   0:05 11.08% php-fpm
34404 www           1  75    0   173M 42328K RUN     1   0:12 10.99% php-fpm
34575 www           1  75    0   169M 39828K RUN    20   0:11 10.89% php-fpm
34836 www           1  52    0   169M 37344K *vnode  2   0:05 10.89% php-fpm
34845 www           1  52    0   169M 34964K RUN    18   0:05 10.89% php-fpm
34859 www           1  75    0   169M 33632K *vnode  0   0:04 10.79% php-fpm
34416 www           1  75    0   169M 39248K RUN     5   0:11 10.69% php-fpm
34776 www           1  75    0   169M 38440K RUN     2   0:07 10.69% php-fpm
34786 www           1  75    0   169M 39524K CPU14  15   0:06 10.69% php-fpm
34784 www           1  75    0   169M 38868K RUN    16   0:06 10.69% php-fpm
34799 www           1  75    0   173M 39548K RUN    22   0:06 10.69% php-fpm
34860 www           1  52    0   169M 33132K RUN    13   0:04 10.69% php-fpm
34903 www           1  52    0   169M 32152K RUN    17   0:04 10.69% php-fpm
34821 www           1  75    0   169M 35684K RUN    10   0:05 10.60% php-fpm
34846 www           1  75    0   169M 34900K *vnode  4   0:04 10.60% php-fpm
34856 www           1  75    0   169M 33700K CPU20  20   0:04 10.60% php-fpm
34858 www           1  75    0   169M 32968K *vnode  7   0:04 10.60% php-fpm
34428 www           1  75    0   169M 40768K RUN     7   0:11 10.50% php-fpm
34792 www           1  75    0   169M 38952K RUN     4   0:06 10.50% php-fpm
34793 www           1  75    0   169M 40240K RUN    18   0:06 10.50% php-fpm
34810 www           1  75    0   169M 36736K RUN     6   0:05 10.50% php-fpm
34827 www           1  75    0   169M 35344K RUN     5   0:05 10.50% php-fpm
34850 www           1  75    0   169M 36228K *vnode 19   0:04 10.50% php-fpm
34847 www           1  75    0   169M 35736K *vnode 17   0:04 10.50% php-fpm
34853 www           1  75    0   173M 34676K *vnode 23   0:04 10.50% php-fpm
34865 www           1  75    0   169M 33676K CPU3   11   0:04 10.50% php-fpm
34783 www           1  75    0   169M 37428K RUN     6   0:06 10.35% php-fpm
34820 www           1  52    0   169M 37500K RUN    21   0:05 10.35% php-fpm
34813 www           1  75    0   169M 37288K RUN    14   0:05 10.35% php-fpm
34835 www           1  75    0   169M 36024K *vnode  1   0:05 10.35% php-fpm
34857 www           1  75    0   169M 34896K RUN     4   0:04 10.35% php-fpm
34879 www           1  52    0   173M 35224K RUN    17   0:04 10.35% php-fpm
34874 www           1  75    0   169M 33124K *vnode 16   0:04 10.35% php-fpm
34887 www           1  75    0   169M 32156K *vnode 16   0:04 10.35% php-fpm
34906 www           1  75    0   169M 31708K RUN     4   0:04 10.35% php-fpm
34928 www           1  52    0   169M 33420K RUN     5   0:03 10.35% php-fpm
34797 www           1  75    0   169M 37136K RUN     5   0:06 10.25% php-fpm
34808 www           1  75    0   169M 36544K CPU19  18   0:05 10.25% php-fpm
34834 www           1  75    0   169M 35992K *vnode  6   0:05 10.25% php-fpm
 
It's quite strange, now CPU usage is almost zero. But it seems php-fpm are all dead, they didn't respond to any request.
Code:
last pid: 35340;  load averages: 29.08, 80.82, 65.1274                                                                                                                       up 3+22:59:27  22:39:41
309 processes: 1 running, 308 sleeping
CPU:  0.2% user,  0.0% nice,  0.9% system,  0.1% interrupt, 98.8% idle
Mem: 8863M Active, 12G Inact, 9471M Wired, 1194M Cache, 116M Free
ARC: 5731M Total, 2523M MFU, 306M MRU, 376K Anon, 141M Header, 2761M Other
Swap: 36G Total, 78M Used, 36G Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
56667 mysql       514  20    0  1748M  1177M uwait  22  23.9H 13.09% mysqld
16192 nobody        6  52    0 70252K 29336K uwait   9  20:40  0.00% memcached
20515 www           1  20    0   308M 23112K kqread  7   0:58  0.00% nginx
20516 www           1  20    0   308M 22784K kqread  8   0:56  0.00% nginx
20521 www           1  20    0   308M 24708K kqread 13   0:55  0.00% nginx
20514 www           1  20    0   308M 24384K kqread  6   0:51  0.00% nginx
20518 www           1  20    0   308M 22552K kqread 10   0:50  0.00% nginx
20522 www           1  20    0   308M 21888K kqread 14   0:50  0.00% nginx
20520 www           1  20    0   308M 22568K kqread 12   0:49  0.00% nginx
20529 www           1  20    0   308M 22028K kqread 21   0:48  0.00% nginx
20525 www           1  20    0   312M 23108K kqread 17   0:48  0.00% nginx
20526 www           1  20    0   308M 21468K kqread 18   0:48  0.00% nginx
20531 www           1  20    0   308M 23560K kqread 23   0:47  0.00% nginx
20524 www           1  20    0   308M 22572K kqread 16   0:47  0.00% nginx
20519 www           1  20    0   308M 23064K kqread 11   0:46  0.00% nginx
20523 www           1  20    0   308M 22052K kqread 15   0:45  0.00% nginx
20528 www           1  20    0   308M 21096K kqread 20   0:45  0.00% nginx
20517 www           1  20    0   304M 20452K kqread  9   0:45  0.00% nginx
20527 www           1  20    0   304M 19828K kqread 19   0:43  0.00% nginx
20530 www           1  20    0   308M 21492K kqread 22   0:42  0.00% nginx
93731 root          1  20    0 16592K  2808K CPU5   21   0:29  0.00% top
  970 root          1  20    0 12080K  1408K select 21   0:23  0.00% syslogd
34830 www           1  20    0   169M 41936K select  8   0:11  0.00% php-fpm
34802 www           1  20    0   169M 42296K select  6   0:10  0.00% php-fpm
34903 www           1  20    0   169M 39116K select  8   0:10  0.00% php-fpm
34850 www           1  20    0   173M 43264K select  8   0:09  0.00% php-fpm
34811 www           1  20    0   169M 41620K select  5   0:09  0.00% php-fpm
34847 www           1  20    0   169M 40900K select  1   0:09  0.00% php-fpm
34800 www           1  20    0   169M 43036K select  6   0:09  0.00% php-fpm
34821 www           1  20    0   169M 40380K select  5   0:09  0.00% php-fpm
34890 www           1  20    0   169M 39560K select 10   0:09  0.00% php-fpm
34971 www           1  20    0   169M 36008K select  4   0:09  0.00% php-fpm
34879 www           1  20    0   173M 41340K select  5   0:09  0.00% php-fpm
34911 www           1  20    0   169M 37996K select  0   0:09  0.00% php-fpm
34857 www           1  20    0   173M 41544K select  4   0:08  0.00% php-fpm
34862 www           1  20    0   169M 39032K select  0   0:08  0.00% php-fpm
34926 www           1  20    0   169M 39256K select  6   0:08  0.00% php-fpm
34853 www           1  20    0   173M 38496K select 14   0:08  0.00% php-fpm
34886 www           1  20    0   169M 38288K select  0   0:08  0.00% php-fpm
34974 www           1  20    0   169M 39064K select  6   0:08  0.00% php-fpm
34908 www           1  20    0   169M 37932K select  4   0:08  0.00% php-fpm
34900 www           1  20    0   173M 40920K select  1   0:08  0.00% php-fpm
34994 www           1  20    0   169M 38904K select  1   0:08  0.00% php-fpm
 
All php-fpm are in active status
Code:
pool	www
process manager	static
start time	Tuesday, April 2, 2013 10:59:31 PM
start since	55s
accepted conn	11883
listen queue	0
max listen queue	0
listen queue len	0
idle processes	0
active processes	256
total processes	256
max active processes	256
max children reached	0
slow requests	0
 
This latest iteration seems to be a problem with the MySQL server. Why is it in the sbwait state (waiting for socket), and hogging so much CPU?

Is there a disk throughput problem here?
 
Savagedlight said:
This latest iteration seems to be a problem with the MySQL server. Why is it in the sbwait state (waiting for socket), and hogging so much CPU?

Is there a disk throughput problem here?

Thanks for your quick response. I don't think it's a MySQL problem, because once the issue happens, I can only restart php-fpm to fix the issue. Restarting MySQL will not help on this issue. And there is no slowlog of MySQL.

The strange thing is that, from today, sometimes, the CPU usage of php-fpm will drop to 0, and the server looks like it has no load at all, almost 99% idle. But php-fpm doesn't respond to any request. At this time, I can't check php-fpm (http://www.*.com/status) via the web. After I restart php-fpm, then everything goes well. But at a few minutes, the issue happened again and again.
 
Okay. I've seen similar weird lockups in VMWare guests before, due to errors with the event timers (now fixed). It's a shot in the dark, but your issue might be related to the same underlying cause. Can you try changing the event timer?

Find the current event timer: # sysctl kern.eventtimer.timer
Find your available event timers: # sysctl kern.eventtimer.choice
Change it to something other than what it currently is, if possible.
 
Savagedlight said:
Okay. I've seen similar weird lockups in VMWare guests before, due to errors with the event timers (now fixed). It's a shot in the dark, but your issue might be related to the same underlying cause. Can you try changing the event timer?

Find the current event timer: # sysctl kern.eventtimer.timer
Find your available event timers: # sysctl kern.eventtimer.choice
Change it to something other than what it currently is, if possible.

Hi, I'm using a physical server, not vmware server. Do I need to change it?
Here is my server result:
Code:
# sysctl kern.eventtimer.timer
kern.eventtimer.timer: LAPIC
# sysctl kern.eventtimer.choice
kern.eventtimer.choice: LAPIC(600) HPET(450) HPET1(440) HPET2(440) HPET3(440) HPET4(440) HPET5(440) HPET6(440) HPET7(440) i8254(100) RTC(0)
 
I think it might be worth a shot. Try HPET or i8254?

Essentially, the lockups you're reporting don't seem to make much sense right now; at least not to me. So it'll be a bit of stabbing in the dark.

I have a lot of comments to make on that nginx and php-fpm config as well, but likely none related to the issue at hand, so I'll save that for later if you're interested. :)
 
Savagedlight said:
I think it might be worth a shot. Try HPET or i8254?

Essentially, the lockups you're reporting don't seem to make much sense right now; at least not to me. So it'll be a bit of stabbing in the dark.

I have a lot of comments to make on that nginx and php-fpm config as well, but likely none related to the issue at hand, so I'll save that for later if you're interested. :)

Thanks for you help on this issue. Sorry I don't understand what kern.eventtimer.timer is used for, can you explain more? And is there anything more I can provide to help you to learn more about this issue?

And if you can give me any suggestion with my configuration of Nginx and php-fpm, that'll be better.
 
Quoting eventtimers(4):
Kernel uses several types of time-related devices, such as: real time
clocks, time counters and event timers. Real time clocks responsible for
tracking real world time, mostly when system is down. Time counters are
responsible for generation of monotonically increasing timestamps for
precise uptime tracking purposes, when system is running. Event timers
are responsible for generating interrupts at specified time or periodi-
cally, to run different time-based events
. This page is about the last.
There was also a thread elsewhere on these forums about how some event timers were broken on specific hardware, and caused extra high i/o load. As said, it's a shot in the dark, but an easy enough thing to test and change back if it doesn't help.

nginx config
The two settings "worker_processes 18;" and "worker_connections 20480;" means you've tuned nginx to handle 368640 simultanious connections (source). You might want to consider dialing this down.

Is there any reason you specified worker_cpu_affinity, instead of letting the OS decide which CPU each worker should run on at any given time?

Is there any reason you set keepalive_requests to 100000? The default of 100 should be more than enough, and not be as vulnerable to DoS as it'd be harder to hog connection slots.

php-fpm config
Why is php-fpm instructed to restart each process after 100 requests? (pm.max_requests = 100)

Any particular reason you are using the 'static' process manager? (pm = static)
You might want to look into configuring the 'dynamic' process manager, and making sure to leave enough slack (pm.max_spare_servers) to not constantly fork processes. See php-fpm manual for more information.
 
Savagedlight said:
Quoting eventtimers(4):

There was also a thread elsewhere on these forums about how some event timers were broken on specific hardware, and caused extra high i/o load. As said, it's a shot in the dark, but an easy enough thing to test and change back if it doesn't help.

nginx config
The two settings "worker_processes 18;" and "worker_connections 20480;" means you've tuned nginx to handle 368640 simultanious connections (source). You might want to consider dialing this down.

Is there any reason you specified worker_cpu_affinity, instead of letting the OS decide which CPU each worker should run on at any given time?

Is there any reason you set keepalive_requests to 100000? The default of 100 should be more than enough, and not be as vulnerable to DoS as it'd be harder to hog connection slots.

php-fpm config
Why is php-fpm instructed to restart each process after 100 requests? (pm.max_requests = 100)

Any particular reason you are using the 'static' process manager? (pm = static)
You might want to look into configuring the 'dynamic' process manager, and making sure to leave enough slack (pm.max_spare_servers) to not constantly fork processes. See php-fpm manual for more information.

Hi @Savagedlight,

The load of my server is pretty heavy. To specified worker_cpu_affinity, it can save some CPU time, because the is no need for Nginx to choose which core it wants to use. And some processes prefer to use the first CPU, for example, MySQL, Awstats. With specified CPU affinity, we can assign a CPU (for example, 8-24) for Nginx.

Code:
keepalive_requests 100000;
I'm thinking to make Nginx to use HTTP Long Polling

Use 'static' process in php-fpm can optimize the performance for php-fpm, especially for servers which have a heavy load.
 
Last edited by a moderator:
meteor8488 said:
Hi Savagedlight,

The load of my server is pretty heavy.
To specified worker_cpu_affinity, it can save some CPU time, because the is no need for nginx to choose which core it wants to use. And some of process they prefer to use the first CPU, for example, mysql,awstats. With specified CPU affinity, we can assign cpu(for example, 8-24) for nginx.


keepalive_requests 100000;
I'm thinking to make nginx to use HTTP Long Polling

Use 'static' procese in php-fpm can optimize the performance for php-fpm, especially for servers which has a heavy load.

Okay, that puts it into perspective. :)

Back on topic:
I've done a bit of digging around, and found this thread (OP) on the nginx mailing lists, which seems to be about the exact same symptoms you're having. As suggested by this reply, it's likely a slow SQL query.

meteor8488 said:
At first, I though it's a MySQL error, but I monitored the MySQL process, and I found that there is no slowlog. I tried to reinstall PHP, but this issue still exists. Can anyone help on this issue?
For some reason, I initially read this as there not being anything in the MySQL slow log.
So to pick up on that trail again, and to rule out the obvious: Have you verified that the MySQL slow log is enabled?

Just to be thorough:
The relevant configuration option in /usr/local/etc/my.cnf would be slow_query_log=1. You may optionally define a custom long_query_time=X where X is the number of seconds before logging a query, and slow_query_log_file to some path which is good at handling sequental writes. (This file might get big if it goes unchecked.)
If the configuration file is OK, please also verify that the MySQL server is running with the right configuration by executing a query similar to [CMD="sql:"]SHOW VARIABLES LIKE 'slow_query_%'[/CMD].

You might also want to be polling the MySQL server regularly to find long-running queries, or at the very least check this when/if the problem occurs again.
 
Savagedlight said:
Okay, that puts it into perspective. :)

Back on topic:
I've done a bit of digging around, and found this thread (OP) on the nginx mailing lists, which seems to be about the exact same symptoms you're having. As suggested by this reply, it's likely a slow SQL query.


For some reason, I initially read this as there not being anything in the MySQL slow log.
So to pick up on that trail again, and to rule out the obvious: Have you verified that the MySQL slow log is enabled?

Just to be thorough:
The relevant configuration option in /usr/local/etc/my.cnf would be slow_query_log=1. You may optionally define a custom long_query_time=X where X is the number of seconds before logging a query, and slow_query_log_file to some path which is good at handling sequental writes. (This file might get big if it goes unchecked.)
If the configuration file is OK, please also verify that the MySQL server is running with the right configuration by executing a query similar to [CMD="sql:"]SHOW VARIABLES LIKE 'slow_query_%'[/CMD].

You might also want to be polling the MySQL server regularly to find long-running queries, or at the very least check this when/if the problem occurs again.

Yes, I enabled the MySQL log. There are some slow logs, but I think they are not the cause of this problem. Because the slow 'query' can be finished in 3-5 seconds (I set 2 seconds for slowlog).
 
I enabled debug mode for php-fpm, below is the log when the issue happens:
Code:
[03-Apr-2013 22:08:40.456606] NOTICE: pid 16914, fpm_children_make(), line 421: [pool www] child 23756 started
[03-Apr-2013 22:08:40.456844] NOTICE: pid 16914, fpm_children_bury(), line 252: [pool www] child 23240 exited with code 0 after 183.832013 seconds from start
[03-Apr-2013 22:08:40.611575] NOTICE: pid 16914, fpm_children_make(), line 421: [pool www] child 23757 started
[03-Apr-2013 22:08:40.611840] NOTICE: pid 16914, fpm_children_bury(), line 252: [pool www] child 23143 exited with code 0 after 207.831151 seconds from start
[03-Apr-2013 22:08:40.772880] NOTICE: pid 16914, fpm_children_make(), line 421: [pool www] child 23758 started
[03-Apr-2013 22:08:40.838660] NOTICE: pid 16914, fpm_children_bury(), line 252: [pool www] child 23171 exited with code 0 after 201.707902 seconds from start
[03-Apr-2013 22:08:40.875021] NOTICE: pid 16914, fpm_children_make(), line 421: [pool www] child 23759 started
[03-Apr-2013 22:08:40.875244] DEBUG: pid 16914, fpm_got_signal(), line 72: received SIGCHLD
[03-Apr-2013 22:08:40.875336] DEBUG: pid 16914, fpm_got_signal(), line 72: received SIGCHLD
[03-Apr-2013 22:08:40.875425] DEBUG: pid 16914, fpm_got_signal(), line 72: received SIGCHLD
[03-Apr-2013 22:08:40.875509] DEBUG: pid 16914, fpm_got_signal(), line 72: received SIGCHLD
[03-Apr-2013 22:08:40.875615] DEBUG: pid 16914, fpm_event_loop(), line 411: event module triggered 5 events
[03-Apr-2013 22:08:41.096407] WARNING: pid 16914, fpm_stdio_child_said(), line 166: [pool www] child 23703 said into stderr: "NOTICE: sapi_cgi_log_message(), line 685: PHP message: PHP Strict Standards:  Non-static method DB::table() should not be called statically in /web/www/source/module/forum/forum_rss.php on line 78"
[03-Apr-2013 22:08:41.096475] WARNING: pid 16914, fpm_stdio_child_said(), line 166: [pool www] child 23703 said into stderr: "NOTICE: sapi_cgi_log_message(), line 685: PHP message: PHP Strict Standards:  Non-static method DB::_execute() should not be called statically in /web/www/source/class/class_core.php on line 906"
[03-Apr-2013 22:08:41.096506] WARNING: pid 16914, fpm_stdio_child_said(), line 166: [pool www] child 23703 said into stderr: "NOTICE: sapi_cgi_log_message(), line 685: PHP message: PHP Strict Standards:  Non-static method DB::object() should not be called statically in /web/www/source/class/class_core.php on line 1010"
[03-Apr-2013 22:08:41.096537] WARNING: pid 16914, fpm_stdio_child_said(), line 166: [pool www] child 23703 said into stderr: "NOTICE: sapi_cgi_log_message(), line 685: PHP message: PHP Strict Standards:  Non-static method DB::query() should not be called statically in /web/www/source/module/forum/forum_rss.php on line 78"
[03-Apr-2013 22:08:41.096566] WARNING: pid 16914, fpm_stdio_child_said(), line 166: [pool www] child 23703 said into stderr: "NOTICE: sapi_cgi_log_message(), line 685: PHP message: PHP Strict Standards:  Non-static method DB::checkquery() should not be called statically in /web/www/source/class/class_core.php on line 984"
[03-Apr-2013 22:08:41.096595] WARNING: pid 16914, fpm_stdio_child_said(), line 166: [pool www] child 23703 said into stderr: "NOTICE: sapi_cgi_log_message(), line 685: PHP message: PHP Strict Standards:  Non-static method DB::_do_query_safe() should not be called statically in /web/www/source/class/class_core.php on line 1028"
[03-Apr-2013 22:08:41.096629] WARNING: pid 16914, fpm_stdio_child_said(), line 166: [pool www] child 23703 said into stderr: "NOTICE: sapi_cgi_log_message(), line 685: PHP message: PHP Strict Standards:  Non-static method DB::_execute() should not be called statically in /web/www/source/class/class_core.php on line 985"
[03-Apr-2013 22:08:41.096662] WARNING: pid 16914, fpm_stdio_child_said(), line 166: [pool www] child 23703 said into stderr: "NOTICE: sapi_cgi_log_message(), line 685: PHP message: PHP Strict Standards:  Non-static method DB::object() should not be called statically in /web/www/source/class/class_core.php on line 1010"
[03-Apr-2013 22:08:41.096699] WARNING: pid 16914, fpm_stdio_child_said(), line 166: [pool www] child 23703 said into stderr: "NOTICE: sapi_cgi_log_message(), line 685: PHP message: PHP Strict Standards:  Non-static method DB::num_rows() should not be called statically in /web/www/source/module/forum/forum_rss.php on line 79"
[03-Apr-2013 22:08:41.096730] WARNING: pid 16914, fpm_stdio_child_said(), line 166: [pool www] child 23703 said into stderr: "NOTICE: sapi_cgi_log_message(), line 685: PHP message: PHP Strict Standards:  Non-static method DB::_execute() should not be called statically in /web/www/source/class/class_core.php on line 989"
[03-Apr-2013 22:08:41.096766] WARNING: pid 16914, fpm_stdio_child_said(), line 166: [pool www] child 23703 said into stderr: "NOTICE: sapi_cgi_log_message(), line 685: PHP message: PHP Strict Standards:  Non-static method DB::object() should not be called statically in /web/www/source/class/class_core.php on line 1010"
[03-Apr-2013 22:08:41.096807] WARNING: pid 16914, fpm_stdio_child_said(), line 166: [pool www] child 23703 said into stderr: "NOTICE: sapi_cgi_log_message(), line 685: PHP message: PHP Strict Standards:  Non-static method discuz_process::islocked() should not be called statically in /web/www/source/module/forum/forum_rss.php on line 116"
[03-Apr-2013 22:08:41.096839] WARNING: pid 16914, fpm_stdio_child_said(), line 166: [pool www] child 23703 said into stderr: "NOTICE: sapi_cgi_log_message(), line 685: PHP message: PHP Strict Standards:  Non-static method discuz_process::_status() should not be called statically in /web/www/source/class/class_core.php on line 1247"
[03-Apr-2013 22:08:41.096870] WARNING: pid 16914, fpm_stdio_child_said(), line 166: [pool www] child 23703 said into stderr: "NOTICE: sapi_cgi_log_message(), line 685: PHP message: PHP Strict Standards:  Non-static method discuz_process::_find() should not be called statically in /web/www/source/class/class_core.php on line 1250"
[03-Apr-2013 22:08:41.096894] WARNING: pid 16914, fpm_stdio_child_said(), line 166: [pool www] child 23703 said into stderr: "NOTICE: sapi_cgi_log_message(), line 685: PHP message: PHP Strict Standards:  Non-static method discuz_process::_cmd() should not be called statically in /web/www/source/class/class_core.php on line 1272"
[03-Apr-2013 22:08:41.096919] WARNING: pid 16914, fpm_stdio_child_said(), line 166: [pool www] child 23703 said into stderr: "NOTICE: sapi_cgi_log_message(), line 685: PHP message: PHP Strict Standards:  Non-static method discuz_core::instance() should not be called statically in /web/www/source/function/function_core.php on line 2133"
[03-Apr-2013 22:08:41.096952] WARNING: pid 16914, fpm_stdio_child_said(), line 166: [pool www] child 23703 said into stderr: "NOTICE: sapi_cgi_log_message(), line 685: PHP message: PHP Strict Standards:  Non-static method discuz_process::_process_cmd_memory() should not be called statically in /web/www/source/class/class_core.php on line 1288"
[03-Apr-2013 22:08:41.096977] WARNING: pid 16914, fpm_stdio_child_said(), line 166: [pool www] child 23703 said into stderr: "NOTICE: sapi_cgi_log_message(), line 685: PHP message: PHP Strict Standards:  Non-static method discuz_core::instance() should not be called statically in /web/www/source/function/function_core.php on line 2133"
[03-Apr-2013 22:08:41.097001] WARNING: pid 16914, fpm_stdio_child_said(), line 166: [pool www] child 23703 said into stderr: "NOTICE: sapi_cgi_log_message(), line 685: PHP message: PHP Strict Standards:  Non-static method discuz_process::_status() should not be called statically in /web/www/source/class/class_core.php on line 1278"
[03-Apr-2013 22:08:41.097029] WARNING: pid 16914, fpm_stdio_child_said(), line 166: [pool www] child 23222 said into stderr: "NOTICE: sapi_cgi_log_message(), line 685: PHP message: PHP Fatal error:  Allowed memory size of 134217728 bytes exhausted (tried to allocate 32 bytes) in /web/www/source/class/class_core.php on line 805"
h
 
And when the issue was happening, if I check mysql with show full processlist there will be hundreds of process:
Code:
 | localhost | forums | Sleep   |    0 |                              | NULL                                                                           
 | localhost | forums | Sleep   |    0 |                              | NULL                                                                           
 | localhost | NULL        | Sleep   |    0 |                              | NULL                                                                           
 | localhost | forums | Sleep   |    0 |                              | NULL                                                                           
 | localhost | forums | Sleep   |    0 |                              | NULL                                                                           
 | localhost | forums | Sleep   |    0 |                              | NULL                                                                           
 | localhost | forums | Sleep   |    0 |                              | NULL                                                                           
 | localhost | forums | Sleep   |    0 |                              | NULL                                                                           
 | localhost | forums | Sleep   |    0 |                              | NULL                                                                           
 | localhost | forums | Sleep   |    0 |                              | NULL                                                                           
 | localhost | forums | Sleep   |    0 |                              | NULL                                                                           
 | localhost | forums | Sleep   |    0 |                              | NULL                                                                           
 | localhost | forums | Sleep   |    0 |                              | NULL                                                                           
 | localhost | forums | Sleep   |    0 |                              | NULL                                                                           
 | localhost | forums | Sleep   |    0 |                              | NULL                                                                           
 | localhost | forums | Sleep   |    0 |                              | NULL
But I already add below lines into my.cnf:
Code:
wait_timeout = 10
connect_timeout = 10
interactive_timeout = 10
 
You are logging loads of processes, if your application is heavy then some php functions are always going to take huge resources, so, you can change the timeout for slowlog, as when php starts logging every slow processes, this is overkill for php-fpm and your server.

I had the same issue, I tried every possible thing in earth, before changing the log options of php-fpm. You might want to change these variables.
Code:
slowlog = slow.log
request_slowlog_timeout = 30s
request_terminate_timeout = 120s
rlimit_core = unlimited

# In production to restart child process after 100 process server is overkill too
# As some pages are really huge, and if you are using e-commerce, you cannot use cache in every page
# Hence, the process gets bigger, so, increase 

pm.max_requests = 1000
pm.process_idle_timeout = 10s;

Your config is set for high traffic, but to use just one socket for processing all the processes is again overkill. If you have high traffics, then you might want to use few more sockets for php-fpm and nginx (I guess you have fair load balance compiled while installing nginx), this way your processes gets in balanced. eg: make three copies of php-fpm pool connections, and make it to listen few sockets like:
Code:
[www1] 
listen = /tmp/php-fpm1.sock;
# Other Config here

[www2]
listen = /tmp/php-fpm2.sock;
# Other Config here

[www3]
listen = /tmp/php-fpm3.sock;
# Other Config here

#####################################
# This will not only help you in balancing the load for socket buffer
# But also acts as fail safe method when one of php-fpm processes dies
# I guess you already have emergency threshold method to restart the child processes when it dies 
####################################
and in nginx listen directive use all 3 sockets one by one like:
Code:
fair; # To use fair load balancer
server unix:/tmp/php-fpm1.sock;
server unix:/tmp/php-fpm2.sock;
server unix:/tmp/php-fpm3.sock;

Other thing you have to do is decrease the pm.max_children and distribute among the 3 pools nicely, this will save you lots of cpu processes and eat up little bit more of your memory.

Other best practice for high traffic site using nginx is to do
Code:
multi_accept off; # Saves your children going wild
accept_mutex on; # Gives little bit needed relaxation for server
accept_mutex_delay 500ms;
 
Hi @psyc, Thanks very much! It's quite useful.

Code:
Emergency threshold method to restart the child processes when it dies
The problem for me is that the php-fpm doesn't die. But very slow.

In normal status

Code:
last pid:  2482;  load averages:  4.53,  2.28,  0.95                                  up 0+00:02:04  09:22:10
174 processes: 6 running, 168 sleeping
CPU:  8.1% user,  0.0% nice,  5.6% system,  0.4% interrupt, 85.9% idle
Mem: 517M Active, 526M Inact, 2821M Wired, 27G Free
ARC: 2013M Total, 814M MFU, 1086M MRU, 819K Anon, 41M Header, 70M Other
Swap: 36G Total, 36G Free


45923 mysql       260  20    0   649M   605M uwait  11 901:51 28.03% mysqld
24015 www           1  32    0   168M 29220K accept 12   0:01  8.40% php-fpm
24021 www           1  26    0   168M 30648K accept 16   0:01  5.86% php-fpm
24037 www           1  26    0   168M 33484K sbwait 20   0:01  5.76% php-fpm
24033 www           1  25    0   168M 33156K accept 11   0:01  5.76% php-fpm
24019 www           1  26    0   172M 39044K zfs     9   0:01  5.76% php-fpm
24022 www           1  25    0   168M 34168K CPU6    6   0:01  5.66% php-fpm
24026 www           1  26    0   168M 32596K accept 22   0:01  5.66% php-fpm
24024 www           1  25    0   168M 34528K CPU9    6   0:01  5.47% php-fpm
24020 www           1  26    0   168M 33424K accept 22   0:01  5.37% php-fpm
20% php-fpm
24009 www           1  27    0   168M 34360K accept 15   0:01  3.96% php-fpm
24011 www           1  28    0   168M 31264K accept 13   0:01  3.86% php-fpm
23897 www           1  27    0   168M 31992K accept 11   0:01  3.76% php-fpm
24081 www           1  27    0   168M 32764K accept 20   0:01  3.76% php-fpm
24012 www           1  27    0   168M 33484K accept 10   0:01  3.56% php-fpm
96697 www           1  21    0   804M 71724K zio->i  6  11:56  1.76% nginx
96712 www           1  21    0   804M 72208K zio->i 21  11:10  1.66% nginx
96706 www           1  21    0   804M 71656K zio->i 15  10:55  1.66% nginx
96698 www           1  20    0   804M 71380K zio->i  7  12:17  1.56% nginx


When the issue happens
Code:
last pid: 24842;  load averages: 45.09, 18.08, 20.045                                                                                                                        up 1+12:49:40  22:09:50
180 processes: 123 running, 57 sleeping
CPU: 14.3% user,  0.0% nice, 83.8% system,  1.7% interrupt,  0.2% idle
Mem: 8180M Active, 16G Inact, 5672M Wired, 1035M Cache, 35M Free
ARC: 3008M Total, 684M MFU, 951M MRU, 13M Anon, 74M Header, 1286M Other
Swap: 36G Total, 5908K Used, 36G Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
45923 mysql       260  20    0   649M   605M uwait  15 905:51 63.13% mysqld
24781 www           1  77    0   168M 35224K RUN     4   0:05 17.68% php-fpm
24810 www           1  77    0   172M 38768K RUN    22   0:05 16.99% php-fpm
24815 www           1  77    0   168M 37768K RUN    20   0:05 16.99% php-fpm
24713 www           1  52    0   168M 37556K RUN     7   0:05 16.16% php-fpm
24722 www           1  52    0   168M 37256K RUN    22   0:05 16.16% php-fpm
24727 www           1  52    0   168M 38228K RUN     7   0:05 16.16% php-fpm
24734 www           1  52    0   168M 37764K RUN    19   0:05 16.16% php-fpm
24718 www           1  52    0   168M 36580K sbwait 10   0:05 16.16% php-fpm
24818 www           1  52    0   168M 38156K RUN    12   0:05 16.16% php-fpm
24826 www           1  52    0   168M 35320K RUN     4   0:05 16.16% php-fpm

Is there any method I can restart php-fpm automatically?
 
Last edited by a moderator:
Back
Top