php-fpm worker segfaults: how to diagnose?

After replacing PHP 5.6 with 7.2.17, my nginx-invoked php-fpm worker processes sometimes crash at random, causing 502 Bad Gateway errors.

/var/log/php-fpm.log just says something like this each time:
Code:
WARNING: [pool www] child 39947 exited on signal 11 (SIGSEGV) after 0.000262 seconds from start

The number of seconds from start is always very low like that. When this happens, I can usually reload the page and everything's fine.

The scripts which cause problems are big and use PDO to access a MySQL database. I can't say for sure the PDO extensions are actually to blame, but small scripts which don't rely on database access are not causing any php-fpm crashes, as far as I can tell. It may just be some kind of low memory related issue (512 MB physical RAM on this system, a BBB, so there's some usage of swap).

I would like to try to diagnose further, but I am having trouble figuring out how to do so.

/usr/local/etc/php-fpm.conf is all default values, except rlimit_core = unlimited, but even after I set that, core dumps are still not being generated. (Also, if there are some better ways to find these core dumps than find / -type f -name '*.core', let me know.)

Any suggestions appreciated!
 
Each time a program crash, it generate a progname.core, if you want to diagnose the problem, the analysis of the core files is your best bet, the Handbook has the section Examining a Core File.

I have never analyzed a core file, I hope someone in the forum can help with that.

What names have the core files?

Also, check if MySQL is logging something
 
I would enable PHP error/warning logging (this is typically turned off on production systems) and check those. It may spew a bunch of error messages prior to the crash that might provide clues where to look further.

Analyzing a core dump presumes you have some debugging experience with C/C++ code and know your way around the source code. Otherwise you won't be able to make head nor tail from it.
 
To even think about analyzing a core dump, I have to get the core dump first, but I can't even do that.

PHP logging is turned on via these php.ini settings, but no messages are being generated:
Code:
display_errors = On
log_errors = On
error_log = /var/log/php_errors.log
fastcgi.logging = 1

As for MySQL, this went into the error log at the time of the php-fpm crash:
Code:
2019-04-10T16:28:25.002048Z 20 [Note] Aborted connection 20 to db: 'foodb' user: 'foo' host: 'localhost' (Got an error reading communication packets)
2019-04-10T16:28:25.004421Z 21 [Note] Aborted connection 21 to db: 'foodb' user: 'foo' host: 'localhost' (Got an error reading communication packets)

In my.cnf I also added general_log=1 but the resulting logfile did not catch any queries at the time of the php-fpm crash. The failure is apparently happening when trying to establish a connection to the database.
 
To see all error messages php.ini also needs to have this setting:
Code:
error_reporting = E_ALL
 
Are you using try/catch block? and aren't trapping the crash?

Try to change the ATTR_ERRMODE to ERRMODE_EXCEPTION

PHP:
try {
    $dsn = "mysql:host=localhost;dbname=$dbname";
    $dbh = new PDO($dsn, $user, $password, array(PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION));
    // interact with database
} catch (PDOException $e){
    echo $e->getMessage();
}
 
Thanks, all. I'm still working on this.

PHP error logging
php.ini does already have error_reporting = E_ALL & ~E_DEPRECATED & ~E_STRICT. The logging is actually working; I verified this with a bad script. But no PHP errors are logged at the time of the php-fpm crashes.

PHP config
Almost everything is set to the lang/php72 port's php.ini-production defaults. I only edited php.ini to enable the log settings mentioned previously, plus a few opcache settings. The crashes happen regardless of whether opcache is installed.

MySQL connection & query logging
I was looking in the wrong place for the MySQL query log; I had log-output=TABLE in my.cnf, and needed it to be log-output=FILE. (When it was TABLE, the log was going into a database table, although a file was still created somehow; it was just mostly empty).

Generating crashes with MediaWiki
I can get php-fpm to crash about 10–20% of the time by editing a MediaWiki article. MediaWiki runs MySQL queries via multiple socket connections—2 or 3 at a time, 11 connections total for a typical edit. The MYSQL log shows that when I get a crash, it happens consistently at the same place every time. It's actually not immediately upon connection, rather it's while running some queries.

Here is the last message in the log when there's a crash:
SQL:
SELECT /* LCStoreDB::get  */  lc_value  FROM `mw_l10n_cache`    WHERE lc_lang = 'en' AND lc_key = 'magicWords'  LIMIT 1

When there's no crash, the next message is this:
SQL:
SELECT /* WikiPage::pageData  */  page_id,page_namespace,page_title,page_restrictions,page_is_redirect,page_is_new,page_random,page_touched,page_links_updated,page_latest,page_len,page_content_model  FROM `mw_page`    WHERE page_namespace = '2' AND page_title = 'Mjb/FreeBSD_on_BeagleBone_Black/Additional_software'  LIMIT 1;

Both of these are benign queries which I can run from the mysql command-line client with no problems. The data returned by the queries is very short.

When there's no crash, no new connections are made for a while, and the only other open connection at the time these queries are running is idle. The next thing which that 2nd connection will do is this:
SQL:
SELECT /* SqlBagOStuff::getMulti  */  keyname,value,exptime  FROM `mw_objectcache`    WHERE keyname = 'OffsetDB-mw_:messages:en'

...which is again, nothing notable; I can run it myself without incident (other than the binary value data turning my console to gibberish).

So I am unsure what to make of this. It's crashing at random in a consistent spot, but nothing too demanding is happening at that particular moment, as far as I can tell.

Other ways to crash php-fpm?
I only have one other app using MySQL: Tiny Tiny RSS. Before I upgraded it to the latest version, it was crashing php-fpm consistently when I tried to do anything with it. But now that I am using the current port, all is well; it does not trigger the crash, so I can't use it to try to figure out what it might have in common with MediaWiki.

Martin's suggestion for a PHP test script was helpful. I made it open 2 connections and hit the same tables, but I am unable to generate any crashes with it:
PHP:
<?php
        $dbname = "foo";
        $dbuser = "foo";
        $dbpass = "foo";
        $pagename = 'Mjb/FreeBSD_on_BeagleBone_Black/Additional_software';
        echo '<h1>Testing...</h1>';
        try {
                $dsn = "mysql:host=localhost;dbname=$dbname";
                $dbh = new PDO($dsn, $dbuser, $dbpass, array(PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION));
                $dbh2 = new PDO($dsn, $dbuser, $dbpass, array(PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION));
                $query = "SELECT old_text FROM mw_text WHERE old_id IN (SELECT rev_text_id FROM mw_revision LEFT JOIN mw_page ON page_latest = rev_id WHERE page_title LIKE '" . $pagename . "');";
                $query2 = "SELECT /* WikiPage::pageData  */  page_id,page_namespace,page_title,page_restrictions,page_is_redirect,page_is_new,page_random,page_touched,page_links_updated,page_latest,page_len,$
                $sth = $dbh->query($query);
                $result = $sth->fetch(PDO::FETCH_OBJ);
                $raw_content = $result->old_text;
                echo '<pre style="margin: 3em; padding: 1em; border: 1px solid black; background: #eee; color: #000; white-space: pre-wrap;">';
                echo str_replace('<','&lt;',str_replace('&','&amp;',$raw_content));
                echo '</pre>';
                $sth2 = $dbh2->query($query2);
                $result = $sth2->fetchAll();
                $raw_content = print_r($result, TRUE);
                echo '<pre style="margin: 3em; padding: 1em; border: 1px solid black; background: #eee; color: #000; white-space: pre-wrap;">';
                echo str_replace('<','&lt;',str_replace('&','&amp;',$raw_content));
                echo '</pre>';
                $sth = null;
                $dbh = null;
                $dbh2 = null;
        } catch (PDOException $e){
                echo $e->getMessage();
        }
?>

MySQL configuration
Because this is running on a BeagleBone Black, I have some resource limits in my.cnf to try to keep MySQL from hogging precious RAM. However, I'm pretty sure a crash (in php-fpm or anywhere else) should not be the expected outcome if some values are too low. Nevertheless, here's what I am using, and suggestions for further tuning appreciated:
Code:
# $FreeBSD: branches/2019Q1/databases/mysql57-server/files/my.cnf.sample.in 414707 2016-05-06 14:39:59Z riggs $

[client]
port                            = 3306
socket                          = /tmp/mysql.sock

[mysql]
prompt                          = \u@\h [\d]>\_
no_auto_rehash

[mysqld]
user                            = mysql
port                            = 3306
socket                          = /tmp/mysql.sock
bind-address                    = 127.0.0.1
basedir                         = /usr/local
datadir                         = /var/db/mysql
tmpdir                          = /var/db/mysql_tmpdir
slave-load-tmpdir               = /var/db/mysql_tmpdir
secure-file-priv                = /var/db/mysql_secure
log-bin                         = mysql-bin
#log-output                      = TABLE
log-output                      = FILE
master-info-repository          = TABLE
relay-log-info-repository       = TABLE
relay-log-recovery              = 1
slow-query-log                  = 1
server-id                       = 1
sync_binlog                     = 1
sync_relay_log                  = 1
#binlog_cache_size               = 16M
binlog_cache_size               = 4K
expire_logs_days                = 30
default_password_lifetime       = 0
enforce-gtid-consistency        = 1
gtid-mode                       = ON
safe-user-create                = 1
lower_case_table_names          = 1
explicit-defaults-for-timestamp = 1
myisam-recover-options          = BACKUP,FORCE
open_files_limit                = 32768
table_open_cache                = 16384
table_definition_cache          = 8192
net_retry_count                 = 16384
#key_buffer_size                 = 256M
key_buffer_size                 = 8M
max_allowed_packet              = 64M
query_cache_type                = 0
query_cache_size                = 0
long_query_time                 = 0.5
#innodb_buffer_pool_size         = 1G
innodb_buffer_pool_size         = 5M
innodb_data_home_dir            = /var/db/mysql
innodb_log_group_home_dir       = /var/db/mysql
#innodb_data_file_path           = ibdata1:128M:autoextend
innodb_data_file_path           = ibdata1:12M:autoextend
#innodb_temp_data_file_path      = ibtmp1:128M:autoextend
innodb_temp_data_file_path      = ibtmp1:12M:autoextend
innodb_flush_method             = O_DIRECT
innodb_log_file_size            = 256M
innodb_log_buffer_size          = 256K
innodb_write_io_threads         = 8
innodb_read_io_threads          = 8
innodb_autoinc_lock_mode        = 2
skip-symbolic-links

# all of the following added by mjb
max_connections=20
thread_cache_size=0
host_cache_size=0
innodb_ft_cache_size=1600000
innodb_ft_total_cache_size=32000000
thread_stack=256K
sort_buffer_size=32K
read_buffer_size=8200
read_rnd_buffer_size=8200
max_heap_table_size=16K
tmp_table_size=1K
bulk_insert_buffer_size=0
join_buffer_size=128
net_buffer_length=1K
innodb_sort_buffer_size=64K
binlog_stmt_cache_size=4K
general_log      = 1

[mysqldump]
max_allowed_packet              = 256M
quote_names
quick

(Variations from the my.cnf.sample defaults are all noted with comments.)

Next steps?
Could it be some kind of database corruption? But mysql_check says everything is OK.

Still wish I could figure out how to get php-fpm to dump core. Someone could then take a look at it for me.
 
...
PHP error logging
php.ini does already have error_reporting = E_ALL & ~E_DEPRECATED & ~E_STRICT....
This is not the same thing as error_reporting = E_ALL. Your directive turns off deprecated and strict error messages, not on. That's what the ~ character does. Use the settings from php.ini-development instead.
 
If the database and web server is the same machine, use UNIX sockets (the socket must be in a physical disk o memory disk, not in USB / SD disk)

In the php-fpm pool conf file, you can set catch_workers_output = yes to redirect worker stdout and stderr into main error log.
 
Thanks, guys, for continuing to help.

I changed error_reporting to just E_ALL, but it did not make a difference. Nothing shows up in the PHP error log at the time of the crash.

If the database and web server is the same machine, use UNIX sockets (the socket must be in a physical disk o memory disk, not in USB / SD disk)

MySQL and nginx are on the same machine, yes. MySQL is configured with both a socket (/tmp/mysql.sock) and TCP port (localhost:3306). I am under the impression that only the socket is ever being used, because the general log's "Connect" lines always say "using Socket".

Physical internal disks are not an option on the BBB. I am using the Linux partition of the built-in eMMC drive for swap. FreeBSD and everything else is on a microSD drive. Since RAM is so precious (512 MB, not expandable), it would be nice not to have any RAM disks.

At the very least, since too many things assume a large /tmp is available, I do not want to use tmpfs for /tmp. But I am happy to set up a tiny tmpfs mounted elsewhere just for use by sockets. Is that advisable?

In the php-fpm pool conf file, you can set catch_workers_output = yes to redirect worker stdout and stderr into main error log.
I forgot to mention, I did have that set already. But I forgot to set log_level = debug. After doing that, the crash looks like this in the php-fpm log:

Code:
[13-Apr-2019 06:37:28.076363] DEBUG: pid 74486, fpm_pctl_perform_idle_server_maintenance(), line 378: [pool www] currently 0 active children, 2 spare children, 2 running children. Spawning rate 1
[...every second, a similar message...]
[13-Apr-2019 06:39:28.571157] DEBUG: pid 74486, fpm_pctl_perform_idle_server_maintenance(), line 378: [pool www] currently 0 active children, 3 spare children, 3 running children. Spawning rate 1
[13-Apr-2019 06:39:29.592387] DEBUG: pid 74486, fpm_pctl_perform_idle_server_maintenance(), line 378: [pool www] currently 2 active children, 1 spare children, 3 running children. Spawning rate 1
[13-Apr-2019 06:39:30.585265] DEBUG: pid 74486, fpm_event_loop(), line 418: event module triggered 2 events
[13-Apr-2019 06:39:30.597041] DEBUG: pid 74486, fpm_pctl_perform_idle_server_maintenance(), line 378: [pool www] currently 1 active children, 2 spare children, 3 running children. Spawning rate 1
[13-Apr-2019 06:39:30.610395] DEBUG: pid 74486, fpm_got_signal(), line 75: received SIGCHLD
[13-Apr-2019 06:39:30.614746] WARNING: pid 74486, fpm_children_bury(), line 256: [pool www] child 74494 exited on signal 11 (SIGSEGV) after 0.000051 seconds from start
[13-Apr-2019 06:39:30.624255] NOTICE: pid 74486, fpm_children_make(), line 425: [pool www] child 74500 started
[13-Apr-2019 06:39:30.627693] DEBUG: pid 74486, fpm_event_loop(), line 418: event module triggered 1 events
 
I set up a 2 MB tmpfs in /tmpsockets just for sockets (I was unsure how big to really make it), then adjusted the socket location settings in my.cnf & php.ini. I restarted MySQL and php-fpm, but it did not help. I still get the php-fpm crashes.
 
I made some progress and got a core dump, finally.

Hopefully this will help someone else in the future (possibly a future version of me):

To get a core dump from a php-fpm worker, in php-fpm.conf, rlimit_core=unlimited needs to be set in the [www] section; I had previously only set it under [global], not noticing that it was in two places.

Core dumps also need to be enabled for setuid/setgid processes:
Code:
# sysctl kern.sugid_coredump=1

The processes have an unknown current working directory they can't write to, so put them somewhere writable:
Code:
# sysctl kern.corefile=/var/tmp/%N.core

Now core dumps are produced, but they will not be usable if PHP was not built with the DEBUG port configuration option enabled. Needless to say, if you installed PHP from the official packages repository, it was not a debug build.

Code:
# cd /usr/ports/php
# make config
I enabled the DEBUG option this time.
Code:
# make
# make deinstall
# make reinstall
# pkg info php72\* | sed 's/.* //' > /tmp/foo
# portmaster `cat /tmp/foo`
# service php-fpm restart
I generated another crash, then realized I hadn't installed gdb. Luckily a package was available and I could proceed:

Code:
# pkg install gdb
# gdb /usr/local/sbin/php-fpm /var/tmp/php-fpm.core
GNU gdb (GDB) 8.2.1 [GDB v8.2.1 for FreeBSD]
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "armv7-portbld-freebsd12.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/local/sbin/php-fpm...done.
[New LWP 100226]
Core was generated by `php-fpm: pool www'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x26641354 in ?? ()
(gdb) bt
#0  0x26641354 in ?? ()
#1  0x2065b634 in ?? () from /usr/local/lib/libpcre.so.1
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

So maybe PCRE is the culprit?

Code:
# pkg info | grep pcre
pcre-8.42_1                    Perl Compatible Regular Expressions library

# pkg search -x '^pcre-'
pcre-8.43                      Perl Compatible Regular Expressions library

# pkg install -f pcre
# pkg info -r pcre
pcre-8.43:
        nginx-1.14.2_13,2
        php72-7.2.17_1

Simply upgrading pcre to 8.43 like this did not help. I am guessing I need to rebuild nginx and all things PHP yet again. Also, pcre and bunch of other ports were updated in the last 2 weeks to bump some dependencies, so I can't use any of the current prebuilt packages; I'll have to rebuild from source.

Code:
# portsnap fetch update
# portmaster pcre nginx php72\*

This will take a while. I will report back when I can test again, although the problem I was asking about—how to get php-fpm core dumps—is now solved. Hooray!

Edit: the rebuild didn't take long at all (thanks, devel/ccache!), but the crashes persist. I am also no longer getting good info from the backtrace:

Code:
Reading symbols from /usr/local/sbin/php-fpm...done.
[New LWP 100136]
Core was generated by `php-fpm: pool www'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x268b5412 in ?? ()
(gdb) bt
#0  0x268b5412 in ?? ()
#1  0x26564d64 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

I will inquire elsewhere to get this diagnosed further. Thanks all.
 
Back
Top