Thread Stack Size - Segmentation Fault

Hi

For quite some time I have had problems with two applications:
  • OpenLDAP
  • Munin
Both seem to suffer from an issue with the same origin: each receives a SIGSEGV at sporadic points during execution. None of the debugging done so far has revealed a pattern that reproduces it reliably.
  • Munin runs with somewhat more than 100 plug-ins and about 150 graphs (times 4, for day, week, month and year).
  • OpenLDAP does not actually have much of a workload: just regular PAM via nslcd(8). Previously, crashes occurred every couple of hours, sometimes only every few days, so I wrote a supervisor script as a workaround for production use. But since the mail server (Dovecot & Postfix) has been connected to it, slapd crashes with a segmentation fault after only a few seconds of work. Note that only a single test user is connected. My logs are now flooded with more connection errors than successful connections, so the workaround is no longer usable.
This finally pushed me to debug the issue more deeply than I had before. It turns out this may well be a FreeBSD OS bug. I was not happy to find this out, because I would love to keep using FreeBSD for my server environment.


Here are the relevant links about the OpenLDAP-related SIGSEGV issue:

FreeBSD Forum:

OpenLDAP Mailing List:
  1. http://www.openldap.org/lists/openldap-technical/201504/msg00220.html
  2. http://www.openldap.org/lists/openldap-technical/201504/msg00228.html
  3. http://www.openldap.org/lists/openldap-technical/201504/msg00237.html
  4. http://www.openldap.org/lists/openldap-technical/201504/msg00241.html
  5. http://www.openldap.org/lists/openldap-technical/201504/msg00238.html
  6. http://www.openldap.org/lists/openldap-technical/201504/msg00239.html
  7. http://www.openldap.org/lists/openldap-technical/201504/msg00248.html
  8. http://www.openldap.org/lists/openldap-technical/201504/msg00249.html
  9. http://www.openldap.org/lists/openldap-technical/201504/msg00250.html
  10. http://www.openldap.org/lists/openldap-technical/201504/msg00254.html
  11. http://www.openldap.org/lists/openldap-technical/201504/msg00255.html
  12. http://www.openldap.org/lists/openldap-technical/201504/msg00256.html
  13. http://www.openldap.org/lists/openldap-technical/201504/msg00257.html
  14. http://www.openldap.org/lists/openldap-technical/201504/msg00282.html
Links 11, 12 and 13 point out that it is likely a FreeBSD problem, described in:
  1. http://www.openldap.org/lists/openldap-bugs/200506/msg00174.html
  2. http://lists.freebsd.org/pipermail/freebsd-current/2014-August/051646.html

So my question is: does FreeBSD still have this stack-size problem, or has the bug already been fixed? Is there a workaround to get at least OpenLDAP running stably enough for production use? Currently there is simply no way to use it, since slapd dies within seconds of the service being started.

P.S.: Could OpenLDAP use setrlimit(2) to request a larger maximum stack size?

Also, I just checked with ldd(1): slapd is linked against /lib/libthr.so.3.
Code:
root@FreeBSD [~]$ ldd /usr/local/libexec/slapd
/usr/local/libexec/slapd:
  libldap_r-2.4.so.2 => /usr/local/lib/libldap_r-2.4.so.2 (0x8009a7000)
  liblber-2.4.so.2 => /usr/local/lib/liblber-2.4.so.2 (0x800bf5000)
  libltdl.so.7 => /usr/local/lib/libltdl.so.7 (0x800e03000)
  libcrypt.so.5 => /lib/libcrypt.so.5 (0x80100c000)
  libwrap.so.6 => /usr/lib/libwrap.so.6 (0x80122c000)
  libssl.so.7 => /usr/lib/libssl.so.7 (0x801435000)
  libcrypto.so.7 => /lib/libcrypto.so.7 (0x8016a0000)
  libthr.so.3 => /lib/libthr.so.3 (0x801a93000)
  libc.so.7 => /lib/libc.so.7 (0x801cb8000)
Code:
root@FreeBSD [~]$ ls -lach /lib/libthr.so.3
-r--r--r--  1 root  wheel  103K 18 Jan 15:36 /lib/libthr.so.3
 
The database is also very small, since this is only a test server environment.
Code:
root@FreeBSD [~]$ ls -lach /var/db/openldap-data/
total 288
drwx------  2 ldap  ldap  512B 28 Apr 18:11 .
drwxr-xr-x  20 root  wheel  1.0K 28 Apr 18:12 ..
-rw-------  1 ldap  ldap  128K 28 Apr 18:11 data.mdb
-rw-------  1 ldap  ldap  8.0K 28 Apr 18:16 lock.mdb



Please let me know if I can provide any further debug information to help locate the exact problem, so that we can open a clear bug report about this serious issue.
 

jhb@

Developer
I just replied to the PR but wanted to follow up here as well. Starting with 10.1, you can force the initial thread in a process to use the full RLIMIT_STACK stack size by setting LIBPTHREAD_BIGSTACK_MAIN in the environment before starting the process. In 10.2 and later, this behavior will be the default. You can still set the environment variable on 10.2; it will just be ignored.