Solved openldap crashing when replicating

I have an odd error that's a bit hard to explain, but I'll do my best. Please feel free to ask if additional info is needed.

We have a fully updated FreeBSD 11.2 server. On that server are 3 FreeBSD 10.4 jails. All 3 jails take care of accounts and authentication for other sites/services.

Jail 1 has Apache and PostgreSQL, and people use it to manage their accounts.
Jail 2 has OpenLDAP 2.4 installed and is the LDAP master for jail 3.
Jail 3 has OpenLDAP 2.4 installed, is the main authentication server for other sites and services, and is an LDAP slave of jail 2.

The way it all works: jail 1 has a script that syncs any account changes from the PostgreSQL server to the LDAP server on jail 2, which then syncs to the LDAP server on jail 3 using LDAP sync replication.
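For readers unfamiliar with LDAP sync replication, the jail 2 to jail 3 link described above is typically configured along these lines in slapd.conf. This is only a sketch: the thread does not show the actual configuration, so the host name, suffix, bind DN, password, and tuning values are all placeholders.

```
# --- Provider (jail 2), slapd.conf excerpt; all values are hypothetical ---
overlay             syncprov
syncprov-checkpoint 100 10      # checkpoint after 100 operations or 10 minutes
syncprov-sessionlog 100

# --- Consumer (jail 3), slapd.conf excerpt; all values are hypothetical ---
syncrepl rid=001
    provider=ldap://jail2.example.com
    type=refreshAndPersist
    retry="60 +"
    searchbase="dc=example,dc=com"
    bindmethod=simple
    binddn="cn=replicator,dc=example,dc=com"
    credentials=secret
```

With refreshAndPersist the consumer keeps a connection open to the provider and receives changes as they happen, so a large batch of account updates on jail 2 turns into a burst of writes on jail 3 as well.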

This all worked fine for years until a few weeks ago, when I completely re-set up the jails using FreeBSD 11.2 because 10.4 is EOL. Everything has been set up the same: same configurations, same software versions, etc. The only difference between the jails is the FreeBSD version. However, now whenever there is a large sync of user data, one or both of the LDAP servers will crash with the error "BDB0060 PANIC: fatal region error detected; run recovery". Initially, I thought a corrupt account might have been added, but every time it has crashed I've checked the last synced account and there's nothing wrong with it. It also always crashes at a different point, so I've pretty much ruled that out as the issue.

I've tried everything I can think of but can't figure out the problem, so I've now rolled back to the old jails and everything is running fine: no crashes when syncing and, as a matter of fact, the syncing is probably twice as fast. On the new jails it was taking about 45 minutes to sync 12,400 accounts; on the old ones it takes only around 20 minutes.
The fact that the old jails work fine and the new ones don't is very odd because, as I mentioned before, everything is set up exactly the same between the old FreeBSD jails and the new ones, with the exception of the FreeBSD version. It's almost as though the LDAP replication can't write fast enough to keep up and is crashing, but I'm not sure whether that makes sense.

I did not put this system together so I can't answer exactly as to why it's set up the way it is or any super technical details about it. But I have been managing the jails for a long time and have upgraded them from FreeBSD 9.x to 10.x with no issues and am only just now having this issue when going from FreeBSD 10.4 to 11.2.

Does that all make sense? Can anyone think of any reason why this might be happening? Any help would be very much appreciated.

Thank you!
 
How did you install software on the jails & the host system? In other words: do you use ports or packages (or both)?

(edit) Also important: Did you re-install everything after the upgrade to 11.2? If not then that could explain your current problems.
 
I would dump bdb as it's going away and go to mdb instead.
-- running openldap-server-2.4.46_5 with a 3-way multimaster and about 2500 accounts and no issues.
 
How did you install software on the jails & the host system? In other words: do you use ports or packages (or both)?

(edit) Also important: Did you re-install everything after the upgrade to 11.2? If not then that could explain your current problems.

Thanks for the reply.
The software was installed from packages. We actually have our own package server, and the packages were built with poudriere using the exact same config options.

Yes, I totally set up a new jail from scratch.
 
I would dump bdb as it's going away and go to mdb instead.
-- running openldap-server-2.4.46_5 with a 3-way multimaster and about 2500 accounts and no issues.

Yes, I'm seriously considering this as it's long overdue. I just still find it odd that it all works fine in FreeBSD 10.4 jails but not 11.2 jails even though everything else is identical.
 
The software was installed from packages. We actually have our own package server, and the packages were built with poudriere using the exact same config options.
Were those packages in the repository rebuilt after the upgrade to 11.2, or did you simply re-use what you had on the clients?
 
Were those packages in the repository rebuilt after the upgrade to 11.2, or did you simply re-use what you had on the clients?

Rebuilt from scratch. I actually made a totally new poudriere jail and rebuilt the packages instead of just upgrading the jail and rebuilding packages.

I just set up test jails using the mdb backend and am going to test with that now.
 
Switching to the mdb backend seems to have done it; it's been two weeks with no crashes. Thank you for the help.
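For anyone landing here with the same BDB0060 panic, a database section using the mdb backend looks roughly like the sketch below. The thread does not include the actual slapd.conf, so the suffix, rootdn, directory, and map size are placeholder assumptions to adapt, not the poster's settings.

```
# slapd.conf excerpt; all values below are hypothetical examples
moduleload  back_mdb            # only needed if the backend was built as a module

database    mdb
maxsize     1073741824          # maximum size of the database in bytes (1 GiB here)
suffix      "dc=example,dc=com"
rootdn      "cn=Manager,dc=example,dc=com"
directory   /var/db/openldap-data

index       objectClass eq
index       entryUUID,entryCSN eq   # commonly indexed when syncrepl is in use

# bdb/hdb-only directives such as "dbconfig" do not apply to mdb
# and should be left out of this section.
```

Existing bdb data is usually carried over by exporting it to LDIF with slapcat while still on the old backend and loading it with slapadd into an empty mdb directory. On a pure consumer, starting with an empty database and letting syncrepl refill it is another option.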
 