Advice when migrating Internet forum from CentOS7 to FreeBSD

I have an Internet forum (Xenforo software running on nginx, php-fpm, MariaDB, ElasticSearch) and a mail server (dovecot, postfix, and roundcube for the front-end) running on CentOS7.

As I've acquired a taste for FreeBSD and also because CentOS7 is EOL by June 2024 (and CentOS8 is already EOL ...), I would like to migrate to FreeBSD. I already have my blog running on a FreeBSD VPS, but I have been putting off the forum + mail server migration for now because I realise my FreeBSD skills are not yet fantastic and also because getting the mail server just right is going to be painful.

Anyway, I see that jails are used a lot for FreeBSD, so I was wondering if it makes any sense to put the mail server and/or any of the other bits inside jails. Or would that just unnecessarily complicate everything or introduce unnecessary overhead?

So far I've only used packages, but would it be better to use ports for any of this? (Incidentally, a Youtuber I follow recently posted a series of videos about installing a FreeBSD mail server using ports: https://www.youtube.com/playlist?list=PLimU5OMnV2EdzIpsbIB_q6sKYdJDxnTUP)

Also, are dovecot and postfix still good options on FreeBSD or are there better alternatives I should be considering?

I think the forum migration itself should be fine, but I'd be interested to hear if anyone has done a similar migration before or just installed Xenforo on FreeBSD. Any caveats I should know about?

In any case, thank you for your time reading this.
 
I have used dovecot for a long time. It works with no problems. Postfix? No, I'm an old sendmail tragic because it's in base & it's what I've always used. By all accounts Postfix is just as good as sendmail but with fewer features.

As to the rest, provided it's in ports/packages, you'll have no issue migrating your setup. In fact, most of the configuration should transfer over, with perhaps the only change being paths. Oh, and things like MariaDB's paths CAN be set in rc.conf, or you can use the traditional my.cnf/mariadb.cnf in /usr/local/etc.
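For example, a minimal rc.conf sketch for the MariaDB server port; the data directory value here is only an example, adjust it to your own layout:
Code:
# /etc/rc.conf (excerpt)
mysql_enable="YES"
# optional: non-default data directory
mysql_dbdir="/var/db/mysql"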

Ports are good if the options you want for an application aren't what's packaged in binary form. If the package is exactly what you want, then packages are your best bet.

Personally, I would suggest you get yourself set up without jails. Get it all working, then create your jails and migrate postfix, dovecot, your web server etc. as you get more skilled. Jails can definitely cause issues, especially with web servers, for the uninitiated.

TL;DR Until your skills are better, small baby steps rather than big strides. Don't overcomplicate it.
 
So far I've only used packages, but would it be better to use ports for any of this? (Incidentally, a Youtuber I follow recently posted a series of videos about installing a FreeBSD mail server using ports: https://www.youtube.com/playlist?list=PLimU5OMnV2EdzIpsbIB_q6sKYdJDxnTUP)

Also, are dovecot and postfix still good options on FreeBSD or are there better alternatives I should be considering?
Perhaps this does not exactly answer your questions about the mail services. I chose Postfix and Dovecot for my mail services because of long-time experience with Postfix/Cyrus IMAP in a company (somebody else had made the choice before I took over the server administration). Later, I stayed with Postfix, because this piece of software never failed for me. However, I was never completely happy with Cyrus IMAP -- too many recovery runs over the years. After reading the docs, I chose Dovecot for the POP3/IMAP part of my mail systems, and I am quite happy with it, since regarding stability it plays in the same league as Postfix.

I use the packages of Postfix and Dovecot, because the pre-configured options are just fine (for me). However, I employ external relays for outgoing mail, and for that I need Postfix with SASL support, so I installed the package mail/postfix-sasl.
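For reference, the relay/SASL bits of my main.cf look roughly like this (the relay host and port are placeholders, of course):
Code:
# /usr/local/etc/postfix/main.cf (excerpt)
relayhost = [smtp.example.com]:587
smtp_sasl_auth_enable = yes
smtp_sasl_password_maps = hash:/usr/local/etc/postfix/sasl_passwd
smtp_sasl_security_options = noanonymous
smtp_tls_security_level = encrypt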
 
Anyway, I see that jails are used a lot for FreeBSD, so I was wondering if it makes any sense to put the mail server and/or any of the other bits inside jails. Or would that just unnecessarily complicate everything or introduce unnecessary overhead?

If you're just starting with FreeBSD, the administrative overhead with jails is far less than with separate VMs, as you have just one kernel to deal with. If for some reason you need kernel-level tuning for a particular application, bhyve will enable that. I'd suggest starting small with jails and deciding from there.
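To give a feel for what's involved, a minimal jail.conf sketch for a mail jail could look like this; the hostname, path and address are placeholders:
Code:
# /etc/jail.conf
mail {
    host.hostname = "mail.example.com";
    path = "/usr/local/jails/mail";
    ip4.addr = "192.0.2.10";
    exec.start = "/bin/sh /etc/rc";
    exec.stop = "/bin/sh /etc/rc.shutdown";
    mount.devfs;
}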

So far I've only used packages, but would it be better to use ports for any of this? (Incidentally, a Youtuber I follow recently posted a series of videos about installing a FreeBSD mail server using ports: https://www.youtube.com/playlist?list=PLimU5OMnV2EdzIpsbIB_q6sKYdJDxnTUP)

It depends. If you're picky about the dependencies included in each package, want a cleaner system, or the default packages don't provide what you require, then use ports. If you want a turnkey experience and to just get things done, use packages.

Also, are dovecot and postfix still good options on FreeBSD or are there better alternatives I should be considering?

I'd stick with what you know. Jumping to another solution adds more administrative load on your end: learning new stuff, new configuration files/parameters, etc.

CentOS 7 is pre-systemd, right? If not, good luck.
 
No, it's systemd. However, I don't think that will cause too many issues; it's just a different way to start services. Postfix and Dovecot should work without problems. Packages should be fine unless you're doing something unusual. You can cd into a port's directory and type
Code:
make config
to see the possible options. The options marked by default are what you'll get with the package. If you need one of the unmarked options, you'll need the port.
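As a quick sketch, building e.g. dovecot from ports with non-default options looks like this:
Code:
cd /usr/ports/mail/dovecot
make config        # toggle the options you need
make install clean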
 
I have an Internet forum (Xenforo software running on nginx, php-fpm, MariaDB, ElasticSearch) and a mail server (dovecot, postfix, and roundcube for the front-end) running on CentOS7.
Hi, welcome! Personally, I've looked at its supposed successors, Alma and Rocky Linux. When I saw that these had dbus running as a requirement in the base installation (maybe because of firewalld?), that was enough to forget about them.

Anyway, for the migration: all the major moving parts run on FreeBSD as well without problems. For PHP you've got to make sure to enable the modules/extensions which Xenforo requires to run.
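As a rough sketch, assuming PHP 8.2 from packages (double-check the extension list against the requirements of your Xenforo version, this is from memory):
Code:
pkg install php82 php82-mysqli php82-gd php82-curl php82-mbstring \
    php82-ctype php82-dom php82-session php82-zip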

In terms of the mail architecture, well - Dovecot has been the standard IMAP server for years. It's got an excellent track record, it scales well, is well documented and has introduced many innovations of its own. Most other open-source IMAP/POP servers haven't shown much progress anyway since Dovecot took the crown of the king of IMAP servers.

Postfix - nothing wrong with that one either. I mean its author, Wietse Venema, is a well respected security researcher and it shows. The only reason to go for something else would be if you really need to send out millions of emails a day and Postfix is just too slow for that (go Haraka), or have some issues with its weird license (IBM Public License/Eclipse Public License) or have really weird, esoteric configuration needs which Postfix cannot be adapted to. So far I've not seen such a case yet though.
 
Thanks everyone for your thoughts and advice!

Thought I should post an update on this project, albeit a late one!

The forum and mail server have been running smoothly on FreeBSD for a long time now, currently on 14.2!

I have one small problem which is that Elasticsearch occasionally dies, so I need to log in and start it up again. I'm not quite sure how to diagnose the problem. Could it be the Out Of Memory (OOM) killer? Or perhaps something to do with Elasticsearch being a Java application running on the JVM? In my experience these are a category of applications which unfortunately tend to just die after some time due to memory leaks.

The problem is I don't see any issues in the few relevant log files I have found. (/tmp/elasticsearch-${NUMBER}/gc.log*)

As a workaround, is there an easy way to configure a service such as Elasticsearch to be automatically started up again if it's not running?
 
As a workaround, is there an easy way to configure a service such as Elasticsearch to be automatically started up again if it's not running?
A simple script fired up by a cron job should be enough for that, assuming Elastic is a regular service that's handled by the rc framework (rc.conf / /usr/local/etc/rc.d).

I could imagine something like this:

Code:
#!/bin/sh

# Service definition (replace with the service you want to watch)
proc=postgresql

## Main routine (don't change)

# service(8) status prints "... is not running" when the service is down
stat=$(service "$proc" status | grep not)

if [ -n "$stat" ]; then
        echo "Not running."
fi
So... replace echo with the command to (re)start Elastic. In my example this would be something like service postgresql start.

Then save the script somewhere, and use # crontab -e (as root) to set it up to run. Maybe once every 30 minutes or so?

Code:
*/30 * * * *     /root/bin/check_elastic
Keep in mind that this is kind of a bad idea because... you need to get to the cause of your problems, and just mindlessly restarting a b0rked service could cause problems of its own. Still, it helps, I guess.
 
I have one small problem which is that Elasticsearch occasionally dies, so I need to log in and start it up again. I'm not quite sure how to diagnose the problem. Could it be the Out Of Memory (OOM) killer? Or perhaps something to do with Elasticsearch being a Java application running on the JVM? In my experience these are a category of applications which unfortunately tend to just die after some time due to memory leaks.

The problem is I don't see any issues in the few relevant log files I have found. (/tmp/elasticsearch-${NUMBER}/gc.log*)

As a workaround, is there an easy way to configure a service such as Elasticsearch to be automatically started up again if it's not running?
Do you see any files with a name like hs_err_pid(number).log? The JVM should leave those behind if it crashes.

Elasticsearch doesn't log to /var/log anywhere?

For process supervision, I'd look at sysutils/daemontools.
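As a very rough sketch of how that looks: each supervised service gets a directory with an executable run script. The directory, user and elasticsearch binary path below are assumptions, so adjust them:
Code:
#!/bin/sh
# /var/service/elasticsearch/run (assumed location)
# redirect stderr and run the process in the foreground as the elasticsearch user
exec 2>&1
exec setuidgid elasticsearch /usr/local/lib/elasticsearch/bin/elasticsearch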
 
Do you see any files with a name like hs_err_pid(number).log?
Strangely, no. There are several sub-dirs /tmp/elasticsearch-${NUMBER}/, and I believe a new one of these is created each time Elasticsearch is started. But none of them contains a hs_err_pid${NUMBER}.log file even though the process that starts Elasticsearch specifies the file. E.g. ps auxww shows java ... -XX:ErrorFile=/tmp/elasticsearch-6998135034409217656/hs_err_pid%p.log ... . So, maybe that means it's terminated not by the JVM itself but something external, like the OOM killer?
Elasticsearch doesn't log to /var/log anywhere?
No, although there is a directory /var/log/elasticsearch/ which I think is created by /usr/local/etc/rc.d/elasticsearch. However, this is empty and I don't see any references to this directory in any of the Elasticsearch config files.
For process supervision, I'd look at sysutils/daemontools.
OK, thanks! I have also seen recommendations for sysutils/monit and sysutils/py-supervisor. Would you say daemontools is still what I should be using for this?
 
A simple script fired up by a cronjob should be enough for that; assuming Elastic is an official process that's handled by the RC structure (rc.conf / /usr/local/etc/rc.d).

I could imagine something like this:
Thank you, I had thought about something like that, but I was hoping there were already features for this kind of thing, preferably in the base system or otherwise in ports. Various init systems for Linux (including the infamous systemd) do have features to auto-restart crashed services!
Keep in mind that this is kind of a bad idea because... you need to get to the cause of your problems, and just mindlessly restarting a b0rked service could cause problems of its own. Still, it helps, I guess.
Agreed ...
 
Strangely, no. There are several sub-dirs /tmp/elasticsearch-${NUMBER}/, and I believe a new one of these is created each time Elasticsearch is started. But none of them contains a hs_err_pid${NUMBER}.log file even though the process that starts Elasticsearch specifies the file. E.g. ps auxww shows java ... -XX:ErrorFile=/tmp/elasticsearch-6998135034409217656/hs_err_pid%p.log ... . So, maybe that means it's terminated not by the JVM itself but something external, like the OOM killer?
Really hard to tell without logs. Can you post the full command line? Or at least the Java options that set the heap size.

OK, thanks! I have also seen recommendations for sysutils/monit and sysutils/py-supervisor. Would you say daemontools is still what I should be using for this?
I don't know much about those. I read briefly about sysutils/py-supervisor, and I think its focus is managing processes on many machines. Seems like overkill for what you want.
 
Really hard to tell without logs. Can you post the full command line? Or at least the Java options that set the heap size.
This is the command for the currently running process:
/usr/local/openjdk17/bin/java -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -Djava.security.manager=allow -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Dlog4j2.formatMsgNoLookups=true -Djava.locale.providers=SPI,COMPAT --add-opens=java.base/java.io=org.elasticsearch.preallocate -XX:+UseG1GC -Djava.io.tmpdir=/tmp/elasticsearch-6998135034409217656 -XX:+HeapDumpOnOutOfMemoryError -XX:+ExitOnOutOfMemoryError -XX:HeapDumpPath=data -XX:ErrorFile=/tmp/elasticsearch-6998135034409217656/hs_err_pid%p.log -Xlog:gc*,gc+age=trace,safepoint:file=/tmp/elasticsearch-6998135034409217656/gc.log:utctime,pid,tags:filecount=32,filesize=64m -Xms420m -Xmx420m -XX:MaxDirectMemorySize=220200960 -XX:G1HeapRegionSize=4m -XX:InitiatingHeapOccupancyPercent=30 -XX:G1ReservePercent=15 -Des.distribution.type=tar --module-path /usr/local/lib/elasticsearch/lib --add-modules=jdk.net --add-modules=ALL-MODULE-PATH -m org.elasticsearch.server/org.elasticsearch.bootstrap.Elasticsearch
So, the heap size is 420m. To be fair, maybe that is a bit small. But this is running on a 4G VPS alongside the other processes described in the first post of this thread. Still, I could probably give it more than it has now.
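If I do raise it, my understanding is that the way to do it is a jvm.options override; something like this, assuming the config lives under /usr/local/etc/elasticsearch:
Code:
# /usr/local/etc/elasticsearch/jvm.options.d/heap.options (assumed path)
-Xms512m
-Xmx512m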
 
I just figured out where ES was configured to write its logs: /var/run/elasticsearch/. Which sounds wrong, especially when there is a perfectly good /var/log/elasticsearch/ directory as well. I believe /var/run/ is for PID files and such, not log files?
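If I want to move it, I think the knob is path.logs in elasticsearch.yml; a sketch, assuming the config file lives at /usr/local/etc/elasticsearch/elasticsearch.yml:
Code:
# /usr/local/etc/elasticsearch/elasticsearch.yml (excerpt)
path.logs: /var/log/elasticsearch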

Anyway, I didn't see anything suspicious in those log files either, unfortunately.

I've now generously increased the heap size to 512m.
 
Back when I used to run ES to store firewall logs, it was by far the biggest VM on my little box. I think I gave it 4 GB of RAM or even more.
Java and especially ES is so memory-inefficient - and has been ever since I started using (and working with) it more or less 20 years ago.
 
Not sure if this is best practice, but when I write rc scripts for services I generally use the `daemon -r` option. The `-r` tells it to auto-restart if it quits.


Code:
       -r, --restart
           Supervise and restart the program after a one-second  delay  if
           it has been terminated
Unfortunately, it looks like the rc script for elasticsearch doesn't use daemon(8), as far as I can tell ...
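For comparison, a minimal rc.d script that supervises its command with daemon -r might look something like this; the service name and command path are placeholders, not the actual elasticsearch script:
Code:
#!/bin/sh
#
# PROVIDE: myservice
# REQUIRE: DAEMON
# KEYWORD: shutdown

. /etc/rc.subr

name="myservice"
rcvar="myservice_enable"
pidfile="/var/run/${name}.pid"
command="/usr/sbin/daemon"
# -r restarts the child if it exits; -P stores the pid of the daemon supervisor
command_args="-r -P ${pidfile} /usr/local/bin/myservice"

load_rc_config $name
: ${myservice_enable:="NO"}
run_rc_command "$1"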
 
The HeapDumpPath setting looks fishy. Have you noticed any subdirectories called "data" in any of your ES folders?

Also, the G1 Garbage collector (-XX:+UseG1GC) is designed for much larger heaps:
The first focus of G1 is to provide a solution for users running applications that require large heaps with limited GC latency. This means heap sizes of around 6GB or larger, and stable and predictable pause time below 0.5 seconds.

I'd experiment with the serial collector if it's a small server, and with the Shenandoah collector if the JVM you're using is recent enough.
 
The HeapDumpPath setting looks fishy. Have you noticed any subdirectories called "data" in any of your ES folders?
Yes, I thought the same. I have not seen any "data" sub-dirs anywhere. So, I have changed this to ${ES_TMPDIR}, which is where the GC logs go as well.
Also, the G1 Garbage collector (-XX:+UseG1GC) is designed for much larger heaps:
OK, thanks, I have changed it to use the Shenandoah collector:
Code:
-XX:-UseG1GC
-XX:+UseShenandoahGC
 
Elasticsearch had run until yesterday around 3:48am according to gc.log, when it seems to have crashed or stopped somehow.

Not really sure what bits of log would be helpful to look at for debugging purposes?

Start of gc.log:
[2025-05-18T13:50:20.248+0000][90209][gc] Min heap equals to max heap, disabling ShenandoahUncommit
[2025-05-18T13:50:20.264+0000][90209][gc] Heuristics ergonomically sets -XX:+ExplicitGCInvokesConcurrent
[2025-05-18T13:50:20.264+0000][90209][gc] Heuristics ergonomically sets -XX:+ShenandoahImplicitGCInvokesConcurrent
[2025-05-18T13:50:20.264+0000][90209][gc] Using Shenandoah
[2025-05-18T13:50:21.085+0000][90209][gc,ergo] Pacer for Idle. Initial: 10485K, Alloc Tax Rate: 1.0x
[2025-05-18T13:50:21.085+0000][90209][gc,init] Version: 17.0.15+6-1 (release)
[2025-05-18T13:50:21.085+0000][90209][gc,init] CPUs: 2 total, 2 available
[2025-05-18T13:50:21.086+0000][90209][gc,init] Memory: 4061M
[2025-05-18T13:50:21.086+0000][90209][gc,init] Large Page Support: Disabled
[2025-05-18T13:50:21.086+0000][90209][gc,init] NUMA Support: Disabled
[2025-05-18T13:50:21.086+0000][90209][gc,init] Compressed Oops: Enabled (Non-zero based)
[2025-05-18T13:50:21.086+0000][90209][gc,init] Heap Min Capacity: 512M
[2025-05-18T13:50:21.086+0000][90209][gc,init] Heap Initial Capacity: 512M
[2025-05-18T13:50:21.086+0000][90209][gc,init] Heap Max Capacity: 512M
[2025-05-18T13:50:21.086+0000][90209][gc,init] Pre-touch: Enabled
[2025-05-18T13:50:21.086+0000][90209][gc,init] Mode: Snapshot-At-The-Beginning (SATB)
[2025-05-18T13:50:21.086+0000][90209][gc,init] Heuristics: Adaptive
[2025-05-18T13:50:21.086+0000][90209][gc,init] Heap Region Count: 2048
[2025-05-18T13:50:21.086+0000][90209][gc,init] Heap Region Size: 256K
[2025-05-18T13:50:21.086+0000][90209][gc,init] TLAB Size Max: 256K
[2025-05-18T13:50:21.086+0000][90209][gc,init] Humongous Object Threshold: 256K
[2025-05-18T13:50:21.086+0000][90209][gc,init] Parallel Workers: 1
[2025-05-18T13:50:21.086+0000][90209][gc,init] Concurrent Workers: 1

End of gc.log:
[2025-05-22T03:47:25.189+0000][90209][gc ] Trigger: Time since last GC (300002 ms) is larger than guaranteed interval (300000 ms)
[2025-05-22T03:47:25.192+0000][90209][gc,ergo ] Free: 333M, Max: 256K regular, 86016K humongous, Frag: 72% external, 1% internal; Reserve: 26368K, Max: 256K
[2025-05-22T03:47:25.192+0000][90209][gc,start ] GC(1113) Concurrent reset
[2025-05-22T03:47:25.192+0000][90209][gc,task ] GC(1113) Using 1 of 1 workers for concurrent reset
[2025-05-22T03:47:25.192+0000][90209][gc,ergo ] GC(1113) Pacer for Reset. Non-Taxable: 512M
[2025-05-22T03:47:25.196+0000][90209][gc ] GC(1113) Concurrent reset 4.145ms
[2025-05-22T03:47:25.197+0000][90209][gc,start ] GC(1113) Pause Init Mark (unload classes)
[2025-05-22T03:47:25.197+0000][90209][gc,task ] GC(1113) Using 1 of 1 workers for init marking
[2025-05-22T03:47:25.197+0000][90209][gc,ergo ] GC(1113) Pacer for Mark. Expected Live: 87464K, Free: 333M, Non-Taxable: 34110K, Alloc Tax Rate: 0.3x
[2025-05-22T03:47:25.197+0000][90209][gc ] GC(1113) Pause Init Mark (unload classes) 0.179ms
[2025-05-22T03:47:25.197+0000][90209][safepoint ] Safepoint "ShenandoahInitMark", Time since last: 300009312676 ns, Reaching safepoint: 103422 ns, Cleanup: 34248 ns, At safepoint: 230252 ns, Total: 367922 ns
[2025-05-22T03:47:25.197+0000][90209][gc,start ] GC(1113) Concurrent marking roots
[2025-05-22T03:47:25.197+0000][90209][gc,task ] GC(1113) Using 1 of 1 workers for concurrent marking roots
[2025-05-22T03:47:25.202+0000][90209][gc ] GC(1113) Concurrent marking roots 5.668ms
[2025-05-22T03:47:25.203+0000][90209][gc,start ] GC(1113) Concurrent marking (unload classes)
[2025-05-22T03:47:25.203+0000][90209][gc,task ] GC(1113) Using 1 of 1 workers for concurrent marking
[2025-05-22T03:47:25.422+0000][90209][gc ] GC(1113) Concurrent marking (unload classes) 219.748ms
[2025-05-22T03:47:25.422+0000][90209][gc,start ] GC(1113) Pause Final Mark (unload classes)
[2025-05-22T03:47:25.422+0000][90209][gc,task ] GC(1113) Using 1 of 1 workers for final marking
[2025-05-22T03:47:25.423+0000][90209][gc,ergo ] GC(1113) Adaptive CSet Selection. Target Free: 74274K, Actual Free: 372M, Max CSet: 21845K, Min Garbage: 0B
[2025-05-22T03:47:25.423+0000][90209][gc,ergo ] GC(1113) Collectable Garbage: 56823K (82%), Immediate: 51200K (74%), CSet: 5623K (8%)
[2025-05-22T03:47:25.423+0000][90209][gc,ergo ] GC(1113) Pacer for Evacuation. Used CSet: 5632K, Free: 383M, Non-Taxable: 39230K, Alloc Tax Rate: 1.1x
[2025-05-22T03:47:25.423+0000][90209][gc ] GC(1113) Pause Final Mark (unload classes) 0.315ms
[2025-05-22T03:47:25.423+0000][90209][safepoint ] Safepoint "ShenandoahFinalMarkStartEvac", Time since last: 225617513 ns, Reaching safepoint: 6612 ns, Cleanup: 18314 ns, At safepoint: 337583 ns, Total: 362509 ns
[2025-05-22T03:47:25.423+0000][90209][gc,start ] GC(1113) Concurrent thread roots
[2025-05-22T03:47:25.423+0000][90209][gc,task ] GC(1113) Using 1 of 1 workers for Concurrent thread roots
[2025-05-22T03:47:25.425+0000][90209][gc ] GC(1113) Concurrent thread roots 1.919ms
[2025-05-22T03:47:25.425+0000][90209][gc,start ] GC(1113) Concurrent weak references
[2025-05-22T03:47:25.425+0000][90209][gc,task ] GC(1113) Using 1 of 1 workers for concurrent weak references
[2025-05-22T03:47:25.425+0000][90209][gc,ref ] GC(1113) Encountered references: Soft: 954, Weak: 14753, Final: 11, Phantom: 1370
[2025-05-22T03:47:25.425+0000][90209][gc,ref ] GC(1113) Discovered references: Soft: 562, Weak: 530, Final: 0, Phantom: 588
[2025-05-22T03:47:25.425+0000][90209][gc,ref ] GC(1113) Enqueued references: Soft: 0, Weak: 2, Final: 0, Phantom: 0
[2025-05-22T03:47:25.425+0000][90209][gc ] GC(1113) Concurrent weak references 0.370ms
[2025-05-22T03:47:25.425+0000][90209][gc,start ] GC(1113) Concurrent weak roots
[2025-05-22T03:47:25.425+0000][90209][gc,task ] GC(1113) Using 1 of 1 workers for concurrent weak root
[2025-05-22T03:47:25.487+0000][90209][gc ] GC(1113) Concurrent weak roots 62.269ms
[2025-05-22T03:47:25.487+0000][90209][gc,start ] GC(1113) Concurrent cleanup
[2025-05-22T03:47:25.487+0000][90209][gc ] GC(1113) Concurrent cleanup 152M->102M(512M) 0.111ms
[2025-05-22T03:47:25.488+0000][90209][gc,ergo ] GC(1113) Free: 383M, Max: 256K regular, 86016K humongous, Frag: 76% external, 1% internal; Reserve: 26365K, Max: 256K
[2025-05-22T03:47:25.488+0000][90209][gc,start ] GC(1113) Concurrent class unloading
[2025-05-22T03:47:25.488+0000][90209][gc,task ] GC(1113) Using 1 of 1 workers for concurrent class unloading
[2025-05-22T03:47:25.530+0000][90209][gc ] GC(1113) Concurrent class unloading 42.363ms
[2025-05-22T03:47:25.530+0000][90209][gc,start ] GC(1113) Concurrent strong roots
[2025-05-22T03:47:25.530+0000][90209][gc,task ] GC(1113) Using 1 of 1 workers for concurrent strong root
[2025-05-22T03:47:25.531+0000][90209][gc ] GC(1113) Concurrent strong roots 0.572ms
[2025-05-22T03:47:25.531+0000][90209][gc,start ] GC(1113) Concurrent evacuation
[2025-05-22T03:47:25.531+0000][90209][gc,task ] GC(1113) Using 1 of 1 workers for concurrent evacuation
[2025-05-22T03:47:25.531+0000][90209][gc ] GC(1113) Concurrent evacuation 0.239ms
[2025-05-22T03:47:25.531+0000][90209][gc,start ] GC(1113) Pause Init Update Refs
[2025-05-22T03:47:25.531+0000][90209][gc,ergo ] GC(1113) Pacer for Update Refs. Used: 102M, Free: 383M, Non-Taxable: 39229K, Alloc Tax Rate: 1.1x
[2025-05-22T03:47:25.531+0000][90209][gc ] GC(1113) Pause Init Update Refs 0.113ms
[2025-05-22T03:47:25.531+0000][90209][safepoint ] Safepoint "ShenandoahInitUpdateRefs", Time since last: 108115957 ns, Reaching safepoint: 11185 ns, Cleanup: 5288 ns, At safepoint: 145487 ns, Total: 161960 ns
[2025-05-22T03:47:25.531+0000][90209][gc,start ] GC(1113) Concurrent update references
[2025-05-22T03:47:25.531+0000][90209][gc,task ] GC(1113) Using 1 of 1 workers for concurrent reference update
[2025-05-22T03:47:25.592+0000][90209][gc ] GC(1113) Concurrent update references 60.728ms
[2025-05-22T03:47:25.592+0000][90209][gc,start ] GC(1113) Concurrent update thread roots
[2025-05-22T03:47:25.594+0000][90209][gc ] GC(1113) Concurrent update thread roots 1.970ms
[2025-05-22T03:47:25.594+0000][90209][gc,start ] GC(1113) Pause Final Update Refs
[2025-05-22T03:47:25.594+0000][90209][gc,task ] GC(1113) Using 1 of 1 workers for final reference update
[2025-05-22T03:47:25.594+0000][90209][gc ] GC(1113) Pause Final Update Refs 0.194ms
[2025-05-22T03:47:25.594+0000][90209][safepoint ] Safepoint "ShenandoahFinalUpdateRefs", Time since last: 62994859 ns, Reaching safepoint: 11346 ns, Cleanup: 4335 ns, At safepoint: 218000 ns, Total: 233681 ns
[2025-05-22T03:47:25.594+0000][90209][gc,start ] GC(1113) Concurrent cleanup
[2025-05-22T03:47:25.594+0000][90209][gc ] GC(1113) Concurrent cleanup 102M->96M(512M) 0.057ms
[2025-05-22T03:47:25.594+0000][90209][gc,ergo ] Free: 388M, Max: 256K regular, 86016K humongous, Frag: 77% external, 1% internal; Reserve: 26368K, Max: 256K
[2025-05-22T03:47:25.594+0000][90209][gc,stats ]
[2025-05-22T03:47:25.594+0000][90209][gc,stats ] All times are wall-clock times, except per-root-class counters, that are sum over
[2025-05-22T03:47:25.594+0000][90209][gc,stats ] all workers. Dividing the <total> over the root stage time estimates parallelism.
[2025-05-22T03:47:25.594+0000][90209][gc,stats ]
[2025-05-22T03:47:25.594+0000][90209][gc,stats ] Concurrent Reset 4195 us
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] Pause Init Mark (G) 514 us
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] Pause Init Mark (N) 192 us
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] Update Region States 63 us
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] Concurrent Mark Roots 5721 us, parallelism: 0.86x
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] CMR: <total> 4925 us
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] CMR: Thread Roots 4423 us, workers (us): 4423,
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] CMR: VM Strong Roots 36 us, workers (us): 36,
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] CMR: CLDG Roots 465 us, workers (us): 465,
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] Concurrent Marking 219795 us
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] Pause Final Mark (G) 417 us
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] Pause Final Mark (N) 323 us
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] Finish Mark 52 us
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] Update Region States 67 us
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] Choose Collection Set 116 us
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] Rebuild Free Set 15 us
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] Concurrent Thread Roots 1931 us, parallelism: 0.96x
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] CTR: <total> 1861 us
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] CTR: Thread Roots 1861 us, workers (us): 1861,
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] Concurrent Weak References 381 us, parallelism: 0.71x
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] CWRF: <total> 269 us
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] CWRF: Weak References 269 us, workers (us): 269,
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] Concurrent Weak Roots 62306 us
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] Roots 62133 us, parallelism: 1.00x
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] CWR: <total> 62082 us
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] CWR: Code Cache Roots 60404 us, workers (us): 60404,
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] CWR: VM Weak Roots 1589 us, workers (us): 1589,
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] CWR: CLDG Roots 89 us, workers (us): 89,
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] Rendezvous 116 us
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] Concurrent Cleanup 125 us
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] Concurrent Class Unloading 42430 us
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] Unlink Stale 40589 us
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] System Dictionary 106 us
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] Weak Class Links 3 us
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] Code Roots 40475 us
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] Rendezvous 148 us
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] Purge Unlinked 1598 us
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] Code Roots 1558 us
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] CLDG 25 us
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] Exception Caches 0 us
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] Concurrent Strong Roots 598 us, parallelism: 0.57x
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] CSR: <total> 341 us
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] CSR: VM Strong Roots 9 us, workers (us): 9,
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] CSR: CLDG Roots 332 us, workers (us): 332,
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] Concurrent Evacuation 260 us
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] Pause Init Update Refs (G) 234 us
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] Pause Init Update Refs (N) 121 us
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] Manage GCLABs 71 us
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] Concurrent Update Refs 60928 us
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] Concurrent Update Thread Roots 1987 us
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] Pause Final Update Refs (G) 310 us
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] Pause Final Update Refs (N) 209 us
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] Update Region States 115 us
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] Trash Collection Set 4 us
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] Rebuild Free Set 20 us
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] Concurrent Cleanup 68 us
[2025-05-22T03:47:25.595+0000][90209][gc,stats ]
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] Allocation pacing accrued:
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] 0 of 300407 ms ( 0.0%): <total>
[2025-05-22T03:47:25.595+0000][90209][gc,stats ] 0 of 300407 ms ( 0.0%): <average total>
[2025-05-22T03:47:25.595+0000][90209][gc,stats ]
[2025-05-22T03:47:25.595+0000][90209][gc,metaspace] Metaspace: 128892K(131008K)->128892K(131008K) NonClass: 111531K(112576K)->111531K(112576K) Class: 17360K(18432K)->17360K(18432K)
[2025-05-22T03:47:25.595+0000][90209][gc,ergo ] Pacer for Idle. Initial: 10485K, Alloc Tax Rate: 1.0x
[2025-05-22T03:48:10.598+0000][90209][safepoint ] Safepoint "Cleanup", Time since last: 45003672777 ns, Reaching safepoint: 105762 ns, Cleanup: 87273 ns, At safepoint: 15105 ns, Total: 208140 ns
 
Elasticsearch had run until yesterday around 3:48am according to gc.log, when it seems to have crashed or stopped somehow.

Not really sure what bits of log would be helpful to look at for debugging purposes?

Maybe the OOM killer got it? Anything in the logs? Dmesg?

I've never had the OOM killer attack, so I don't know exactly what to look for in the logs. Maybe something like this?
Code:
Sep 29 15:07:21 remote kernel: swap_pager_getswapspace(20): failed
Sep 29 15:07:21 remote kernel: swap_pager_getswapspace(18): failed
Sep 29 15:07:21 remote kernel: swap_pager_getswapspace(32): failed
Sep 29 15:07:21 remote syslogd: last message repeated 1 times
Sep 29 15:07:26 remote kernel: pid 5913 (mysqld), uid 88, was killed: out of swap space
Sep 29 15:07:27 remote kernel: pid 888 (ntpd), uid 123, was killed: out of swap space
Sep 29 15:07:27 remote kernel: swap_pager_getswapspace(32): failed
 