Why is PostgreSQL on FreeBSD 11.2 so much slower than on CentOS 7?

In the last few weeks I've been toying with the idea of porting our main production database servers from Linux to FreeBSD. Our current generation runs RHEL6/PG9, and the new generation is going to be either CentOS 7 with PostgreSQL 10.5 or FreeBSD with PostgreSQL 10.5. In order to run some tests, I've installed two virtual machines from scratch, both running on the exact same VMware host and storage in a lab and thus unaffected by other VMs or day-to-day load. Both VMs have four cores each (Intel Xeon E5-2690 v2) and 16GB of RAM, and are stored on an unmirrored NetApp storage volume to eliminate extra I/O latencies. Accordingly, I haven't configured a ZFS RAID but three separate UFS disks: 40GB for the OS, 100GB for the database, and 100GB for the WAL files. On the CentOS 7 server, all three disks are XFS. The two operating systems in question are the latest CentOS 7.5.1804 and FreeBSD 11.2-RELEASE r335510.

After initializing the database I took a dump created with pg_dumpall and restored it. Both databases work as expected and I can access both from the local machine as well as from our development application server running on the same VMware host. And here's where it gets interesting: PG10 on FreeBSD is so dead slow that it's painful. Not when running pgbench, which returns almost identical values for all three systems most of the time (I also have a RHEL6/PG9.2 clone running for comparison), but when running large queries on our database. For example, a simple "select count(*) from table" on a table with several million records takes 12s on RHEL6/PG9 and less than 10s on CentOS 7/PG10, but anywhere between 30s and 90s, with the runtime hovering around the 60s mark most of the time, on FreeBSD 11.2/PG10. Occasionally even pgbench fails to run properly and returns fewer than 150 tps instead of the expected 1,300+ tps. Running htop in a second window I can see that PG10 forks its four worker processes as expected and as configured in postgresql.conf, yet they fail to cause any significant amount of CPU load. Running the above select on CentOS 7/PG10 causes a load of 30-40% on all four CPU cores, whereas running the same select on FreeBSD 11.2/PG10 leaves the CPUs practically idling at 5-10%.

I've googled around and found some very very old tutorials on performance-tuning FreeBSD for PostgreSQL. The PostgreSQL settings between the three systems are absolutely identical, and the database contents are 100% identical (restored all three from the exact same dumpfile). The underlying hardware is absolutely identical. And still I can't wrap my head around this - something must clearly be going wrong here. Being new to FreeBSD I am a bit flabbergasted and lost, any hints where I could start investigating this issue?
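For reference, this is how the slow query can be inspected on both systems (the table name is a placeholder; ANALYZE and BUFFERS are standard EXPLAIN options in PostgreSQL 10 and show whether the parallel workers actually launched and how many pages came from cache versus disk):

```sql
-- "big_table" is a placeholder for the several-million-row table.
-- Compare "Workers Launched", buffer hit/read counts, and actual
-- times between the FreeBSD and CentOS VMs.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM big_table;
```

If the plans are identical but the FreeBSD run spends its time reading shared buffers from disk, that points at the I/O path rather than the planner.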
 
Did you install both environments "as is"? Just asking because you don't provide many details about the most important aspect of all this: the configuration itself.

On FreeBSD the whole thing is provided "as is", so no handholding; just the product as it was distributed by the original authors.

Linux, on the other hand, is known for shipping pre-configured, which could definitely explain a thing or two here.
 
You may need to do some performance tuning, googling for "freebsd performance tuning" should give you some starting points.
 
Yes, both systems were installed "as-is" from their official release ISOs. I have not tuned either system at all.

However, like I said above, I have also tried to Google performance tuning of FreeBSD for PostgreSQL but was only able to find old articles from the early 2000s. I'll probably have to broaden my search and Google for general FreeBSD performance tuning instead.
 
I've installed two virtual machines
Never use virtual machines for performance testing.

A few years back, we signed up a client who was using PostgreSQL and CentOS. Since we are a FreeBSD web-dev company, we immediately put him on our servers along with PostgreSQL, which we also use. A number of years before that, when we first started up, we did performance testing with MySQL, PostgreSQL, and Linux servers. I can't give specifics because it's been so long, but PostgreSQL on FreeBSD did as well as or outperformed the CentOS installation. I just don't recall anything beyond that because, once a decision is made, I tend to forget the why until someone says something has changed.

My point in all this is not to use virtual machines for performance testing.
 
Do you have update_process_title = off ?
I do not, since from what I understand it only affects Windows servers running PostgreSQL. Just to double-check, I turned it off and ran the selects again; the difference was negligible: 26s instead of 28s running the same query with 4 threads in parallel.

There are many differences between the two operating systems.

For persistence you have the filesystem, as one example. Are they different? What scheduler is each using?

For virtualization, you have the VMM as an example. What is the paging algorithm? What read-ahead and write-ahead values? What CPU scheduler is used for both and what are their best uses?

Looking at concurrency, are you getting spin locks, mutual exclusion locks, or anything else?

What about the network and buffers? Caching?

Many aspects and fine details are to be considered and nobody here can legitimately say “this is it.”
Of course I am fully aware that they are different. The question I'm asking myself is why the differences are so massive. Both are stock as-is installations without any optimizations. Both run on the exact same hypervisor on the exact same physical server and 8 Gbit FC NetApp storage. Both run the same version of PostgreSQL (10.4). Yet one of them is up to 6x slower than the other when performing real-world selects, while oddly enough being just as fast when running pgbench (the difference between FreeBSD and CentOS 7 is less than 1%, sometimes one way, sometimes the other; in other words: within the margin of error). In my personal opinion that's too much to be attributed to "performance tuning", which usually works its miracles in the single- to double-digit percentage range.

The PostgreSQL settings (memory, cache, etc.) are optimized for a 2-core 16GB server, which is what our current virtual production database server is equipped with:

Code:
max_connections = 800
shared_buffers = 4096MB
work_mem = 128MB
wal_level = hot_standby
max_wal_senders = 3
wal_keep_segments = 128
effective_cache_size = 12288
max_parallel_workers_per_gather = 4
max_worker_processes = 16

Are you saying FreeBSD's file system is UFS and the CentOS is XFS?
Correct. FreeBSD has three UFS-formatted virtual hard disks; CentOS has three XFS-formatted ones. In both cases the VM is configured with a virtual LSI Logic Parallel SCSI controller, a VMXNET3 NIC, 4 CPUs, and 16GB of RAM. The only difference is that CentOS 7 is configured as VM version 8 (ESXi 5.0 and later) while FreeBSD is configured as VM version 10 (ESXi 5.5 and later).

Never use virtual machines for performance testing.
I understand your point, however what good is a bare-metal performance comparison if the operating systems then behave completely differently when virtualized? The system I am using for these tests sits in my lab and is used exclusively for this specific task. It's a dual-socket Intel Xeon server with two 10-core 3 GHz E5-2690 v2 CPUs and a total of 384GB of RAM. In my experience it behaves exactly as expected and is very consistent when it comes to performance testing, as there is no interference from the outside world. Plus, I am running these tests sequentially and not in parallel on all three machines at the same time ;)
 
Okay, so you have two different file systems, two different VMware VM hardware versions, and probably two different virtual NICs. I would disqualify this test and report, as you really are not scientifically making a true comparison of the two operating systems and data on identical platforms.

Just for kicks: my Percona 5.6 SQL server on FreeBSD 11.2, which is always busy and averages about 60% idle, with 6x2TB raidz3 can query about 20 million records in about 24 seconds. Just a 16GB Intel E3-1220.

You need to tweak things!
 
Have you tested io throughput on the machines? If FreeBSD isn't feeding Postgres with disk data fast enough, that might explain why the CPU usage is so low compared to Centos.
 
effective_cache_size = 12288 means 96 MB, if I recall the default unit correctly.
Whoops, my copy & paste went wrong there. That's supposed to read 12288MB of course (aka 12GB aka 3/4 of the available RAM of 16GB).

Okay, so you have two different file systems, two different VMware VM hardware versions, and probably two different virtual NICs. I would disqualify this test and report, as you really are not scientifically making a true comparison of the two operating systems and data on identical platforms.

Just for kicks: my Percona 5.6 SQL server on FreeBSD 11.2, which is always busy and averages about 60% idle, with 6x2TB raidz3 can query about 20 million records in about 24 seconds. Just a 16GB Intel E3-1220.

You need to tweak things!
The NICs are 10Gbit VMXNET3 on all three systems, and the host is connected to a Nexus switch via 10Gbit multi-mode fibre. Just for fun I created a new CentOS 7 VM with VM hardware version 10 and a FreeBSD VM with version 8, and the results were 100% identical. Plus, I am not even remotely attempting a scientifically sound comparison between CentOS 7/RHEL7 and FreeBSD; I am merely trying to understand why FreeBSD is performing so much worse than CentOS 7. All these differences should only amount to a fraction of what I'm seeing. I have no scientific aspirations - I'm trying to figure out whether or not to port our main production database from PG10 running on Linux to PG10 running on FreeBSD.

I'll probably have to look into general FreeBSD performance tuning I guess. Coming from the Linux world I'm not used to having to tweak everything from scratch, most Linux distros are somewhat properly configured out of the box and don't require too much manual tweaking.

Have you tested io throughput on the machines? If FreeBSD isn't feeding Postgres with disk data fast enough, that might explain why the CPU usage is so low compared to Centos.
No, I haven't yet. I assumed that since all three systems reside on the exact same NetApp fibre-channel storage, I/O throughput would be roughly identical for all three, but you're right, I'll have to look into that.
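A rough first check could be a plain dd pass run identically on both VMs (the file path is a placeholder; point it at the database disk's mount point, and use a file larger than RAM in a real test so the OS cache can't absorb it):

```shell
#!/bin/sh
# Rough sequential I/O sanity check; run the same commands on both VMs.
# TESTFILE is a placeholder path; put it on the database disk.
TESTFILE=/tmp/pg_io_test.bin

# Sequential write: 32 MiB in 8 kB blocks (dd prints MB/s on its
# summary line; compare that number between FreeBSD and CentOS).
dd if=/dev/zero of="$TESTFILE" bs=8k count=4096

# Sequential read back in 8 kB blocks, PostgreSQL's page size.
dd if="$TESTFILE" of=/dev/null bs=8k

rm -f "$TESTFILE"
```

If FreeBSD reports markedly lower throughput here too, the problem is in the I/O path rather than in PostgreSQL itself.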
 
No, it's like comparing FreeBSD with UFS to Linux with XFS, both of which are the default go-to filesystems for their respective operating systems. I am aware that if I were running Ubuntu/Debian instead of RHEL/CentOS I'd be looking at an ext4 file system, but again: if that's what I'm encountering in the real world then that's what I'm comparing against.

Of course I'm not going to compare Linux UFS vs. BSD UFS or Linux XFS vs. BSD XFS because neither of those is realistic in real-world production environments. I am not interested in lab results for the sake of scientifically sound comparability. I am interested in real-world results for a 50GB large hot-standby PostgreSQL production database with 24/7 high-availability requirements. If I want to figure out whether I prefer apples or oranges I'll have to start comparing apples and oranges instead of apples and apples or oranges and slightly bigger oranges ;)

So yes, I am fully aware that the systems differ in various respects. Which is a given - after all, I am comparing Linux to FreeBSD, both running their respective advised default configurations. However, given that the underlying hardware is identical, these differences should NOT amount to a 6x performance delta.
 
what good is a bare metal performance comparison if the operating systems then behave completely different when virtualized?
Because VMs are a software attempt to emulate hardware, based on the "opinions" of those who write that software, in a kernel environment only. There is no perfect emulation. Bare metal is the real thing.
 
Well, I'm going to go out on a limb and say your bottleneck with FreeBSD is using UFS. Go back and use a more robust file system, such as ZFS, maybe even create 6+ disks on your external storage for it, then set up root on ZFS and watch it rock. Even if the hardware is the same for the VMs, the file systems you chose are not. You are putting low-octane fuel in FreeBSD's engine - what do you expect? To do a true comparison of performance capabilities, you do need to tweak FreeBSD, because it's more versatile. Keep with Linux if you want the out-of-the-box, no-sysadmin-needed experience :) Don't Google, read the man page: zfs(8)!
 
The first place I would look is whether anything is being bottlenecked. Check and make sure CPU, disk I/O, or something else isn't being pegged. Try testing it on ZFS and see if anything changes.

FreeBSD out of the box does have conservative defaults, so you're probably going to have to tweak some things. You may want to read this reddit thread which had some discussion about what to adjust. Database servers aren't my specialty, so make changes carefully.


That said, if this is a production database that other people are going to use and manage, I'd reconsider moving it to FreeBSD. Not saying you shouldn't use it, but if the server goes down at 3AM and no one else is familiar with FreeBSD, then it's all going to be on your head.
 
Did you install the VMware tools for FreeBSD (emulators/open-vm-tools-nox11)? CentOS 7 does so by default; FreeBSD does not.
I did, yes. CentOS does not do so by default either; I had to install the VMware tools manually there as well, but they are installed and loaded into memory on both machines.

FreeBSD out of the box does have conservative defaults, so you're probably going to have to tweak some things. You may want to read this reddit thread which had some discussion about what to adjust. Database servers aren't my specialty, so make changes carefully.

Thanks, I've read that thread and it seems like the author of the initial benchmark comparison chimed in, optimized FreeBSD accordingly and got exactly nowhere:

So I ran the benchmark for read-only workload@100 concurrent clients on FreeBSD with updated sysctl variables and I got 88423.67 TPS at 1.131 ms latency. This is almost identical to stock FreeBSD. Do you have any other suggestions?

The only thing I can still try is a ZFS partition for the database and the WAL files instead of a UFS partition. Otherwise I'm pretty much out of ideas, to be honest, given that tuning the system does not seem to have any noticeable effect on PostgreSQL performance.

I should probably elaborate on what I'm doing, as this may not make much sense to some people. Yes, it's true that the database is tiny. However, this is not a small one-person business we're talking about here but a medium-sized multinational corporation with several sites in several European countries, all interconnected with each other, each with one of those databases running on-site, plus a slightly larger central database running in an off-site location. These databases are mostly independent of each other but do share some tables. Fortunately, the sync mechanism is a piece of software we wrote in-house back in the PG8 days, meaning that we are completely independent of the underlying database - it would work with anything regardless of operating system or SQL database. There are between 400 and 800 concurrent users on each of these databases, and we have no downtime allotments whatsoever. Our factories run 24/7, 365 days a year.

The reason why I'm looking into FreeBSD is that we ran HP-UX and Ingres pretty much from the late 70s until around 2004 and only switched to Linux and PostgreSQL (SuSE 9.3 and PostgreSQL 8.x) after HP failed to entice us with their offerings at the time and their price premium simply wasn't worth it anymore. Every time we plan a new generation of servers we look at and evaluate our options, and given our UNIX background BSD has always been and always will be an option. There is plenty of UNIX/Linux knowledge in our IT department, but at the end of the day it's going to be on my head anyway, no matter what.

There is no specific reason why we need to switch from CentOS/RHEL to FreeBSD. There might, however, be legitimate reasons why such a step could make sense to us in the long run. Obviously I have already made sure that our disaster recovery strategies, backup, failover, etc. will be able to cope with this switch. They will require readjustment either way, but we either use industry standards (VMware & Veeam) or our own self-developed software (PG backup software ported from Ingres backup software, from a time when backups were still stored on massive tapes that had to be rotated daily). We also have a migration strategy in place, since it is going to be a massive undertaking anyway - the old servers are not going to be updated but replaced by new ones, and we will be able to take it one step and one site at a time.

The task at hand is now to figure out whether, over the course of the next 5-10 or even more years, it might make sense to invest time, money, and energy into running BSD on certain servers, or whether it'd be better to stick with RHEL/CentOS for the time being and reevaluate the situation anew a few years down the road. Personally, I have always admired FreeBSD for its robust release model and long-term support, although I must admit that Red Hat and SuSE have worked miracles in the Linux realm: SLES and RHEL are the two enterprise distros that do not follow every single fad or flavor of the day and refrain from changing their entire underpinnings and frameworks every time some random Linux developer throws a temper tantrum because of XYZ. At this point I am still in the very early evaluation process and first need to make sure the switch would be doable in the first place. I'm basically still in the proof-of-concept phase and have not yet gotten past it due to FreeBSD's poor initial performance. And that's exactly what I'm trying to figure out and fix right now :)

Long story short: I have not yet given up on FreeBSD and am going to run some more tests with ZFS. From my understanding it would most likely not make a difference whether I ran the OS itself off a UFS drive and only used ZFS for the database and WAL drives, correct? Would it make a difference if I created a ZFS pool over 4x 25GB disks instead of a single 100GB disk? The only low-level real-world advantage I can think of right now is that with 4x 25GB I would be able to distribute the four disks across both NetApp storages, thereby increasing redundancy and doubling theoretical I/O throughput on the hardware side (in theory I would go from 1x to 2x 8Gbps). How does the OS handle these situations - does it make a difference software-wise?
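For what it's worth, ZFS stripes data across top-level vdevs, so a pool built from four single-disk vdevs spreads reads and writes over all four LUNs on the software side too. A minimal sketch, assuming the four disks show up as da1 through da4 (hypothetical device names; adjust to what the VM actually presents):

```shell
# Four single-disk vdevs => ZFS stripes across all four LUNs.
zpool create pgdata da1 da2 da3 da4

# An 8 kB recordsize matching PostgreSQL's page size is commonly
# recommended for the data directory; atime is useless overhead here.
zfs create -o recordsize=8k -o atime=off pgdata/db
zfs create pgdata/wal
```

Note there is no redundancy in such a pool; redundancy would come from the NetApp layer underneath, as described above.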
 
Just a hunch, since I know a bit about caching in FreeBSD: could you try to dial down the DB max cache to maybe 1/4 of the memory and try again?
 
Replying to an ancient thread...

I think I *might* know what's going on here. You said max_parallel_workers_per_gather = 4 (though the problem occurs with the default of 2 as well). The way PostgreSQL parallelises a scan of a big table is by handing out 8kB blocks to worker processes on demand using an atomic block counter in shared memory, so PID 1 might happen to get blocks 0, 4, 8, 12, ... while PID 2 might get blocks 1, 5, ... and so on (it's non-deterministic). On Linux, there's a readahead window that defaults to 128kB, so as long as you have fewer than 8 processes doing this type of access pattern, the kernel considers each process to be reading sequentially and activates its readahead heuristics, issuing large reads to prefetch data into the page cache so you can get to decent I/O speeds. On FreeBSD with UFS, I'm wondering if the kernel might be classifying this as random I/O from each worker process and not doing any readahead. One way to investigate whether this is the problem would be to set max_parallel_workers_per_gather = 0 and then see if the performance improves, or at least I/O waits decrease, due to the readahead heuristics.
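The suggested experiment as a psql session (the table name is a placeholder):

```sql
-- Disable parallel scans for this session only, then re-time the query.
-- If a single sequential reader beats four interleaved workers by a
-- wide margin, the readahead hypothesis above gains weight.
SET max_parallel_workers_per_gather = 0;
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM big_table;
```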

This speculation could be wrong, but clearly there is something bad happening here that we should look into. I've added this to a list of FreeBSD/PostgreSQL problems to investigate when round tuits become available. There was already an item on the list to investigate mixed reading and writing at different positions through the same fd, which confuses several OSs' readahead heuristics.


I have no idea for the occasional bad pgbench numbers.
 
Easy. What you don't use cannot interfere with the results. So do all tests on bare metal, with no virtualisation.
 