In the last few weeks I've been toying around with the idea of porting our main production database servers from Linux to FreeBSD. Our current generation is running RHEL6/PG9, and the new generation is going to be either CentOS 7 and PostgreSQL 10.5 or FreeBSD and PostgreSQL 10.5. In order to run some tests, I've installed two virtual machines from scratch, both running on the exact same VMware host and storage in a lab and thus unaffected by other VMs or day-to-day load. Both VMs have four cores each (Intel Xeon E5-2690 v2), 16GB of RAM, and are saved on an unmirrored NetApp storage volume to eliminate extra I/O latencies. Thus, I haven't configured a ZFS raid but configured three separate UFS disks: 40GB for the OS, 100GB for the database, 100GB for the WAL-Logg files). On the CentOS 7 server, all three disks are XFS. The two operating systems in question are the latest CentOS 7.5.1804 and FreeBSD 11.2-RELEASE r335510.
After initializing the database I took a dump created with pg_dumpall and restored it. Both databases are working as expected and I can access both from the local machine as well as our development application server running on the same VMware host. And here's where it gets interesting: PG10 on FreeBSD is so dead slow that it's painful. Not when running pgbench, which returns almost identical values for all three systems most of the time (I also have a RHEL6/9.2 clone running for comparison), but when running large queries on our database. For example, a simple "select count(*) from table" on a table with several million records takes 12s on RHEL6/PG9, less than 10s on RHEL7/PG10 but anywhere between 30s and 90s, with runtime hovering around the 60s mark most of the time, on FreeBSD11.2/PG10. Occasionally, even pgbench fails to run properly and returns less than 150 tps instead of the expected 1,300+ tps. Running htop in a second window I can see PG10 forks its four worker processes as expected and configured in postgresql.conf, however it fails to cause any significant amount of CPU load. Running above select on CentOS 7/PG10 causes a load of 30-40% on all four CPU cores whereas running the same select on FreeBSD11.2/PG10 causes the CPUs to practically idle around 5-10%.
I've googled around and found some very very old tutorials on performance-tuning FreeBSD for PostgreSQL. The PostgreSQL settings between the three systems are absolutely identical, and the database contents are 100% identical (restored all three from the exact same dumpfile). The underlying hardware is absolutely identical. And still I can't wrap my head around this - something must clearly be going wrong here. Being new to FreeBSD I am a bit flabbergasted and lost, any hints where I could start investigating this issue?
After initializing the database I took a dump created with pg_dumpall and restored it. Both databases are working as expected and I can access both from the local machine as well as our development application server running on the same VMware host. And here's where it gets interesting: PG10 on FreeBSD is so dead slow that it's painful. Not when running pgbench, which returns almost identical values for all three systems most of the time (I also have a RHEL6/9.2 clone running for comparison), but when running large queries on our database. For example, a simple "select count(*) from table" on a table with several million records takes 12s on RHEL6/PG9, less than 10s on RHEL7/PG10 but anywhere between 30s and 90s, with runtime hovering around the 60s mark most of the time, on FreeBSD11.2/PG10. Occasionally, even pgbench fails to run properly and returns less than 150 tps instead of the expected 1,300+ tps. Running htop in a second window I can see PG10 forks its four worker processes as expected and configured in postgresql.conf, however it fails to cause any significant amount of CPU load. Running above select on CentOS 7/PG10 causes a load of 30-40% on all four CPU cores whereas running the same select on FreeBSD11.2/PG10 causes the CPUs to practically idle around 5-10%.
I've googled around and found some very very old tutorials on performance-tuning FreeBSD for PostgreSQL. The PostgreSQL settings between the three systems are absolutely identical, and the database contents are 100% identical (restored all three from the exact same dumpfile). The underlying hardware is absolutely identical. And still I can't wrap my head around this - something must clearly be going wrong here. Being new to FreeBSD I am a bit flabbergasted and lost, any hints where I could start investigating this issue?