I’m in the middle of the migration from bare metal to Google Cloud.
The first serious issue I encountered was the abysmal performance of certain PostgreSQL queries. Let me save you from reading how many hours and resources I wasted on finding the root cause.
Code:
root@xxx ~ # psql -U mage xxx -c 'explain analyze select count(1) from messages'
QUERY PLAN
----------------
Aggregate (cost=81024.35..81024.35 rows=1 width=0) (actual time=30578.241..30578.248 rows=1 loops=1)
-> Index Only Scan using xxx
Heap Fetches: 0
Planning time: 64.572 ms
Execution time: 30578.507 ms
(5 rows)
root@xxx ~ # sysctl kern.timecounter.hardware
kern.timecounter.hardware: ACPI-fast
root@xxx ~ # sysctl kern.timecounter.hardware=TSC-low
kern.timecounter.hardware: ACPI-fast -> TSC-low
root@xxx ~ # psql -U mage xxx -c 'explain analyze select count(1) from messages'
QUERY PLAN
--------------------
Aggregate (cost=81024.35..81024.35 rows=1 width=0) (actual time=374.444..374.444 rows=1 loops=1)
-> Index Only Scan using xxx
Heap Fetches: 0
Planning time: 0.529 ms
Execution time: 374.539 ms
(5 rows)
Yes, the first one is roughly 80 times slower (30578 ms versus 374 ms). This has nothing to do with the disk cache. It’s the timer.
It’s true that not all queries are affected. EXPLAIN ANALYZE adds profiling overhead to the execution because it reads the clock repeatedly while the query runs, so a slow timecounter hurts it badly. That’s how I figured out the issue was the timer. I wish I had noticed earlier that the system clock was screwed up too.
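The timer cost can also be checked directly, without going through a query. Here is a minimal sketch, assuming `pg_test_timing` is installed and on the PATH (depending on the PostgreSQL version it may come from the contrib package). `timer_report` is just a name I made up for the helper; the sysctl names are the FreeBSD ones used above.

```shell
#!/bin/sh
# Sketch: show which timecounter is active and how expensive clock reads are.
# timer_report is a hypothetical helper name, not part of FreeBSD or PostgreSQL.

timer_report() {
    # The kern.timecounter.* sysctls exist on FreeBSD; skip quietly elsewhere.
    if sysctl -n kern.timecounter.hardware >/dev/null 2>&1; then
        echo "active:  $(sysctl -n kern.timecounter.hardware)"
        echo "choices: $(sysctl -n kern.timecounter.choice)"
    else
        echo "kern.timecounter sysctls not available on this system"
    fi

    # pg_test_timing measures the per-call cost of reading the clock
    # (-d is the test duration in seconds).
    if command -v pg_test_timing >/dev/null 2>&1; then
        pg_test_timing -d 3
    else
        echo "pg_test_timing not found on PATH"
    fi
}

timer_report
```

Running it once per timecounter setting should make the difference visible in the per-loop time that `pg_test_timing` reports.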
As you might have guessed, it wasn’t me who set the timecounter to ACPI-fast. I have little idea what a timecounter is. It’s set that way in the "official" FreeBSD installation on Google Compute Engine.
https://cloud.google.com/compute/docs/images
https://forums.freebsd.org/threads/56664/
I’ve created my own version since the image runs on UFS and I wanted a ZFS root. However, the system itself was copied from the image, as the image has certain daemons and packages from Google that are probably better left running.
I’m wondering:
1. Am I the only one who is trying to use FreeBSD on Google Cloud in production? It sounds scary. Okay, not everyone will install PostgreSQL, but sooner or later someone should notice the screwed up system clock. It might also explain why I had a much higher load than I expected on two other instances, even without PostgreSQL (exim, ruby daemons).
2. Is it the FreeBSD team that creates the images, or is it Google? I have no idea where to report this issue.
3. Is TSC-low a safe setting? I see it’s the default value on my bare metal servers, which is why I tried it.
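The `sysctl` command above only changes the running system. A sketch of how the setting could be kept across reboots, assuming TSC-low turns out to be safe on these instances (which is exactly my open question). `persist_timecounter` is a hypothetical helper name; on FreeBSD the file to pass would be `/etc/sysctl.conf`, which is applied at boot.

```shell
#!/bin/sh
# Sketch: append the timecounter setting to a sysctl.conf-style file once.
# persist_timecounter is a hypothetical helper, not an existing tool.

persist_timecounter() {
    conf_file=$1
    setting="kern.timecounter.hardware=TSC-low"

    # Only append if the exact line is not already present.
    if ! grep -qxF "$setting" "$conf_file" 2>/dev/null; then
        echo "$setting" >> "$conf_file"
    fi
}

# Usage (as root on the FreeBSD instance):
#   persist_timecounter /etc/sysctl.conf
```

The check before appending keeps the file from collecting duplicate lines if the helper is run more than once.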
I got a bit discouraged. One of the main reasons for the move is that Google offers transparent encryption for my data.
(Yes, I have to trust them. The other option would be trusting the whole world. I don’t do that.)
But now I have the impression that there can’t be many people running FreeBSD on Google Cloud, which means few if any resources when I encounter the next issue. Why is that so?