I am stopping short of calling it a performance degradation, but that is also quite probable.
I have a large set of FreeBSD nodes and I am migrating them from 11.2 to 12.1 as well as migrating from AWS t2.medium instances to similar t3.medium instances. I am trying to account for a FreeBSD performance delta.
Reference System:
- FreeBSD 11.2
- Running on AWS t2.medium instance (2 virtual CPUs, 4GB RAM)
- Running MariaDB 10.2, as a slave
- Average CPU usage: 24% user, 9% system, 1% interrupt (see graph)
- vm.pmap.pti=1
System under Test:
- FreeBSD 12.1
- Running on AWS t2.medium instance (2 virtual CPUs, 4GB RAM)
- Running MariaDB 10.2, as a slave, parallel to reference system
- Average CPU usage: 54% user, 50% system, 1.5% interrupt (see graph)
- First half of the graph with the default vm.pmap.pti, second half with vm.pmap.pti=0
Other system of interest - ultimate target:
- FreeBSD 12.1
- Running on AWS t3.medium instance (2 virtual CPUs, 4GB RAM)
- Running MariaDB 10.2, as a slave, parallel to reference system
- Average CPU usage: 28% user, 51% system, 4% interrupt (see graph)
- First half of the graph with the default vm.pmap.pti, second half with vm.pmap.pti=0
As per graphs:
- System activity increases dramatically from 11.2 to 12.1
- The difference is partly mitigated on a t2.medium using vm.pmap.pti=0, but remains substantial
- Importantly, it still does the work: metrics at the app level (internal MySQL metrics) show that the node keeps up with the master, even slightly better than the reference system. Maybe it just looks busy?
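To put rough numbers on the delta, here is a minimal sketch that totals the averages quoted above and expresses system time relative to the 11.2 reference. The labels and the `summarize` helper are made up for illustration; the percentages are taken as quoted, and how the graphs normalise across the two vCPUs is not stated here.

```python
# CPU usage percentages as quoted in the lists above: (user, system, interrupt)
usage = {
    "11.2 on t2.medium (reference)": (24.0, 9.0, 1.0),
    "12.1 on t2.medium (under test)": (54.0, 50.0, 1.5),
    "12.1 on t3.medium (target)": (28.0, 51.0, 4.0),
}


def summarize(usage, baseline):
    """Return total CPU and system-time ratio versus the baseline for each node."""
    base_system = usage[baseline][1]
    rows = {}
    for name, (user, system, interrupt) in usage.items():
        rows[name] = {
            "total": user + system + interrupt,
            "system_ratio": system / base_system,
        }
    return rows


summary = summarize(usage, "11.2 on t2.medium (reference)")
for name, row in summary.items():
    print(f"{name}: total {row['total']:.1f}%, "
          f"system time x{row['system_ratio']:.1f} vs reference")
```

On these figures, system time on both 12.1 nodes is more than five times that of the reference, while user time on the t3.medium is close to the reference.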
The servers are all slaves off the same master; their entire purpose is to read binlogs and write changes to disk.
My suspicion is something related to the Meltdown/Spectre mitigations (hence the vm.pmap.pti change). Note that this is running on an isolated server with no external access, so I can disable the Meltdown/Spectre mitigations without too much concern.
On all nodes:
hw.ibrs_disable=1
hw.mds_disable=0
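To rule out a configuration mismatch between nodes, the tunables can be compared mechanically. This is a minimal sketch, assuming the output of `sysctl vm.pmap.pti hw.ibrs_disable hw.mds_disable` has been captured as text from each host in the usual `name: value` form; the helper names are hypothetical and the sample strings are built from the values quoted in this post, not from real captured output.

```python
def parse_sysctl(text):
    """Parse `name: value` lines, as printed by sysctl(8), into a dict."""
    settings = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            settings[name.strip()] = value.strip()
    return settings


def diff_settings(a, b):
    """Return {name: (a_value, b_value)} for every tunable that differs."""
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


# Illustrative input only: values as stated in this post
reference = parse_sysctl("vm.pmap.pti: 1\nhw.ibrs_disable: 1\nhw.mds_disable: 0\n")
under_test = parse_sysctl("vm.pmap.pti: 0\nhw.ibrs_disable: 1\nhw.mds_disable: 0\n")

differences = diff_settings(reference, under_test)
print(differences)
```

With the values above, only vm.pmap.pti should show up as different; anything else in the diff would be worth checking before blaming the OS upgrade.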
My questions are:
- Why is cpu.system so busy in 12.1 compared to 11.2?
- Is there a security patch I can disable to recover that CPU time?
- What's the best way to track this down? Has anyone else experienced something similar?