ZFS: Trying to solve slow I/O for a ZFS dataset

I have a ZFS mirror of two 4 TB HDDs with 4 KiB sectors on a FreeBSD 11.1 box (the OS lives on an SSD). On that zpool (iocage) I have a multitude of jails. Over time I noticed very poor I/O performance for MariaDB, particularly with writes. The application using MariaDB is Nextcloud. I put MariaDB in its own ZFS dataset inside the jail so I could set its primarycache to metadata only, since I read the database handles its own data caching and ARC caching of file data is just redundant at that point. I then ran ioping on that dataset, and also on a directory in the jail's root dataset, and found a stark difference in performance.
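In case it matters, the split-out was done roughly like this (reconstructed from memory, so treat the exact commands as approximate; the dataset names are the ones used below):

zfs create iocage/jails/nextcloud/mysql                       # dedicated dataset for the DB files
zfs set primarycache=metadata iocage/jails/nextcloud/mysql    # cache metadata only; InnoDB has its own buffer pool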

For iocage/jails/nextcloud/root:
68.4 k requests completed in 2.99 s, 16.7 GiB read, 22.9 k iops, 5.58 GiB/s
generated 68.4 k requests in 3.00 s, 16.7 GiB, 22.8 k iops, 5.56 GiB/s
min/avg/max/mdev = 39.5 us / 43.7 us / 283.0 us / 6.16 us

For iocage/jails/nextcloud/mysql:
1.66 k requests completed in 3.00 s, 415.5 MiB read, 554 iops, 138.5 MiB/s
generated 1.66 k requests in 3.00 s, 415.8 MiB, 553 iops, 138.5 MiB/s
min/avg/max/mdev = 39.3 us / 1.81 ms / 326.3 ms / 11.0 ms

As you can see, the I/O is VERY low in comparison, despite being on the same disks. Doing a zfs get all, the only differences in properties are that the mysql dataset has lz4 compression enabled and primarycache set to metadata. Would those changes dramatically hamper performance, or is there a deeper issue here? The slow DB writes were happening even before I moved its contents to a separate dataset.
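For what it's worth, the property comparison was done more or less like this (plain zfs get; the diff just makes the differences stand out):

zfs get -H -o property,value all iocage/jails/nextcloud/root  > /tmp/root.props
zfs get -H -o property,value all iocage/jails/nextcloud/mysql > /tmp/mysql.props
diff /tmp/root.props /tmp/mysql.props   # space/usage values differ too; compression and primarycache are the relevant ones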
 
You say that the ZFS file system is on two 4TB disks, and by that I take it you mean "spinning" disks, not SSDs. The two disks are mirrored, so for writes they will have the same performance as a single disk (every write has to be performed twice, once on each disk). For reads the performance might be up to twice that of a single disk, since data only needs to be read from one mirror copy and the file system can alternate between the two disks.

The hardware of a normal disk drive is capable of about 150-200 MByte/s of sequential throughput, and of about 100-200 seeks per second (which means that many random IOs per second). The ioping tests you performed look like reads, not writes. The performance you measured for your root dataset is better than that by orders of magnitude: 6 GByte/s of throughput instead of the expected 300-400 MByte/s, and ~20K IOPS instead of the expected 200-400. That means your root result is not bottlenecked by the disk drives; you almost certainly measured the performance of ZFS's memory cache (the ARC).
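If you want to confirm that, the ARC counters give a quick sanity check. Something along these lines on FreeBSD (standard ZFS kstat sysctls; read them before and after the ioping run):

sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses
# if "hits" grows by roughly the number of requests while "misses" barely moves,
# the benchmark was answered from RAM rather than from the disks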

On the mysql test, the measured performance is close to what you would expect from a disk drive, but still a little high. I suspect that result is a mix of raw disk and memory cache performance. It could also be that the cache for the root test was pre-warmed by other operations.

The question you need to ask yourself is: what is the working set size of your application? How much will caching influence real-world performance? Remember that caching has different effects on reads and writes. And most importantly: what performance do you actually need? If you follow SirDice's advice (turning compression off trades some space for performance, and the block/track/record/page sizes should be adjusted to match one another), you'll probably quickly reach the hardware limits of your machine, whether that's the disk drives, the CPU, or the memory interfaces.
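As a concrete sketch of the "match the record/page sizes" part (this assumes InnoDB's default 16 KiB page size; check innodb_page_size in your my.cnf first):

zfs set recordsize=16K iocage/jails/nextcloud/mysql   # match ZFS records to InnoDB pages
# note: like compression, recordsize only applies to newly written files;
# existing files keep the record size they were created with until rewritten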
 
Thank you for the suggestions. I wasn't sure about having compression enabled or disabled on the dataset my MySQL server writes the DB to; there's plenty of conflicting information, with folks recommending both. I'll also do some more of the tuning tweaks; I have some of them enabled but not all (along with the couple of options set in the DB itself). I haven't shown a write benchmark yet, I wanted to tackle the reads first. Your explanation makes sense, and I can see that it's being cached, which kind of bums me out because it ruins the benchmark. I'll make those dataset tweaks tonight. When turning compression off, since the existing data is already compressed, I'll have to create a new dataset and move my DB files over there, correct?
 
I fear you'll have to move the data after turning compression off. But I'll add a more philosophical comment: caching doesn't ruin the benchmark. Caching is real, and it is an important part of file systems. If you do micro-benchmark measurements of your file system with caching turned off, the result might be easy to understand, but it will not reflect the performance you actually see. Your first test should be with your real application (real data, real working set size, both reads and writes): see what the performance is and how it depends on parameters (like turning compression on or off). Only once you have that real-world baseline does it make sense to drill deeper with targeted micro-benchmarks that test individual components of the system, or individual aspects of the workload (like caching off, or reads only), to help you find the bottlenecks.
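As for the move itself, a rough outline (the mysql_new name and the paths here are just placeholders, and MariaDB should be stopped first so the files are quiescent):

zfs create -o compression=off -o primarycache=metadata iocage/jails/nextcloud/mysql_new
# with mysqld stopped, copy so the data gets rewritten uncompressed:
cp -a /path/to/old/mysql/. /path/to/mysql_new/
# verify the copy, then swap the datasets (or just re-point MariaDB's datadir):
zfs destroy iocage/jails/nextcloud/mysql
zfs rename iocage/jails/nextcloud/mysql_new iocage/jails/nextcloud/mysql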
 
I get what you mean. I suppose in that sense it does bring to light that I may be having performance issues with my database because it isn't being cached? I'm now wondering if I'm hitting a CPU bottleneck instead of a disk bottleneck (correct me if I'm wrong here). Writing one large file to my disk is handled with flying colors.

zpool iostat for single large file:

              capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
iocage       371G  3.26T      0    375      0  30.1M
iocage       371G  3.26T      0    107      0  9.75M
iocage       371G  3.26T      0    264      0  20.2M
iocage       371G  3.26T      0    363      0  29.3M

CPU usage during this is about 10%; mysql barely uses 1%.

zpool iostat for many small files (most files being X KB in size, ranging from a few hundred bytes to a few megabytes):

              capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
iocage       370G  3.26T      3     53   159K   512K
iocage       370G  3.26T      3    298  28.0K  5.60M
iocage       370G  3.26T      0      0      0      0
iocage       370G  3.26T      1    246  16.0K  5.19M
iocage       370G  3.26T      0      0      0      0
iocage       370G  3.26T      0      0      0      0
iocage       370G  3.26T      3    241  28.0K  5.13M


Here WCPU usage goes through the roof for php-fpm, and my Redis caching server gets a workout:
php-fpm - continually around 75% WCPU (bouncing back and forth between multiple threads)
redis-server - about 15%
mysql - about 2-3% CPU

So you can see that, over time with many small files, decent quantities are being written in chunks. Mind you, those numbers are a bit deceiving for the many-small-files case: at best I had 2 MB of data written in that time, maybe a whopping total of 10 folders and 30 files. This is an 8-year-old Xeon.
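To pin down whether it's the disks or the CPU that's choking on the small-file case, I figure watching both sides at once during the next sync would show it (standard FreeBSD tools; just my guess at a useful invocation):

gstat -p    # per-disk %busy while the sync runs; sitting near 100% would point at the disks
top -SH     # per-thread CPU; php-fpm threads pegged would point at the CPU instead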
 
I suppose in that sense it does bring to light that I may be having performance issues with my database because it isn't being cached?
Caching can have a big impact on database performance. No cache, too small a cache, or too large a cache are all bad for performance. The trick is to find the "sweet spot" for your situation. This typically involves constant monitoring and adjustment. I always like to use tuning scripts, they make life a little easier. Never take any of their suggestions at face value; always try to understand what a change does and why it would be needed.

databases/mysqltuner # Nice script, will give you some good suggestions.
databases/tuning-primer # Old script, used to work fine up to MySQL 5.5. Needs a bit of TLC to make it work on >5.5 or MariaDB but can provide some good info.
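For example, once the port or package is installed, a typical mysqltuner run looks something like this (the --user/--pass options are MySQLTuner's usual ones; adjust credentials for your setup):

pkg install mysqltuner
mysqltuner --user root --pass 'yourpassword'
# read the recommendations, then look up each suggested variable before actually changing it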
 