UFS: optimal block size for an SSD used to stage DB tables

I need to create and populate many large MariaDB tables in less than geologic time. I've kludged around using memory tables as a buffer, but 32GB of memory just wasn't enough to make it work even when I paged out to disc frequently.

So I broke down and bought a 256GB Samsung 850 PRO SSD (V-NAND chips) as a staging device, and just finished installing it with the aid of wblock's excellent help page at http://www.wonkity.com/~wblock/docs/html/ssd.html

But now I'm wondering whether the default 32K block size is the best choice. The tables will become large, but they aren't always large, and I'd like to waste as little space as possible without slowing down the disc too much.

Any experienced views on an optimal block size for this application?
 
Although it's for MySQL on ZFS, you typically want 128K or 64K record sizes: https://wiki.freebsd.org/ZFSTuningGuide#MySQL
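
If you were to go the ZFS route, setting that is a one-liner. A sketch, with a made-up pool/dataset name:

    # create a dataset for the tables with a 64K record size
    # ("tank/mysql" is just an example name; adjust to your pool)
    zfs create -o recordsize=64k tank/mysql
    zfs get recordsize tank/mysql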

But more generally, it depends on how MySQL/MariaDB has been configured, specifically the record sizes. Note, however, that this has nothing to do with the actual size of the database; the right record size is the same regardless of how large the database grows.
 
The individual records in these tables vary greatly in size, from about 64 bytes in one table up to, in other tables, 2K in the record proper plus a LONGTEXT stored as a separate file with only a pointer in the record itself.

I haven't found any mention in the docs of how MariaDB/MySQL requests disc space, so it probably does the usual thing and lets the OS decide on a per-write basis. Which, if true, seems to suggest that I should use the smallest (8K) block size. Maybe I'll try the 16K mentioned in the ZFS tuning section you pointed to, and if I don't run out of space it can stay at that size.
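
If I do rebuild it, I gather the newfs invocation would be something like this (untested, and the device name here is just a placeholder for mine):

    # WARNING: newfs destroys whatever is on the partition
    # -b is the block size, -f the fragment size (conventionally bsize/8)
    newfs -b 16384 -f 2048 /dev/ada1p1
    # confirm what the file system actually got
    dumpfs /dev/ada1p1 | grep -E 'bsize|fsize'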
 
For most database implementations, the size of the records doesn't matter. What matters is the size of the database pages or blocks: the database internally organizes records into pages, and reads and writes those. In most cases, performance is best when the file system block size and the database page size match.

The decision of what size to read and write is made by the application program (here MariaDB/MySQL) when it calls read() or write(), not by the OS. For most writes to the database itself it matches the page/block size; writes to logs may be much smaller.
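
If the configured page size is in doubt, one quick way to check it (assuming a local server you can log in to) is:

    # InnoDB's page size; 16384 (16K) is the common default
    mysql -e "SHOW VARIABLES LIKE 'innodb_page_size';"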

You mentioned wasting space above. Most likely, the choice of block size (tens of kilobytes) will have a negligible effect on wasted space. To begin with, your disk is hundreds of gigabytes in size, millions of times larger than a block. How many files do you have on that disk? Say a few thousand. If each file wastes a partial block, that still comes to only dozens of megabytes wasted, out of hundreds of gigabytes. Not relevant.
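
To put rough numbers on it: with 32K blocks and, say, 5,000 files each wasting half a block on average, that's 5,000 x 16 KiB, about 80 MiB, or roughly 0.03% of a 256GB disk.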

If the documentation doesn't say what the block size is, you could try running MySQL with tracing on for a little while; it should become obvious very quickly which IO size is most common. Unfortunately, FreeBSD's iostat utility doesn't break down the disk IOs by size; I know tools like that exist for other OSes, but I don't know any for FreeBSD.
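
(A crude stand-in: attach truss to the running server and eyeball the sizes passed to write(). A sketch, assuming the server process is named mysqld:)

    # watch syscalls of the newest mysqld process; each write() shows its byte count
    truss -p $(pgrep -n mysqld)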
 
Unfortunately, FreeBSD's iostat utility doesn't break down the disk IOs by size; I know tools like that exist for other OSes, but I don't know any for FreeBSD.

DTrace is excellent for such tasks. Have a look at this thread; these scripts are a perfect starting point for your specific problem. The diskio.d and diskio2.d scripts might already provide you with everything you want to know.

The FreeBSD wiki provides a quick introduction on how to use DTrace, including some one-liners.
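
As a sketch of the kind of one-liner that answers the question above (which write size is most common), assuming the server process is named mysqld and you run it as root:

    # histogram of the byte counts mysqld passes to write()
    dtrace -n 'syscall::write:entry /execname == "mysqld"/ { @sizes = quantize(arg2); }'

Let it run while the load is active, then hit Ctrl-C to print the distribution.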

You might also look at some of the scripts in /usr/share/dtrace (disklatency, disklatencycmd) or install the dtrace-toolkit package (or build it from sysutils/DTraceToolkit) and have a look at those scripts. They are located in /usr/local/share/dtrace-toolkit. Most of them were originally written for Solaris, so they may not work right out of the box, but they are also a great starting point for extracting one-liners or writing your own scripts.
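
To give a flavour of what those scripts do, here is a sketch of an io-provider one-liner in the same spirit; it shows the distribution of physical IO sizes, system-wide, keyed by process name:

    # size distribution of disk IOs per process (run as root; Ctrl-C to print)
    dtrace -n 'io:::start { @[execname] = quantize(args[0]->b_bcount); }'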
 
Thanks for the details on Dtrace.

Is there a FreeBSD equivalent to Linux' dstat tool? It allows breaking all IOs (systemwide, not a trace of one process) down by IO size, R versus W, disk, and so on. It's quite powerful, and consequently has a learning curve.
 
Is there a FreeBSD equivalent to Linux' dstat tool? It allows breaking all IOs (systemwide, not a trace of one process) down by IO size, R versus W, disk, and so on. It's quite powerful, and consequently has a learning curve.

Again: DTrace.
In fact, the "iosnoop"[1] script from the dtrace-toolkit does exactly what you outlined, without being a separate tool/binary: it's just a small DTrace script that can easily be modified if needed.

[1] http://www.brendangregg.com/DTrace/iosnoop_example.html
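
A minimal usage sketch (paths as the package installs them; like the other toolkit scripts, it may need small tweaks on FreeBSD):

    cd /usr/local/share/dtrace-toolkit
    ./iosnoop
    # prints one line per disk IO: UID PID D BLOCK SIZE COMM PATHNAME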
 