ZFS performance issue

I have a zpool v28 (FreeBSD 9.0-RELEASE) of 12 SATA disks laid out as 6 mirror vdevs (ashift=12), plus one 32 GB SSD as cache.
It performs excellently with block sizes >= 128k (the default ZFS recordsize).
E.g. # dd if=160Gb_huge_file of=/dev/null ibs=128k obs=128k gets me a 1.6GB/s transfer rate, but
# dd if=160Gb_huge_file of=/dev/null ibs=4k obs=128k gets me 10MB/s :(
Even worse, # cat 160Gb_huge_file >>/dev/null takes ages at 2MB/s.
zpool iostat shows the disks mostly idle while the above commands run, which means ZFS is serving the follow-up reads from its in-memory cache, yet performance still drops hard.
When I copy 160Gb_huge_file onto the same filesystem using mc I get 60MB/s, but with
# dd if=160Gb_huge_file of=copy_of_huge_file ibs=128k obs=128k it goes up to 300MB/s, same as a regular cp.
I wish applications could use the full speed of ZFS underneath.
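In case it helps, this is roughly how I measure it (untested sketch; the file name and the 1 GiB test size are just examples, and after the first pass the data mostly comes from ARC, which is exactly the case I care about):

#!/bin/sh
# Sweep dd input block sizes against the same file and print the bytes/sec
# summary line that FreeBSD's dd writes to stderr at the end of each run.
FILE=160Gb_huge_file
BYTES=$((1024 * 1024 * 1024))            # read 1 GiB per pass
for BS in 4096 8192 16384 32768 65536 131072; do
    echo "ibs=${BS}:"
    dd if="${FILE}" of=/dev/null ibs="${BS}" obs=131072 \
        count=$((BYTES / BS)) 2>&1 | tail -1
done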

Is there a sysctl that could help ZFS get around this performance penalty?

Or is it just the fault of badly written apps (cat, mc, iscsi_tgt ...)?
 
So it's reading 4k blocks that is "hard", as you experience it. Have you tried reading 4k blocks straight from the SSD, like:
# dd if=/dev/thessd (or /dev/gpt/thessd) of=/dev/null ibs=4k obs=128k

to compare whether it's ZFS's "fault" or the SSD that is "slow"?

How is the partitioning done on the SSD? Some devices perform badly when not properly partitioned. Most notably, the OCZ drives I have tested perform twice as well when the partitions are aligned to either 4k or 1MiB.
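If you want to double-check the alignment, something like this should do it; "ada6", "thessd" and "yourpool" are just placeholders for your cache device, its GPT label and your pool (and I'm assuming your gpart has the -a alignment option):

# Show the partition table; start offsets are in 512-byte sectors, so they
# should be divisible by 8 for 4k alignment (or 2048 for 1MiB):
gpart show -p ada6

# If it is misaligned: take the cache device out of the pool first
# (zpool remove yourpool <cache partition>), then repartition.
# WARNING: this wipes the existing partition table on the SSD.
gpart destroy -F ada6
gpart create -s gpt ada6
gpart add -t freebsd-zfs -a 1m -l thessd ada6
zpool add yourpool cache gpt/thessd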

You can also enable prefetching from the L2ARC with:
# sysctl vfs.zfs.l2arc_noprefetch=0

That may speed up reading from L2ARC.
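And to make that setting survive a reboot, you could put the same line in /etc/sysctl.conf:

# /etc/sysctl.conf -- also cache/serve prefetched (streaming) buffers in L2ARC
vfs.zfs.l2arc_noprefetch=0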

/Sebulon
 
SSD is partitioned on 4k boundaries. I don't think the SSD is the bottleneck.
My assumption is that when ZFS receives a request to open and read from a file, it reads a full recordsize (128k) because of checksumming. Whether I want to read 1 byte, 4k, or any other size less than 128k, the subsequent reads should be served from that 128k buffer and not from the disks. zpool iostat confirms this assumption.
Of course, reading 128k in 4k chunks means 32 calls to the same read function.
With the data already in a memory buffer, I hoped the overhead would be minimal (some memcpy).
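To double-check that, I'll warm the cache with a smaller file and watch the pool while re-reading it in 4k chunks, roughly like this (the file name and the pool name "mypool" are just examples):

# Warm the ARC with a file small enough to fit in RAM:
dd if=test_4g_file of=/dev/null ibs=128k obs=128k

# In another terminal, watch per-second pool activity:
#   zpool iostat mypool 1

# Re-read with 4k input blocks; if the disks stay idle but it is still slow,
# the penalty is per-read overhead rather than disk I/O:
dd if=test_4g_file of=/dev/null ibs=4k obs=128k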

I'll check your suggestions on Monday.
Thank you for the hints!
 
OK,

yeah, then the SSD shouldn't be the bottleneck, but it's always a good starting point to at least begin ruling stuff out.

My assumption is that when ZFS receives a request to open and read from a file, it reads a full recordsize (128k) because of checksumming. Whether I want to read 1 byte, 4k, or any other size less than 128k, the subsequent reads should be served from that 128k buffer and not from the disks. zpool iostat confirms this assumption.
Cool, I didn't know that. So if the system only wants to read 4k, the remaining 124k can be served up from the ARC instantly. That, together with prefetching, must offload the "slow" disks a whole lot. ZFS is a smart mother ;)

/Sebulon
 