I have a FreeBSD 10.0 system running MySQL with its InnoDB database files stored on ZFS. It seemed to be going fine for months, but lately, and seemingly suddenly, performance has bottomed out. After spending a while debugging around MySQL itself, I finally decided to check whether simply reading the database files from the filesystem was slow.
I picked one table whose database file was about 16GB, ran:
Code:
time cat table.ibd > /dev/null
and got:
Code:
cat table.ibd > /dev/null
1.24s user 64.35s system 0% cpu 1:00:34.65 total
In comparison, a copy of the file (with some changes, I'm sure: the database is live) living on UFS on the same system gives me:
Code:
cat table.ibd > /dev/null
0.20s user 9.34s system 5% cpu 9.550 total
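Back-of-the-envelope, taking the file as roughly 16 GB: the ZFS read works out to about 16 GB / 3635 s ≈ 4.4 MB/s, while the UFS copy reads at about 16 GB / 9.55 s ≈ 1.7 GB/s (presumably helped at least partly by caching). Either way, reading from ZFS is running a couple of orders of magnitude slower here.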
Here are non-default ZFS options on this system:
/boot/loader.conf:
Code:
vfs.zfs.arc_max=17179869184
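# 17179869184 bytes = 16 GiB ARC cap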
/etc/sysctl.conf:
Code:
vfs.zfs.prefetch_disable=1
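# disables ZFS file-level prefetch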
From zfs get all on the filesystem:
Code:
recordsize 16K
compression on
atime off
primarycache metadata
zfs:zfs_nocacheflush 1
There are 12 snapshots on this filesystem right now. (Which seems excessive; I'm going to see if any of them can be deleted.)
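(For reference, that count comes from something along these lines; the dataset name here is a placeholder, not the real one:)
Code:
# list all snapshots under the dataset holding the InnoDB files
# (pool/mysql is a stand-in for the actual dataset name)
zfs list -t snapshot -r pool/mysql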
There's no L2ARC for this pool (or any other pool on the system).
I've tried all three values for primarycache, and I've tried re-enabling the prefetch, none of which seem to have had any significant effect.
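Concretely, that was along these lines (dataset name again a stand-in):
Code:
# cycle primarycache through its three values: all, metadata, none
zfs set primarycache=all pool/mysql
# re-enable ZFS prefetch at runtime
sysctl vfs.zfs.prefetch_disable=0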
The zpool is four 2-disk mirrors:
Code:
% zpool status mysqlrot
  pool: mysqlrot
 state: ONLINE
  scan: scrub repaired 0 in 9h45m with 0 errors on Fri Jun 26 12:46:33 2015
config:

        NAME          STATE     READ WRITE CKSUM
        mysqlrot      ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            mfid9     ONLINE       0     0     0
            mfid10    ONLINE       0     0     0
          mirror-1    ONLINE       0     0     0
            mfid11    ONLINE       0     0     0
            mfid12    ONLINE       0     0     0
          mirror-2    ONLINE       0     0     0
            mfid13    ONLINE       0     0     0
            mfid14    ONLINE       0     0     0
          mirror-3    ONLINE       0     0     0
            mfid15    ONLINE       0     0     0
            mfid16    ONLINE       0     0     0
        spares
          mfid19      AVAIL

errors: No known data errors
One oddity is how the raw devices are set up. The storage controller is an MFI controller, and each raw disk is actually configured on it as a single-disk RAID0 volume:
Code:
% sudo mfiutil show volumes
mfi0 Volumes:
Id Size Level Stripe State Cache Name
mfid0 ( 185G) RAID-1 64K OPTIMAL Disabled <OS>
mfid1 ( 558G) RAID-0 64K OPTIMAL Disabled <DB0A>
mfid2 ( 558G) RAID-0 64K OPTIMAL Disabled <DB0B>
mfid3 ( 558G) RAID-0 64K OPTIMAL Disabled <DB1A>
mfid4 ( 558G) RAID-0 64K OPTIMAL Disabled <DB1B>
mfid5 ( 558G) RAID-0 64K OPTIMAL Disabled <DB2A>
mfid6 ( 558G) RAID-0 64K OPTIMAL Disabled <DB2B>
mfid7 ( 558G) RAID-0 64K OPTIMAL Disabled <DB3A>
mfid8 ( 558G) RAID-0 64K OPTIMAL Disabled <DB3B>
mfid9 ( 558G) RAID-0 64K OPTIMAL Disabled <DB4A>
mfid10 ( 558G) RAID-0 64K OPTIMAL Disabled <DB4B>
mfid11 ( 558G) RAID-0 64K OPTIMAL Disabled <DB5A>
mfid12 ( 558G) RAID-0 64K OPTIMAL Disabled <DB5B>
mfid13 ( 558G) RAID-0 64K OPTIMAL Disabled <DB6A>
mfid14 ( 558G) RAID-0 64K OPTIMAL Disabled <DB6B>
mfid15 ( 558G) RAID-0 64K OPTIMAL Disabled <DB7A>
mfid16 ( 558G) RAID-0 64K OPTIMAL Disabled <DB7B>
mfid17 ( 558G) RAID-0 64K OPTIMAL Disabled <DB8A>
mfid18 ( 558G) RAID-0 64K OPTIMAL Disabled <DB8B>
mfid19 ( 558G) RAID-0 64K OPTIMAL Disabled <SPARE0>
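If it would help, I can also post the physical-drive view from the controller, i.e. the output of:
Code:
% sudo mfiutil show drives
% sudo mfiutil show config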
Anyone have any ideas where to look?