ZFS Max Reads: 3x1TB + 1xSSD L2ARC

Hey guys,

As the title says, I will be using 3 x 1 TB drives and one SSD for L2ARC. I'm looking to get the fastest read throughput. I'm obviously a bit confused: can you mirror the same 1 TB of data on all 3 drives, so that in the end your system only has 1 TB of storage capacity instead of 3 TB?

Does this involve striping, or do I bypass striping altogether? My reasoning is that if I'm doing file serving and the same data is redundantly stored on all 3 drives, the IOPS should be three times as high.

Also, take note that I am not really worried about data backup; I've got that covered, so any data redundancy is purely for performance reasons.
 
einthusan said:
As the title says, I will be using 3 x 1 TB drives and one SSD for L2ARC. I'm looking to get the fastest read throughput. I'm obviously a bit confused: can you mirror the same 1 TB of data on all 3 drives, so that in the end your system only has 1 TB of storage capacity instead of 3 TB?

Yes, you can create a 3-way mirror, where the data is the same on each disk, giving you 1 TB of usable storage, with the ability to lose 2 disks without losing any data:
Code:
# zpool create poolname mirror disk1 disk2 disk3
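
To double-check the layout and the resulting 1 TB of usable space afterwards (poolname is just the placeholder from the command above):
Code:
# zpool status poolname
# zpool list poolname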

Also, take note that I am not really worried about data backup; I've got that covered, so any data redundancy is purely for performance reasons.

Redundancy doesn't give you performance. :) If you want the absolute best performance, then just create a pool of individual disks:
Code:
# zpool create poolname disk1 disk2 disk3
That will create the equivalent of a RAID0 stripe across the three drives, give you 3 TB of disk space, and the most IOPS. Of course, lose any one drive and the whole pool is gone. And you lose the ability to repair errors in any data, as there is no redundancy in the pool.
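
Either way, the SSD from your plan is attached afterwards as a cache (L2ARC) device; a sketch, assuming the SSD shows up as ada3 (use whatever device name your system gives it):
Code:
# zpool add poolname cache ada3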
 
A mirror can give some performance increase in reads depending on the mirror algorithm, but writes suffer. I tested this recently with two 80G IDE drives:

Code:
           write  read
lone drive 37608  55994
gstripe    26945  78086
gmirror    13460  71698

That's with the gmirror(8) load algorithm.
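
For reference, the balance algorithm is set per mirror with gmirror configure; a sketch assuming the mirror is named gm0 (the other algorithms are round-robin, split and prefer):
Code:
# gmirror configure -b load gm0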
 
I had always thought that mirroring performed better than striping. Your tests indicate that striping performs better in reads as well.

This article concludes by saying that mirrors are always faster than RAID-Z groups for file serving.
http://constantin.glez.de/blog/2010/06/closer-look-zfs-vdevs-and-performance

Am I understanding correctly that a RAID-Z group is the same as adding raw drives into a ZFS pool? This is surely some confusing stuff. If I had a spare machine I would be able to run some tests myself, but I don't have one.
 
No. Adding raw drives to a pool just stripes the data across them with no parity at all, as above. A raidz1 vdev is similar to a RAID5 array in that, out of 'n' disks, you have 'n-1' data disks and 1 parity disk. And a raidz2 vdev is like a RAID6 array, where you have 'n-2' data disks and 2 parity disks.
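
For comparison, raidz vdevs are created explicitly, just like mirrors; a sketch with placeholder disk names:
Code:
# zpool create poolname raidz1 disk1 disk2 disk3
# zpool create poolname raidz2 disk1 disk2 disk3 disk4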
 
wblock@ said:
A mirror can give some performance increase in reads depending on the mirror algorithm, but writes suffer. I tested this recently with two 80G IDE drives:

Code:
           write  read
lone drive 37608  55994
gstripe    26945  78086
gmirror    13460  71698

That's with the gmirror(8) load algorithm.

It depends mostly on the number of data drives. Testing a stripe of three disks vs. three mirrored pairs, I can't really tell the difference.

Code:
             -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
r0-3x1   16384 30372 10.5 24044  2.8 22521  3.2 224798 63.4 218721  8.2 490.2  0.6
r1-6x3   16384 30104  7.4 23427  2.7 21700  3.1 237144 74.8 221707  7.6 468.3  0.6
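
Those numbers are bonnie++ output; if you want to run the same comparison, an invocation along these lines should do (the directory is just an example, and the 16384 MB test size should be at least twice your RAM so caching doesn't skew the results):
Code:
# bonnie++ -d /pool/test -s 16384 -u root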
 
wblock@ said:
A mirror can give some performance increase in reads depending on the mirror algorithm, but writes suffer. I tested this recently with two 80G IDE drives:

Code:
           write  read
lone drive 37608  55994
gstripe    26945  78086
gmirror    13460  71698

That's with the gmirror(8) load algorithm.

Make sure those drives don't share one IDE port. The small numbers for mirror writes are suspicious. In general, if there are no other limitations and the benchmark is multi-threaded, gstripe should give 2x performance on both read and write, while gmirror should give 2x on read and 1x on write.
 
They were on different ports on an old Promise PCI IDE controller. There may be bottlenecks on the card, but it was the only way other than IDE/USB converters to attach these to a recent motherboard. I figured the mirror write slowdown was due to the mirror having up to twice the rotational latency of a lone drive.
 
wblock@ said:
I figured the mirror write slowdown was due to the mirror having up to twice the rotational latency of a lone drive.

You are right about the latency, though not twice; I think on average it is about 1.5x. But the file system's read-ahead/write-back should hide it.
 
L2ARC writes more than it reads

After striping together three disks and adding an L2ARC device, I enabled caching of streaming data and let the L2ARC warm up for ten hours. The read rates from the L2ARC are lower than the disk reads: the L2ARC keeps writing at 40 MB/s but only reads at 20 MB/s. When I tested the SSD device with Bonnie++, the throughput was amazingly high. However, under real-world streaming load, it's as if the L2ARC wants to keep on caching disk reads instead of helping to improve overall read throughput. Any advice/tips would be much appreciated!
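
For context, the per-vdev read/write rates (cache device included) can be watched with zpool iostat, and the L2ARC fill rate is capped by a pair of sysctls; a rough sketch, using the pool name shown further down (pool1) and an arbitrary 10 second interval:
Code:
# zpool iostat -v pool1 10
# sysctl vfs.zfs.l2arc_write_max vfs.zfs.l2arc_write_boost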
 
First make sure you have
Code:
vfs.zfs.l2arc_noprefetch=0
set in /etc/sysctl.conf. Otherwise, set it on the command line:

# sysctl vfs.zfs.l2arc_noprefetch=0

Next, install sysutils/zfs-stats and run zstat. It will show the efficiencies of the ARC, L2ARC and ZFETCH. Write them down for future reference.
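
If the tools aren't installed yet, building the port mentioned above is roughly:
Code:
# cd /usr/ports/sysutils/zfs-stats && make install clean
# zstat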

Now comes the hard part. Determine the size of your working set, then make sure it is less than the capacity of your L2ARC. This can be done in two ways: add more cache drives, or put the files into different filesystems and only set secondarycache=all on the filesystems you actually want cached (a sketch follows below). Rerun zstat after each change and see what the improvement is, if any.
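
A minimal sketch of that second approach, with made-up dataset names (secondarycache accepts all, none or metadata):
Code:
# zfs set secondarycache=all poolname/streams
# zfs set secondarycache=none poolname/scratch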
 
t1066 said:
First make sure you have
Code:
vfs.zfs.l2arc_noprefetch=0
set in /etc/sysctl.conf. Otherwise, set it on the command line:
# sysctl vfs.zfs.l2arc_noprefetch=0

Yes, I had this value set.

Obviously the L2ARC wasn't working as I expected, since it seems to be in degraded mode. I'll try to read up on this more. Thanks.

Code:
L2 ARC Summary: (DEGRADED)
	Passed Headroom:			2.16m
	Tried Lock Failures:			28.29k
	IO In Progress:				178
	Low Memory Aborts:			3
	Free on Write:				117.27k
	Writes While Full:			40.71k
	R/W Clashes:				91
	Bad Checksums:				64
	IO Errors:				0
	SPA Mismatch:				0

L2 ARC Size: (Adaptive)				29.78	GiB
	Header Size:			0.15%	45.87	MiB

L2 ARC Evicts:
	Lock Retries:				213
	Upon Reading:				391

L2 ARC Breakdown:				29.18m
	Hit Ratio:			24.30%	7.09m
	Miss Ratio:			75.70%	22.09m
	Feeds:					111.90k

L2 ARC Buffer:
	Bytes Scanned:				45.67	TiB
	Buffer Iterations:			111.90k
	List Iterations:			6.63m
	NULL List Iterations:			1.04m

L2 ARC Writes:
	Writes Sent:			100.00%	86.82k

Code:
  pool: pool1
 state: ONLINE
  scan: scrub repaired 0 in 0h1m with 0 errors on Fri May 18 11:08:59 2012
config:

	NAME          STATE     READ WRITE CKSUM
	pool1         ONLINE       0     0     0
	  mirror-0    ONLINE       0     0     0
	    ada0p2    ONLINE       0     0     0
	    ada2p2    ONLINE       0     0     0
	  gpt/disk1   ONLINE       0     0     0
	cache
	  gpt/cache1  ONLINE       0     0     0

errors: No known data errors
 
t1066 said:
Next, install sysutils/zfs-stats and run zstat. It will show the efficiencies of the ARC, L2ARC and ZFETCH. Write them down for future reference.

I don't see L2ARC :S
Code:
ZFS real-time cache activity monitor

Cache efficiency percentage:
                  10s    60s    tot
          ARC:  68.79  70.48  70.48
       ZFETCH:  96.88  96.94  96.94
VDEV prefetch:   0.00   0.00   0.00
 
Got it to work. Made the changes you suggested. Does this look okay?
Code:
ZFS real-time cache activity monitor

Cache efficiency percentage:
           10s    60s    tot
   ARC:  76.44  78.37  80.52
 L2ARC:  12.12  17.14  15.28
ZFETCH:  98.56  98.61  98.77
 
Your L2ARC has 64 bad checksums, which is why it is classified as DEGRADED.

The efficiency of your L2ARC is less than 20%, which is pretty bad unless it is just warming up. I would try to get the efficiency up to at least 70%, and ideally it should be over 90% most of the time. You should try to improve the whole setup by monitoring the size and efficiency of the L2ARC. If you fill up the cache drive but still get low efficiency, you will have to either add more cache drives or restrict caching to certain filesystems.
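
The raw counters behind those efficiency numbers are also exported as sysctls, if you want to watch them between zstat runs; a sketch (the kstat names are from the FreeBSD ZFS arcstats):
Code:
# sysctl kstat.zfs.misc.arcstats.l2_size
# sysctl kstat.zfs.misc.arcstats.l2_hits kstat.zfs.misc.arcstats.l2_misses
# sysctl kstat.zfs.misc.arcstats.l2_cksum_bad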
 