
ZFS Max Reads : 3x1TB + 1xSSD L2ARC

Discussion in 'Storage' started by einthusan, May 3, 2012.

  1. einthusan

    einthusan New Member

    Hey guys,

    As the title says, I will be using 3 x 1TB drives and one SSD for L2ARC, and I'm looking to get the fastest read throughput. I'm a bit confused: can you mirror the same 1 TB of data on all 3 drives, so that in the end the system only has 1 TB of storage capacity instead of 3 TB?

    Does this involve striping, or do I bypass striping altogether? My reasoning is that if I'm doing file serving and the same data is stored redundantly on all 3 drives, the read IOPS should be three times as high.

    Also take note that I am not really worried about data backup (I've got that covered), so any data redundancy is purely for performance reasons.
     
  2. phoenix

    phoenix Moderator Staff Member

    Yes, you can create a 3-way mirror, where the data is the same on each disk, giving you 1 TB of usable storage, with the ability to lose 2 disks without losing any data:
    Code:
    # zpool create poolname mirror disk1 disk2 disk3
    Redundancy doesn't give you performance. :) If you want the absolute best performance, then just create a pool of individual disks:
    Code:
    # zpool create poolname disk1 disk2 disk3
    That will create the equivalent of a RAID0 stripe across the three drives, giving you 3 TB of disk space and the most IOPS. Of course, lose any one drive and the whole pool is gone, and you lose the ability to repair errors in any data, since there is no redundancy in the pool.
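
    For the SSD from the first post, the L2ARC device gets attached afterwards as a cache vdev. A minimal sketch, with a placeholder device name (ada3 is not from this thread):
    Code:
    # zpool add poolname cache ada3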
     
  3. wblock@

    wblock@ Administrator Staff Member Moderator Developer

    A mirror can give some performance increase in reads depending on the mirror algorithm, but writes suffer. I tested this recently with two 80G IDE drives:

    Code:
               write  read
    lone drive 37608  55994
    gstripe    26945  78086
    gmirror    13460  71698
    
    That's with the gmirror(8) load algorithm.
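
    For reference, the balance algorithm of an existing gmirror can be switched with gmirror configure; gm0 below is only a placeholder mirror name, not from this thread:
    Code:
    # gmirror configure -b load gm0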
     
  4. einthusan

    einthusan New Member

    I had always thought that mirroring performed better than striping. Your tests indicate that striping performs better in reads as well.

    This article concludes by saying that mirrors are always faster than RAID-Z groups for file serving.
    http://constantin.glez.de/blog/2010/06/closer-look-zfs-vdevs-and-performance

    Am I understanding correctly that a RAID-Z group is the same as adding raw drives to a ZFS pool? This is surely some confusing stuff. If I had a spare machine I would run some tests myself, but I don't have one.
     
  5. phoenix

    phoenix Moderator Staff Member

    No. Adding raw drives with no mirror or raidz keyword just stripes them, as in the earlier example. A raidz1 vdev is similar to a RAID5 array in that, out of 'n' disks, you have 'n-1' data disks and 1 parity disk. And a raidz2 vdev is like a RAID6 array, where you have 'n-2' data disks and 2 parity disks.
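
    A minimal sketch of creating each type, with placeholder disk names:
    Code:
    # zpool create poolname raidz1 disk1 disk2 disk3
    # zpool create poolname raidz2 disk1 disk2 disk3 disk4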
     
  6. jalla

    jalla New Member

    It depends mostly on the number of data drives. Testing a stripe of three disks against three mirrored pairs, I can't really tell the difference.

    Code:
                 -------Sequential Output-------- ---Sequential Input-- --Random--
                  -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
    Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
    r0-3x1   16384 30372 10.5 24044  2.8 22521  3.2 224798 63.4 218721  8.2 490.2  0.6
    r1-6x3   16384 30104  7.4 23427  2.7 21700  3.1 237144 74.8 221707  7.6 468.3  0.6
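
    For reference, the two layouts compared above would be created roughly like this (the disk names are placeholders; the pool names follow the labels in the table):
    Code:
    # zpool create r0 disk1 disk2 disk3
    # zpool create r1 mirror disk1 disk2 mirror disk3 disk4 mirror disk5 disk6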
     
  7. mav@

    mav@ Member Developer

    Make sure those drives don't share one IDE port. The small numbers for mirror writes are suspicious. In general, with no other limitations and a multi-threaded benchmark, gstripe should give about 2x performance on both read and write, while gmirror should give about 2x on read and 1x on write.
     
  8. wblock@

    wblock@ Administrator Staff Member Moderator Developer

    They were on different ports on an old Promise PCI IDE controller. There may be bottlenecks on the card, but it was the only way other than IDE/USB converters to attach these to a recent motherboard. I figured the mirror write slowdown was due to the mirror having up to twice the rotational latency of a lone drive.
     
  9. mav@

    mav@ Member Developer

    You are right about latency, though not twice; on average I think it's about 1.5x. But the file system's read-ahead/write-back should hide it.
     
  10. einthusan

    einthusan New Member

    L2ARC writes more than it reads

    After striping together three disks and adding an L2ARC device, I enabled caching of streaming data and let the L2ARC warm up for ten hours. The read rates from the L2ARC are lower than the disk read rates: the L2ARC keeps writing at 40 MB/s but reads back at only 20 MB/s. When I tested the SSD device with Bonnie++, the throughput was impressively high. Under real-world streaming load, however, it's as if the L2ARC just keeps caching disk reads instead of helping to improve overall read throughput. Any advice/tips would be much appreciated!
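
    A quick way to watch per-device traffic, including the cache device, while the server is streaming, is zpool iostat; the pool name below matches the zpool status output later in this thread, and the 5-second interval is arbitrary:
    Code:
    # zpool iostat -v pool1 5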
     
  11. t1066

    t1066 Member

    First make sure you have
    Code:
    vfs.zfs.l2arc_noprefetch=0
    set in /etc/sysctl.conf, or set it on the command line:

    # sysctl vfs.zfs.l2arc_noprefetch=0

    Next, install sysutils/zfs-stats and run zstat. It will show the efficiencies of the ARC, L2ARC, and ZFETCH. Write them down for future reference.

    Now comes the hard part. Determine the size of your working set, then make sure it is less than the capacity of your L2ARC. This can be done in two ways: add more cache drives, or put the files into different filesystems and only set secondarycache=all on the filesystems you want cached. Rerun zstat after each change and see what the improvement is, if any.
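
    A minimal sketch of the per-filesystem approach, with hypothetical dataset names (pool1/streaming and pool1/other are not from this thread):
    Code:
    # zfs set secondarycache=all pool1/streaming
    # zfs set secondarycache=metadata pool1/other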
     
  12. einthusan

    einthusan New Member

    Yes, I had this value set.

    Obviously the L2ARC wasn't working as I expected since it seems to be in degraded mode. I'll try to read up on this more. Thanks.

    Code:
    L2 ARC Summary: (DEGRADED)
    	Passed Headroom:			2.16m
    	Tried Lock Failures:			28.29k
    	IO In Progress:				178
    	Low Memory Aborts:			3
    	Free on Write:				117.27k
    	Writes While Full:			40.71k
    	R/W Clashes:				91
    	Bad Checksums:				64
    	IO Errors:				0
    	SPA Mismatch:				0
    
    L2 ARC Size: (Adaptive)				29.78	GiB
    	Header Size:			0.15%	45.87	MiB
    
    L2 ARC Evicts:
    	Lock Retries:				213
    	Upon Reading:				391
    
    L2 ARC Breakdown:				29.18m
    	Hit Ratio:			24.30%	7.09m
    	Miss Ratio:			75.70%	22.09m
    	Feeds:					111.90k
    
    L2 ARC Buffer:
    	Bytes Scanned:				45.67	TiB
    	Buffer Iterations:			111.90k
    	List Iterations:			6.63m
    	NULL List Iterations:			1.04m
    
    L2 ARC Writes:
    	Writes Sent:			100.00%	86.82k
    
    
    Code:
      pool: pool1
     state: ONLINE
      scan: scrub repaired 0 in 0h1m with 0 errors on Fri May 18 11:08:59 2012
    config:
    
    	NAME          STATE     READ WRITE CKSUM
    	pool1         ONLINE       0     0     0
    	  mirror-0    ONLINE       0     0     0
    	    ada0p2    ONLINE       0     0     0
    	    ada2p2    ONLINE       0     0     0
    	  gpt/disk1   ONLINE       0     0     0
    	cache
    	  gpt/cache1  ONLINE       0     0     0
    
    errors: No known data errors
    
     
  13. einthusan

    einthusan New Member

    I don't see L2ARC :S
    Code:
    ZFS real-time cache activity monitor
    
    Cache efficiency percentage:
                      10s    60s    tot
              ARC:  68.79  70.48  70.48
           ZFETCH:  96.88  96.94  96.94
    VDEV prefetch:   0.00   0.00   0.00
    
     
  14. einthusan

    einthusan New Member

    Got it to work. Made the changes you suggested. Does this look okay?
    Code:
    ZFS real-time cache activity monitor
    
    Cache efficiency percentage:
               10s    60s    tot
       ARC:  76.44  78.37  80.52
     L2ARC:  12.12  17.14  15.28
    ZFETCH:  98.56  98.61  98.77
    
     
  15. t1066

    t1066 Member

    Your L2ARC has 64 bad checksums, which is why it is classified as DEGRADED.

    The efficiency of your L2ARC is less than 20%, which is pretty bad unless it is just warming up. I would try to get the efficiency up to at least 70%; ideally, it should be over 90% most of the time. Keep improving the setup by monitoring the size of the L2ARC and its efficiency. If you fill up the cache drive but still get low efficiency, you will have to either add more cache drives or restrict caching to certain filesystems.
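
    One lightweight way to watch the L2ARC fill level and hit/miss counters between zstat runs is the arcstats sysctls; these kstat names come from FreeBSD's ZFS and may vary between versions, so treat this as a sketch:
    Code:
    # sysctl kstat.zfs.misc.arcstats.l2_size
    # sysctl kstat.zfs.misc.arcstats.l2_hits
    # sysctl kstat.zfs.misc.arcstats.l2_misses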