RAIDZ1 vs RAIDZ3

Hi there

I'm currently chasing a bottleneck on our server, which runs Samba, Netatalk and Bacula and has to handle hundreds of thousands of small files.

The box has two Xeons and 92 GB of memory. FreeBSD 9.1-RC1 is installed on two SSDs mirrored with gmirror, and there are 16x 3000 GB drives for ZFS to store my company's data.

Currently the drives are configured as one RAIDZ3 of 15 drives plus 1 spare. Would it be better for performance to configure the drives as one RAIDZ1 of 16 drives with no spare?

Optionally I could add a 2-10 GB ZIL on the SSDs.

Cheers, Darko
 
DO NOT use vdevs with more than 9 drives. Performance will tank. Especially when using raidz-anything.

  • If you want the fastest storage, use multiple mirror vdevs. 8x 2-disk mirrors, for example.
  • If you want fast-ish storage, use multiple raidz1 vdevs. 4x 4-disk raidz1, for example.
  • If you want fast-ish storage with more redundancy, use multiple raidz2 vdevs. 3x 5-disk raidz2 + spare, for example.
  • If you want slow storage, but super redundancy, use multiple raidz3 vdevs. 2x 8-disk raidz3, for example.

Using a single raidz3 vdev across 15 drives is just ludicrous. Write performance of raidz vdevs is limited to the speed of the slowest drive, and if you ever lose a drive, you will probably never get a new one resilvered.

Rebuild the pool.
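Purely as an illustration (da0 through da15 and "tank" are placeholder names, not your actual devices), a pool striped across four 4-disk raidz1 vdevs would be created roughly like this:
Code:
# Four 4-disk raidz1 vdevs in one pool; ZFS stripes across all four
zpool create tank \
    raidz1 da0  da1  da2  da3  \
    raidz1 da4  da5  da6  da7  \
    raidz1 da8  da9  da10 da11 \
    raidz1 da12 da13 da14 da15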
 
minimike said:
Currently the drives are configured as one RAIDZ3 of 15 drives plus 1 spare. Would it be better for performance to configure the drives as one RAIDZ1 of 16 drives with no spare?
Using multiple raidz vdevs in the pool will improve performance. I configured my 16 x 2TB drives as three 5-drive raidz1 vdevs plus a spare:
Code:
(0:2) hostname:/sysprog/terry# zpool status
  pool: data
 state: ONLINE
  scan: scrub repaired 0 in 6h44m with 0 errors on Mon Nov 21 11:36:19 2011
config:

        NAME             STATE     READ WRITE CKSUM
        data             ONLINE       0     0     0
          raidz1-0       ONLINE       0     0     0
            label/twd0   ONLINE       0     0     0
            label/twd1   ONLINE       0     0     0
            label/twd2   ONLINE       0     0     0
            label/twd3   ONLINE       0     0     0
            label/twd4   ONLINE       0     0     0
          raidz1-1       ONLINE       0     0     0
            label/twd5   ONLINE       0     0     0
            label/twd6   ONLINE       0     0     0
            label/twd7   ONLINE       0     0     0
            label/twd8   ONLINE       0     0     0
            label/twd9   ONLINE       0     0     0
          raidz1-2       ONLINE       0     0     0
            label/twd10  ONLINE       0     0     0
            label/twd11  ONLINE       0     0     0
            label/twd12  ONLINE       0     0     0
            label/twd13  ONLINE       0     0     0
            label/twd14  ONLINE       0     0     0
        logs
          da0            ONLINE       0     0     0
        spares
          label/twd15    AVAIL   

errors: No known data errors
That was before I found out that the ZFS "autoreplace" attribute doesn't actually automatically replace a failed drive with a spare. If I were doing this again, I would use 4 4-drive raidz1's for the same usable capacity and a ~25% speed increase. Whether that makes sense for you depends on how soon you can get someone to the site to swap a failed drive (with the spare in the pool you can do it remotely from the command line; with no spare you have to physically go to the site and swap the drive). The pool shown above is replicated (not using ZFS tools, but to another system with an identical ZFS pool) and is backed up to tape regularly.
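As a hedged sketch of that manual, command-line workaround (using the labels from the pool above; substitute whichever device actually failed):
Code:
# Pull the hot spare in for a failed member by hand
zpool replace data label/twd3 label/twd15
# Once the resilver completes, detaching the failed device makes the
# former spare a permanent member of raidz1-0
zpool detach data label/twd3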

Optionally I could add a 2-10 GB ZIL on the SSDs.
The ZIL shown as da0 in the pool I show above is a PCIe SSD. Whether something like that will improve your performance or not depends on your write-to-read ratio. If you do this, make sure your partition(s) are aligned properly.
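If you do carve the ZIL out of one of your SSDs, something along these lines would create an aligned partition for it (the device ada2, the 8 GB size and the "zil0" label are placeholder assumptions; gpart's -a flag takes care of the alignment):
Code:
# Create a GPT scheme and a 1 MiB-aligned 8 GB partition for the ZIL
gpart create -s gpt ada2
gpart add -t freebsd-zfs -a 1m -s 8g -l zil0 ada2
# Add it to the pool as a log device
zpool add data log gpt/zil0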

You might also want to investigate the various tunables that affect ZFS performance. The sysutils/zfs-stats port might be useful. I added the following as /usr/local/etc/periodic/daily/410.zfs-stats so that I'd get a comprehensive statistics report every night as part of the daily run output:
Code:
#!/bin/sh
#

# If there is a global system configuration file, suck it in.
#
if [ -r /etc/defaults/periodic.conf ]
then
    . /etc/defaults/periodic.conf
    source_periodic_confs
fi

case "$daily_status_zfs_stats_enable" in
    [Yy][Ee][Ss])
        echo
        echo 'ZFS performance data:'
        echo
        /sbin/zpool status
        echo
        /usr/local/bin/zfs-stats -MEADLZ

        rc=0
        ;;

    *)  rc=0;;
esac

exit $rc
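For the script to be picked up by the nightly periodic run, it has to be executable and the knob it checks has to be enabled, roughly like this (same variable name as in the case statement above):
Code:
chmod +x /usr/local/etc/periodic/daily/410.zfs-stats
echo 'daily_status_zfs_stats_enable="YES"' >> /etc/periodic.conf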
 
Terry_Kennedy said:
Code:
        /usr/local/bin/zfs-stats -MEADLZ

Trying this today, it seems there is no longer a -E option. But this works:

Code:
/usr/local/bin/zfs-stats -MADLZ
 
Terry_Kennedy said:
That was before I found out that the ZFS "autoreplace" attribute doesn't actually automatically replace a failed drive with a spare. If I were doing this again, I would use 4 4-drive raidz1's for the same usable capacity and a ~25% speed increase.

Don't do it. A 4 x 4 raidz1 layout is what I have, and it's slow (what you will gain is more space and a massive drop in performance). By slow I mean two days to scrub and one week to resilver, with 2x SSD as cache, a 4 GB DDR I-RAM drive as the log device and 32 GB of DDR3 RAM. Your 3 x 5 is a better idea, one which I never thought of.

I am currently leaning towards 2 x raidz3 (2 x 8 disks) or a single 16-disk raidz3: more space, as one less drive is used for redundancy compared to my current 4 x 4 raidz setup. Mostly because the Solaris site said 5 + 2 was a good combination, but I have been unable to find anything to back that up, as the math below (the 128 KiB default recordsize split across the data disks, which ideally comes out to an even power of two per disk) points out:


3-disk RAID-Z1 = 128 KiB / 2 = 64 KiB = good
4-disk RAID-Z1 = 128 KiB / 3 = ~43 KiB = bad!
5-disk RAID-Z1 = 128 KiB / 4 = 32 KiB = good
9-disk RAID-Z1 = 128 KiB / 8 = 16 KiB = good

4-disk RAID-Z2 = 128 KiB / 2 = 64 KiB
6-disk RAID-Z2 = 128 KiB / 4 = 32 KiB
8-disk RAID-Z2 = 128 KiB / 6 = ~21.3 KiB = bad!
10-disk RAID-Z2 = 128 KiB / 8 = 16 KiB

4-disk RAID-Z3 = 128 KiB / 1 = 128 KiB
5-disk RAID-Z3 = 128 KiB / 2 = 64 KiB
7-disk RAID-Z3 = 128 KiB / 4 = 32 KiB
8-disk RAID-Z3 = 128 KiB / 5 = 25.6 KiB
11-disk RAID-Z3 = 128 KiB / 8 = 16 KiB
16-disk RAID-Z3 = 128 KiB / 13 = ~9.8 KiB = bad!
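For reference, here is a small shell sketch (the script is mine, not from any poster above) that reproduces this arithmetic, i.e. the 128 KiB default recordsize divided by the number of data disks (total disks minus parity level):
Code:
#!/bin/sh
# Stripe width per data disk for a few raidz layouts, in bytes
recordsize=131072                  # 128 KiB
for layout in "5 1" "9 1" "8 2" "10 2" "8 3" "16 3"; do
    set -- $layout                 # $1 = total disks, $2 = parity level
    data=$(($1 - $2))
    echo "${1}-disk RAID-Z${2}: 128 KiB / ${data} = $((recordsize / data)) bytes per data disk"
done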
 
MasterCATZ said:
Don't do it. A 4 x 4 raidz1 layout is what I have, and it's slow (what you will gain is more space and a massive drop in performance). By slow I mean two days to scrub and one week to resilver, with 2x SSD as cache, a 4 GB DDR I-RAM drive as the log device and 32 GB of DDR3 RAM. Your 3 x 5 is a better idea, one which I never thought of.
Actually, 3 x 5 RAIDZ1 has 12 usable disks + 3 parity + 1 spare, while 4 x 4 RAIDZ1 also has 12 usable disks + 4 parity and no spare.

Performance seems to be similar in both configurations, in the 600-700 MB/sec range. Scrubs and resilvers also complete in about the same time on both configurations (6-7 hours for a scrub of 32 TB raw, 21 TB usable, 13 TB used).

It doesn't provide as much of a performance gain as I had hoped, so I'll probably stick to 3 x 5 and hope that devd eventually learns how to trigger ZFS replacement.
 
minimike said:
Currently the drives are configured as one RAIDZ3 of 15 drives plus 1 spare. Would it be better for performance to configure the drives as one RAIDZ1 of 16 drives with no spare?

What phoenix said.

I would also strongly recommend reading the ZFS tuning pages and the ZFS FAQ, and understanding the difference between pools, vdevs and the various RAID levels.

You want to keep your VDEVs smaller, and stripe across multiple VDEVs for performance.

The fact that your pool ended up in its current configuration suggests that whoever created it has not read any of the ZFS recommendations and doesn't understand the implications of what they are doing.

If you want the best possible performance (while still keeping some spares), you could go to a pool of 7x 2-drive mirror vdevs plus 2 spares, which should give you roughly 7x your current sustained write throughput. The alternative of a smaller number of RAIDZ1 or RAIDZ2 vdevs is a compromise for more space (and, in the case of multiple RAIDZ2s, better reliability).
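Sketched out with placeholder names (da0 through da15 and "tank" again stand in for your real devices and pool), that mirrored layout would be created along these lines:
Code:
# Seven 2-drive mirror vdevs plus two hot spares (16 drives total)
zpool create tank \
    mirror da0  da1   mirror da2  da3  \
    mirror da4  da5   mirror da6  da7  \
    mirror da8  da9   mirror da10 da11 \
    mirror da12 da13  \
    spare  da14 da15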

Also, I wouldn't bother putting the OS on SSD.

Maybe take two spinning drives out of your pool and install the OS on them as a mirror (or even go root-on-ZFS instead), then use your pair of SSDs as a mirrored ZIL or as L2ARC, depending on whether your workload is more write-heavy or read-heavy.
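As a rough sketch (pool and partition names are placeholders), the SSDs would be attached one way or the other like this:
Code:
# Write-heavy workload: mirrored log (ZIL) on the two SSDs
zpool add tank log mirror gpt/ssd0 gpt/ssd1
# Read-heavy workload: both SSDs as (unmirrored) L2ARC cache devices
zpool add tank cache gpt/ssd0 gpt/ssd1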
 