Zanthra said:
I keep reading that using RAID-Z3 has performance impacts vs RAID-Z2 or RAID-Z, but I cannot figure out why.
RAID-Z3, or any RAID code that writes extra redundancy (parity) to survive additional failures, has a performance impact, simply because a larger fraction of the drives is needed for redundancy.
Let's work through this with a simplified example. You say you have 8 drives and want to run a 3-fault-tolerant RAID code on them. My (hypothetical) RAID implementation takes each block of data you write, cuts it into 5 data slices, computes 3 parity slices from them, and writes all 8 slices simultaneously, one per disk. For large reads, it simultaneously reads the 5 disks holding data slices (the other 3 are idle), reassembles the block, and returns it to you. For small reads, it reads only from whichever of those 5 data disks holds the particular slice you are interested in.
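To make the hypothetical write path concrete, here is a minimal Python sketch of one stripe: a block is cut into 5 data slices and 3 parity slices (P, Q, R) are computed, giving 8 slices for 8 disks. The parity math is loosely modeled on the P/Q/R scheme used for triple parity (XOR for P, Galois-field weighted sums for Q and R); the coefficients 2^i and 4^i, the padding, and all function names (`encode_stripe`, `recover_data_slice`, etc.) are assumptions for illustration, not the actual ZFS on-disk format. Only the single-failure rebuild via P is shown; recovering two or three lost slices needs Q and R plus a small linear solve, which is omitted here.

```python
# Hypothetical 5 data + 3 parity stripe; coefficient choice is an assumption.
DATA_SLICES = 5
PARITY_SLICES = 3


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reducing by x^8 + x^4 + x^3 + x^2 + 1 (0x11D)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(base: int, exp: int) -> int:
    """Raise a GF(2^8) element to a non-negative integer power."""
    result = 1
    for _ in range(exp):
        result = gf_mul(result, base)
    return result


def split_block(block: bytes, k: int = DATA_SLICES) -> list[bytes]:
    """Cut a block into k equal-sized data slices, zero-padding the tail."""
    size = max(1, -(-len(block) // k))  # ceiling division
    padded = block.ljust(size * k, b"\0")
    return [padded[i * size:(i + 1) * size] for i in range(k)]


def encode_stripe(block: bytes) -> list[bytes]:
    """Return 8 slices: 5 data slices followed by parity slices P, Q and R."""
    data = split_block(block)
    size = len(data[0])
    p, q, r = bytearray(size), bytearray(size), bytearray(size)
    for i, slice_ in enumerate(data):
        q_coef, r_coef = gf_pow(2, i), gf_pow(4, i)
        for j, byte in enumerate(slice_):
            p[j] ^= byte                  # P: plain XOR parity
            q[j] ^= gf_mul(q_coef, byte)  # Q: each slice weighted by 2^i
            r[j] ^= gf_mul(r_coef, byte)  # R: each slice weighted by 4^i
    return data + [bytes(p), bytes(q), bytes(r)]


def recover_data_slice(stripe: list, missing: int) -> bytes:
    """Rebuild one lost data slice from P and the 4 surviving data slices."""
    rebuilt = bytearray(stripe[DATA_SLICES])  # start from P
    for i in range(DATA_SLICES):
        if i != missing:
            for j, byte in enumerate(stripe[i]):
                rebuilt[j] ^= byte
    return bytes(rebuilt)


stripe = encode_stripe(b"user data written to a RAID-Z3 style pool")
print(len(stripe), "slices of", len(stripe[0]), "bytes each")  # one slice per disk

damaged = list(stripe)
damaged[2] = None  # simulate losing the disk that held data slice 2
assert recover_data_slice(damaged, 2) == stripe[2]
```

Note that every write produces 8 slices of equal size, but only 5 of them carry user data; that ratio is what drives the bandwidth numbers below.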
For further simplification, let's start by studying sequential write workloads, and let's assume that the RAID implementation keeps sequential files sequential on disk as well. Say every disk can read or write 100 MB/s, and let's also assume that the disks are the only bottleneck (this is either reality, or at least a design goal). In this case, a RAID-Z3 implementation would be able to write 500 MB/s of user data, while actually pushing 800 MB/s (the hardware limit) onto the disks. You can easily see that a non-redundant implementation would be able to write 800 MB/s, a 1-fault-tolerant RAID code 700 MB/s, and a 2-fault-tolerant code 600 MB/s. Right there is your performance impact! You paid for 8 disks, and you only got 5 disks' worth of write bandwidth.
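The arithmetic above is simply (disks minus parity disks) times per-disk speed. This small Python sketch tabulates it under the same simplifying assumptions as the text (8 disks, 100 MB/s each, disks as the only bottleneck); the constant and function names are made up for illustration. It also shows how much longer a large sequential write takes relative to no redundancy, which is the ratio of total disks to data disks.

```python
N_DISKS = 8          # drives in the hypothetical pool
PER_DISK_MBPS = 100  # assumed sequential speed of each drive


def user_write_bandwidth(n_disks: int, parity: int, per_disk_mbps: float) -> float:
    """User data written per second when `parity` of every `n_disks` slices is redundancy."""
    return (n_disks - parity) * per_disk_mbps


def relative_copy_time(n_disks: int, parity: int) -> float:
    """How much longer a large sequential write takes than with no redundancy."""
    return n_disks / (n_disks - parity)


raw = N_DISKS * PER_DISK_MBPS
for parity in range(4):
    bw = user_write_bandwidth(N_DISKS, parity, PER_DISK_MBPS)
    print(f"{parity} parity slice(s): {bw:.0f} MB/s of user data "
          f"({raw} MB/s hitting the disks), "
          f"{relative_copy_time(N_DISKS, parity):.2f}x the copy time")
```

For the 3-parity case this gives 500 MB/s of user data and 1.60x the copy time of a non-redundant layout: throughput drops by 3/8, so the same copy takes 8/5 as long.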
Now, for a read workload it gets trickier, and one has to be very careful about how one defines performance. Let's first consider a single-threaded sequential read. In that case, at any given moment only 5 disks are busy, since we don't have to read the parity slices, so you will only get 500 MB/s again (just like the write case). Again, the same 3/8 performance penalty. But then, you could say that you have lots of applications running, all simultaneously doing sequential reads, and you could argue that if you average over hundreds of read streams, all disks are busy. That argument is not completely correct: because 3/8 of all data on disk is parity that never needs to be read, the workload on each disk will not be completely sequential (the disks need to skip over parity slices), so the performance will be slightly lower; but that correction is perhaps a small effect. So if your disk array is performance-limited rather than capacity-limited, your argument holds up pretty well. On the other hand, if you bought this many disks because you needed the capacity, then with a RAID-Z3 code you had to buy 3 extra disks, again a 3/8 cost overhead.
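The "all disks busy across many streams" argument only works if parity is spread over all disks rather than parked on 3 dedicated ones. The Python sketch below assumes a simple rotating layout (parity for stripe s on disks s, s+1, s+2 mod 8); that layout and the function names are assumptions for illustration, not necessarily how RAID-Z places parity. It shows that one sequential reader keeps only 5 disks busy per stripe, and that on each individual disk a sequential read fetches 5 of every 8 slots and skips the other 3.

```python
N_DISKS = 8
PARITY = 3


def parity_disks(stripe: int) -> set[int]:
    """Disks holding parity for this stripe, under an assumed rotating layout."""
    return {(stripe + k) % N_DISKS for k in range(PARITY)}


def busy_disks(stripe: int) -> int:
    """Disks a single sequential reader touches for this stripe (data slices only)."""
    return N_DISKS - len(parity_disks(stripe))


def disk_access_pattern(disk: int, n_stripes: int) -> str:
    """Per on-disk slot: 'R' if a sequential read fetches it, '-' if it is parity and skipped."""
    return "".join("-" if disk in parity_disks(s) else "R" for s in range(n_stripes))


for disk in range(N_DISKS):
    print(f"disk {disk}: {disk_access_pattern(disk, N_DISKS)}")
print("disks busy per stripe for one reader:", {busy_disks(s) for s in range(N_DISKS)})
```

The gaps marked '-' are the parity slices each disk has to skip, which is why many concurrent streams can keep all 8 disks busy but not with perfectly sequential access on any one of them.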
The situation with small and non-sequential reads and writes is even more complicated, and depends on the RAID implementation and the workload. But overall, the higher the fault tolerance, the larger the capacity and write-performance impact is going to be. On the other hand, if you are not capacity-limited and your workload consists of reads (meaning you are only worried about read performance), then RAID typically has little or no performance impact.
For the average home server or typical small commercial system, this usually doesn't matter anyhow. It's quite unusual to find workloads that can actually saturate many disk drives for extended periods. And if the occasional file copy runs at 5/8 of the speed (taking 60% longer), that probably has few real-world effects on users.
By the way, please don't interpret what I wrote to mean that you should use less redundancy just because a highly redundant RAID (like ZFS's RAID-Z3) has a cost and performance impact. For most users, the data is more valuable than the hardware it is stored on, and I very much endorse RAID in general, and in particular RAID codes that can handle multiple faults. In the style of the credit card commercial: Getting your data back even after multiple disk failures - priceless.