ZFS & RAID-Z vs RAID 10 -- Which is "better"?

Anonymous

Guest
I have a number of systems running a mix of Windows XP and FreeBSD. Until now, I have always used the default filesystem in each case and done backups by hand.

Now I need to create a NAS system with 2 purposes:
  1. to be a sandbox for experimenting with high-throughput, strongly fault-tolerant transactional database storage,
  2. as a LAN-shared, non-scratch storage system.

The best choices seem to be RAID 10, perhaps hardware-assisted, or ZFS RAID-Z/Z2.

Data integrity is most important, followed by staying within-budget, followed by high throughput.

Can someone with experience help me sort this out? The information I've found so far seems outdated, irrelevant to FreeBSD, too optimistic, or has insufficient detail.
 
Budget constraints mean you're more or less stuck with RAID5 (RAIDZ) or RAID6 (RAIDZ2); RAID10 would probably cost too much. Depending on the storage size and the number of disks, I'd probably go for RAID6 (RAIDZ2). Your two purposes aren't really compatible; you might want to split them up.

Keep in mind though, RAID is NOT a substitute for a proper backup. You or your users will screw up one day and the only course of action will be to restore files from backup.
 
RAIDZ is actually not exactly RAID5, it's similar but faster. RAID10 has more redundancy and should be faster yet, but in the smallest configuration needs four disks as opposed to RAIDZ's three. Since RAID10 is fully mirrored, it should be safer.
 
+1 for backups!

Also want to chip in about the world of difference between 2+2-RAID10 and 4-RAID6. If integrity is the main priority, I'd choose RAID6 any day. With RAID10 you have to hope that the right two drives fall out at once, whereas a RAID6 can lose any two drives.

/Sebulon
 
If the budget allows, you may also consider using hard disks for raidz2 or raidz3 and adding SSDs as cache drives.
 
wblock@ said:
RAIDZ is actually not exactly RAID5, it's similar but faster. RAID10 has more redundancy and should be faster yet, but in the smallest configuration needs four disks as opposed to RAIDZ's three. Since RAID10 is fully mirrored, it should be safer.

And RAIDZ2 is safer than RAID10. With RAID10 you can survive two lost drives, but only if they belong to different mirrors.
 
I wrote a reply, but got logged out before I could post it. I'll try to summarise:

First: thanks for your responses.

I may have given the wrong impression about my budget constraints. I can probably dredge up US$2K for the sandbox system. I'd strongly prefer not to spend more than that, because I'll need every penny for the production hardware (once I figure out what it should be!)

ZFS seems very attractive from what I know of it, but what I know isn't much; its main attractions for me are that it neither wastes space nor demands identical drives. I need a walkthrough or a cookbook exposition on how to create RAID-Z2 installations, and preferably how to stripe and mirror them, if that's even possible.

Most of my experience is with SCSI, so when I read that the Adaptec 5405 had "4 ports" I envisioned 4 channels, each with an 8- or 16-connector cable hanging off it. When I realised it meant only 4 drives, and that I'd have to step up to the 5805 and buy lots of other hardware, that's when I started looking at ZFS. But the information I can find doesn't really give a good assessment of how bug-free FreeBSD's implementation is, nor does it provide enough detail.
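For what it's worth, the "cookbook" part is short; a minimal sketch (device names ada1 and up and the pool name "tank" are placeholders, and each zpool create line is an alternative layout, not a sequence):

```shell
# one RAID-Z2 vdev of six disks: survives any two drive failures
zpool create tank raidz2 ada1 ada2 ada3 ada4 ada5 ada6

# "RAID60"-style: two RAID-Z2 vdevs; ZFS stripes across all vdevs automatically
zpool create tank raidz2 ada1 ada2 ada3 ada4 raidz2 ada5 ada6 ada7 ada8

# "RAID10"-style: striped 2-way mirrors
zpool create tank mirror ada1 ada2 mirror ada3 ada4

# verify layout and health
zpool status tank
```

So "striping" falls out for free: any pool with more than one vdev is striped across them.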
 
jalla said:
And RAIDZ2 is safer than RAID10. With RAID10 you can survive two lost drives, but only if they belong to different mirrors.

Not quite.

With RAID10, you can survive up to HALF of your drives being lost (assuming you are only doing 2x mirroring), if the right drives fail.

RAID6 is 2 drive failures only, irrespective of which ones.

RAID10 is also faster.

I'm not sure of the exact maths, but I suspect that with a sufficiently large RAID10 the probability of data loss becomes less than with RAID6, and I suspect the array doesn't need to be very big.

Rebuild time is also faster (a straight drive copy for the single mirror, vs. reading from all the remaining spindles of the RAID6 set, calculating parity, and writing), so you're exposed for less time between failure 1 and subsequent failures. Also, the performance impact whilst degraded is less.


edit:
For databases, Oracle have a strategy: "SAME" - "Stripe And Mirror Everything".

If you can afford to pay for the additional spindles to not do parity raid - do so.
 
throAU said:
Not quite.

With RAID10, you can survive up to HALF of your drives being lost (assuming you are only doing 2x mirroring), if the right drives fail.

RAID6 is 2 drive failures only, irrespective of which ones.

RAID10 is also faster.

I'm not sure of the exact maths, but I suspect that with a sufficiently large RAID10 the probability of data loss becomes less than with RAID6, and I suspect the array doesn't need to be very big.

Rebuild time is also faster (a straight drive copy for the single mirror, vs. reading from all the remaining spindles of the RAID6 set, calculating parity, and writing), so you're exposed for less time between failure 1 and subsequent failures. Also, the performance impact whilst degraded is less.


edit:
For databases, Oracle have a strategy: "SAME" - "Stripe And Mirror Everything".

If you can afford to pay for the additional spindles to not do parity raid - do so.

I think it is unfair to compare RAID10 and RAID6 in such a large array. When building a larger system you will split the array (pool) into smaller raid groups (vdevs) anyway, so a more accurate comparison would be RAID10 vs. RAID60 in a system of that class.

For example:
2+2+2+2-RAID10(4x mirror vdevs)
vs.
4+4-RAID60(2x raidz2 vdevs)

I choose RAID60. The argument that a sufficiently large RAID10 is less susceptible to data loss falls down on the size of the drives. It was true when disk IO more or less matched drive sizes, but now that sizes have grown to as much as 4TB while drives are still only capable of 100-120MB/s of IO, it has become too risky, in my opinion. That's why you absolutely need to be able to withstand more than one drive failure at a time, regardless of which drive. I've seen it happen; misfortune rarely comes alone.

If disk IO capabilities had followed the same curve as their sizes, a 4TB drive would shuffle around 400-500MB/s, and then I would have agreed with you. But then we would also need a significantly larger transport bus, say SAS3 with a bandwidth of around 1.2GB/s (just hypothesizing).
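To put rough numbers on that: a rebuild can never go faster than a straight sequential copy of the whole drive, so at the quoted 100-120MB/s the best-case exposure window looks like this (my arithmetic, idle array assumed):

```shell
# best-case hours to copy one whole drive at 110MB/s
awk 'BEGIN { printf "1TB: %.1f hours\n", 1e6/110/3600 }'
awk 'BEGIN { printf "4TB: %.1f hours\n", 4e6/110/3600 }'
```

That's roughly 2.5 hours for a 1TB drive but over 10 hours for a 4TB one, and a real, loaded array will take considerably longer; that whole window is time in which a second (or third) failure must not happen.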

/Sebulon
 
Point taken.

It depends on the size of your array and your requirements, I guess - 2x RAID6 is going to have pretty poor write throughput vs 4x RAID1. You could get even better resiliency with similar throughput (though lower capacity) by doing 2x 3-drive mirrors (and withstand up to 4 drive failures).

I would steer well clear of 4TB drives personally (even 2TB drives), and go for 1TB or smaller, in greater numbers. If you do go for 2TB+ drives, then yes, RAID60 or 3-drive mirrors if you're paranoid...

I guess it all depends on your requirements for IOPS and storage capacity (conflicting requirements). Unless I'm mistaken, the size of the NAS was not mentioned in the OP... however, high throughput was mentioned for database use, and RAID6 sucks for that.


You're going to have to balance the likelihood of multiple drive failures before the rebuild completes against the cost of restoring from backup. I.e., is it worth giving up performance every day of operation for a lower probability of array failure and a trip to backup? I actually have a 16x 1TB RAID50 array here that has been running without a single failure for 4 years now (at around 80-100% write throughput continuously for the past year, lol); we've since migrated off it to a NetApp RAID-DP system. YMMV of course.
 
@throAU

I think you are missing key ZFS features, namely the ZIL and L2ARC. With ZFS you are free to design your pool any way you want, disregarding IOPS and going only for 1) tolerance and 2) capacity, in that order. Performance is what the ZIL and L2ARC are for.
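For reference, both bolt onto an existing pool after the fact; assuming a pool named "tank" and spare SSDs ada6-ada8 (placeholder names):

```shell
# separate log device (SLOG) for the ZIL: mirrored, since it holds
# in-flight synchronous writes
zpool add tank log mirror ada6 ada7

# L2ARC read cache: losing it is harmless, so no redundancy needed
zpool add tank cache ada8
```

So the data vdevs can be laid out purely for tolerance and capacity, and the SSDs paper over the IOPS.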

/Sebulon
 
throAU said:
Not quite.
I'm not sure on the exact maths, but I suspect that with a sufficiently large RAID10, the probability of data loss becomes less than with RAID6, and really suspect the array doesn't need to be very big.

You're wrong. The probability of data loss is *always* bigger with raid10 than with raid6.
 
jalla said:
You're wrong. The probability of data loss is *always* bigger with raid10 than with raid6.

Let me moderate that and say it's valid for any reasonably sized raid group. If you increase the number of drives to 20-30-40, the chance of a triple disk failure in raid6 may become bigger than the probability of a catastrophic failure of a raid10. That's also augmented by the fact that the time to reconstruct a failed disk grows immensely in this scenario.
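That moderation can be put in rough numbers. Assuming exactly three simultaneous, uniformly random failures and 2-way mirrors (a simplification that ignores rebuild windows): any raid6 group is then lost for certain, while a RAID10 only dies if two of the three failures land in the same mirror:

```shell
# P(three random failures kill an n-disk 2-way RAID10)
# survives iff the three failures hit three different mirrors:
# C(n/2, 3) * 2^3 / C(n, 3)
awk 'BEGIN {
  split("6 20 40", ns, " ")
  for (i = 1; i <= 3; i++) {
    n = ns[i]; m = n / 2
    surv = (m * (m-1) * (m-2)) * 8 / (n * (n-1) * (n-2))
    printf "n=%d disks: RAID10 lost %.1f%% of the time\n", n, 100 * (1 - surv)
  }
}'
```

At 6 disks the RAID10 dies 60% of the time, but only ~16% at 20 disks and ~8% at 40, versus 100% for a single raid6 group of the same width - which is exactly why the single-group comparison stops being fair at those sizes.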
 
Comparing 6 disks in one raidz2 vdev with 3 mirrored vdevs + 1 hot spare (7 disks), the chance of catastrophic failure is about the same. There was a big discussion of why this is so but I can't find the link. I'm mentioning this since I have this configuration at home.

I think the main reason is that the resilvering time for a mirror is far shorter than for raidz2 (about 4 hours on my 2TB disks). Resilvering also taxes the whole raidz2 vdev, which increases the chance of another old disk failing.

Even for a home scenario I would use mirrors with hot spares.
 
Maybe I'm looking at this too simplistically, but assume A0, A1, B0, B1, C0 and C1 are the drives, where 0 and 1 are each other's mirrors. Suppose A0 breaks. If any other drive then breaks, there's a 20% chance of catastrophic failure, as only one of the five remaining drives (A1) is fatal. With RAID6 there's a 0% chance, because it can handle any two broken drives.

If you add more drives the chance of catastrophic failure on the mirrored sets is reduced (with 8 drives it's a little over 14%), but it will never reach 0.
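Those percentages follow a simple rule: once one drive is dead, only its single surviving partner among the remaining n-1 drives is fatal, so the chance that the next random failure is catastrophic is 1/(n-1). A quick check of the arithmetic:

```shell
# chance the second random failure kills an n-disk set of 2-way mirrors
awk 'BEGIN { for (n = 6; n <= 12; n += 2) printf "n=%d: %.1f%%\n", n, 100/(n-1) }'
```

This prints 20.0% for 6 drives and 14.3% for 8, matching the figures above, and shows the decline flattening out as n grows.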
 
But 3 mirror vdevs can handle 3 disk failures (and more with more vdevs), whereas raidz2 will always be capped at 2.
 
bbzz said:
But 3 mirror vdevs can handle 3 disk failures (and more with more vdevs), whereas raidz2 will always be capped at 2.

True, but the chance of catastrophic failure on your mirrors quickly goes up the more drives break. There's a much bigger chance you're already screwed when two go. You have to be really lucky for only the right drives to break.

Suppose it's A0 and B0. Then there's a 50% chance the third one dying will be catastrophic.
 
@bbzz
There are real-life situations where you'll be better off with 3 pairs of mirrors than with a 6-disk raidz2, but that doesn't change the statistics (just like there are people who win the lottery, though statistically they shouldn't win in a thousand lifetimes).

What you're saying is basically that you "might get lucky", but if safety is your main concern that's not the optimal choice.

The only meaningful way of looking at the probability of failure is to work out the average on a large scale, and for reasonably sized vdevs that comes out favourably for raidz2 vs raid10.

OTOH maybe we should start with the question "how much safety is enough?". Given the 6-disk case, the chance of data loss is extremely small either way, so the problem is really whether you want to optimize for speed (of reads) or capacity :)
 
I understand that. But I'm also sure that test didn't conclude on just getting "lucky" in a few specific instances, and that several things were factored in.

Whatever the case, if the OP is serious about the data, he will need a backup anyway.
 