ZFS & RAID-Z vs RAID 10 -- Which is "better"?

Anonymous

Guest
I have a number of systems running a mix of Windows XP and FreeBSD. Until now, I have always used the default filesystem in each case and done backups by hand.

Now I need to create a NAS system with 2 purposes:
  1. to be a sandbox for experimenting with high-throughput, strongly fault-tolerant transactional database storage,
  2. as a LAN-shared, non-scratch storage system.

The best choices seem to be RAID 10, perhaps hardware-assisted, or ZFS RAID-Z/Z2.

Data integrity is most important, followed by staying within-budget, followed by high throughput.

Can someone with experience help me sort this out? The information I've found so far seems outdated, irrelevant to FreeBSD, too optimistic, or has insufficient detail.
 
Budget constraints mean you're more or less stuck with RAID5 (RAIDZ) or RAID6 (RAIDZ2); RAID10 would probably cost too much. Depending on the storage size and the number of disks, I'd probably go for RAID6 (RAIDZ2). Your two purposes aren't really compatible; you might want to split them up.

Keep in mind though, RAID is NOT a substitute for a proper backup. You or your users will screw up one day and the only course of action will be to restore files from backup.
 
RAIDZ is actually not exactly RAID5, it's similar but faster. RAID10 has more redundancy and should be faster yet, but in the smallest configuration needs four disks as opposed to RAIDZ's three. Since RAID10 is fully mirrored, it should be safer.
 
+1 for backups!

Also want to chip in about the world of difference between 2+2-RAID10 and 4-RAID6. If integrity is the main priority, I'd choose RAID6 any day. With RAID10 you have to hope that the right two drives fall out at once, whereas a RAID6 can lose any two drives.

/Sebulon
 
If the budget allows, you may also consider using hard disks for raidz2 or raidz3 and adding SSDs as cache drives.
 
wblock@ said:
RAIDZ is actually not exactly RAID5, it's similar but faster. RAID10 has more redundancy and should be faster yet, but in the smallest configuration needs four disks as opposed to RAIDZ's three. Since RAID10 is fully mirrored, it should be safer.

And RAIDZ2 is safer than RAID10. With RAID10 you can survive two lost drives, but only if they belong to different mirrors.
 
I wrote a reply, but got logged out before I could post it. I'll try to summarise:

First: thanks for your responses.

I may have given the wrong impression about my budget constraints. I can probably dredge up US$2K for the sandbox system. I'd strongly prefer not to spend more than that, because I'll need every penny for the production hardware (once I figure out what it should be!)

ZFS seems very attractive from what I know of it, but what I know isn't much; its main attractions for me are that it neither wastes space nor demands identical drives. I need a walkthrough or a cookbook exposition on how to create RAID-Z2 installations, and preferably how to stripe and mirror them, if that's even possible.

Most of my experience is with SCSI, so when I read that the Adaptec 5405 had "4 ports" I envisioned 4 channels, each with an 8- or 16-connector cable hanging off it. When I realised it meant only 4 drives, and that I'd have to step up to the 5805 and buy lots of other hardware, that's when I started looking at ZFS. But the information I can find doesn't really give a good assessment of how bug-free FreeBSD's implementation is, nor does it provide enough detail.
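For what it's worth, the "cookbook" part is short; a minimal sketch (device names ada1 and up and the pool name "tank" are placeholders, and each zpool create line is an alternative layout, not a sequence):

```shell
# one RAID-Z2 vdev of six disks: survives any two drive failures
zpool create tank raidz2 ada1 ada2 ada3 ada4 ada5 ada6

# "RAID60"-style: two RAID-Z2 vdevs; ZFS stripes across all vdevs automatically
zpool create tank raidz2 ada1 ada2 ada3 ada4 raidz2 ada5 ada6 ada7 ada8

# "RAID10"-style: striped 2-way mirrors
zpool create tank mirror ada1 ada2 mirror ada3 ada4

# verify layout and health
zpool status tank
```

So "striping" falls out for free: any pool with more than one vdev is striped across them.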
 
jalla said:
And RAIDZ2 is safer than RAID10. With RAID10 you can survive two lost drives, but only if they belong to different mirrors.

Not quite.

With RAID10, you can survive up to HALF of your drives being lost (assuming you are only doing 2x mirroring), if the right drives fail.

RAID6 is 2 drive failures only, irrespective of which ones.

RAID10 is also faster.

I'm not sure of the exact maths, but I suspect that with a sufficiently large RAID10 the probability of data loss becomes less than with RAID6, and I suspect the array doesn't need to be very big.

Rebuild time is also faster (a straight drive copy for the single mirror, vs. reading from all the remaining spindles of the RAID6 set, calculating parity, and writing), so you're exposed for less time between failure 1 and subsequent failures. Also, the performance impact whilst degraded is less.


edit:
For databases, Oracle have a strategy: "SAME" - "Stripe And Mirror Everything".

If you can afford to pay for the additional spindles to not do parity raid - do so.
 
throAU said:
Not quite.

With RAID10, you can survive up to HALF of your drives being lost (assuming you are only doing 2x mirroring), if the right drives fail.

RAID6 is 2 drive failures only, irrespective of which ones.

RAID10 is also faster.

I'm not sure of the exact maths, but I suspect that with a sufficiently large RAID10 the probability of data loss becomes less than with RAID6, and I suspect the array doesn't need to be very big.

Rebuild time is also faster (a straight drive copy for the single mirror, vs. reading from all the remaining spindles of the RAID6 set, calculating parity, and writing), so you're exposed for less time between failure 1 and subsequent failures. Also, the performance impact whilst degraded is less.


edit:
For databases, Oracle have a strategy: "SAME" - "Stripe And Mirror Everything".

If you can afford to pay for the additional spindles to not do parity raid - do so.

I think it is unfair to compare RAID10 and RAID6 in such a large array. When building a larger system you will split the array (pool) into smaller raid groups (vdevs) anyway, so a more accurate comparison would be RAID10 vs. RAID60 in a system of that class.

For example:
2+2+2+2-RAID10(4x mirror vdevs)
vs.
4+4-RAID60(2x raidz2 vdevs)

I choose RAID60. The argument that a sufficiently large RAID10 is less susceptible to data loss falls down on the size of the drives. It was true when disk IO more or less matched drive sizes, but now that sizes have grown to as much as 4TB while drives are still only capable of 100-120MB/s of IO, it has become too risky, in my opinion. That's why you absolutely need to be able to withstand more than one drive failure at a time, regardless of which drive. I've seen it happen; misfortune rarely comes alone.

If disk IO capabilities had followed the same curve as their sizes, a 4TB drive would shuffle around 400-500MB/s, and then I would have agreed with you. But then we would also need a significantly larger transport bus, say SAS3 with a bandwidth of around 1.2GB/s (just hypothesizing).
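To put rough numbers on that: a rebuild can never go faster than a straight sequential copy of the whole drive, so at the quoted 100-120MB/s the best-case exposure window looks like this (my arithmetic, idle array assumed):

```shell
# best-case hours to copy one whole drive at 110MB/s
awk 'BEGIN { printf "1TB: %.1f hours\n", 1e6/110/3600 }'
awk 'BEGIN { printf "4TB: %.1f hours\n", 4e6/110/3600 }'
```

That's roughly 2.5 hours for a 1TB drive but over 10 hours for a 4TB one, and a real, loaded array will take considerably longer; that whole window is time in which a second (or third) failure must not happen.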

/Sebulon
 
Point taken.

It depends on the size of your array and your requirements, I guess - 2x RAID6 is going to have pretty poor write throughput vs 4x RAID1. You could get even better resiliency with similar throughput (though lower capacity) by doing 2x 3-drive mirrors (and withstand up to 4 drive failures).

I would steer well clear of 4TB drives personally (even 2TB drives), and go for 1TB or smaller, in greater numbers. If you do go for 2TB+ drives, then yes, RAID60 or 3-drive mirrors if you're paranoid...

I guess it all depends on your requirements for IOPS and storage capacity (conflicting requirements). Unless I'm mistaken, the size of the NAS was not mentioned in the OP... however, high throughput was mentioned for database use, and RAID6 sucks for that.


You're going to have to balance the likelihood of multiple drive failures before the rebuild completes against the cost of restoring from backup. I.e., is it worth giving up performance every day of operation for a lower probability of array failure and a trip to backup? I actually have a 16x 1TB RAID50 array here that has been running without a single failure for 4 years now (at around 80-100% write throughput continuously for the past year, lol); we've since migrated off it to a NetApp RAID-DP system. YMMV of course.
 
@throAU

I think you are missing key ZFS features, namely the ZIL and L2ARC. With ZFS you are free to design your pool any way you want, disregarding IOPS and going only for 1) tolerance and 2) capacity, in that order. Performance is what the ZIL and L2ARC are for.
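For reference, both bolt onto an existing pool after the fact; assuming a pool named "tank" and spare SSDs ada6-ada8 (placeholder names):

```shell
# separate log device (SLOG) for the ZIL: mirrored, since it holds
# in-flight synchronous writes
zpool add tank log mirror ada6 ada7

# L2ARC read cache: losing it is harmless, so no redundancy needed
zpool add tank cache ada8
```

So the data vdevs can be laid out purely for tolerance and capacity, and the SSDs paper over the IOPS.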

/Sebulon
 
throAU said:
Not quite.
I'm not sure on the exact maths, but I suspect that with a sufficiently large RAID10, the probability of data loss becomes less than with RAID6, and really suspect the array doesn't need to be very big.

You're wrong. The probability of data loss is *always* bigger with raid10 than with raid6.
 
jalla said:
You're wrong. The probability of data loss is *always* bigger with raid10 than with raid6.

Let me moderate that and say it's valid for any reasonably sized raid group. If you increase the number of drives to 20-30-40, the chance of a triple disk failure in raid6 may become bigger than the probability of a catastrophic failure of a raid10. That's also augmented by the fact that the time to reconstruct a failed disk grows immensely in this scenario.
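That moderation can be put in rough numbers. Assuming exactly three simultaneous, uniformly random failures and 2-way mirrors (a simplification that ignores rebuild windows): any raid6 group is then lost for certain, while a RAID10 only dies if two of the three failures land in the same mirror:

```shell
# P(three random failures kill an n-disk 2-way RAID10)
# survives iff the three failures hit three different mirrors:
# C(n/2, 3) * 2^3 / C(n, 3)
awk 'BEGIN {
  split("6 20 40", ns, " ")
  for (i = 1; i <= 3; i++) {
    n = ns[i]; m = n / 2
    surv = (m * (m-1) * (m-2)) * 8 / (n * (n-1) * (n-2))
    printf "n=%d disks: RAID10 lost %.1f%% of the time\n", n, 100 * (1 - surv)
  }
}'
```

At 6 disks the RAID10 dies 60% of the time, but only ~16% at 20 disks and ~8% at 40, versus 100% for a single raid6 group of the same width - which is exactly why the single-group comparison stops being fair at those sizes.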
 
Comparing 6 disks in one raidz2 vdev with 3 mirrored vdevs + 1 hot spare (7 disks), the chance of catastrophic failure is about the same. There was a big discussion of why this is so but I can't find the link. I'm mentioning this since I have this configuration at home.

I think the main reason is that the resilvering time for a mirror is far shorter than for raidz2 (about 4 hours on my 2TB disks). Resilvering also taxes the whole raidz2 vdev, which increases the chance of another old disk failing.

Even for a home scenario I would use mirrors with hot spares.
 
Maybe I'm looking at this too simplistically, but assume A0, A1, B0, B1, C0 and C1 are the drives, where 0 and 1 are each other's mirrors. Suppose A0 breaks. If any other drive then breaks, there's a 20% chance of catastrophic failure, as only one of the five remaining drives (A1) is fatal. With RAID6 there's a 0% chance, because it can handle any two broken drives.

If you add more drives the chance of catastrophic failure on the mirrored sets is reduced (with 8 drives it's a little over 14%), but it will never reach 0.
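Those percentages follow a simple rule: once one drive is dead, only its single surviving partner among the remaining n-1 drives is fatal, so the chance that the next random failure is catastrophic is 1/(n-1). A quick check of the arithmetic:

```shell
# chance the second random failure kills an n-disk set of 2-way mirrors
awk 'BEGIN { for (n = 6; n <= 12; n += 2) printf "n=%d: %.1f%%\n", n, 100/(n-1) }'
```

This prints 20.0% for 6 drives and 14.3% for 8, matching the figures above, and shows the decline flattening out as n grows.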
 
But 3 mirror vdevs can handle 3 disk failures (and more with more vdevs), whereas raidz2 will always be capped at 2.
 
bbzz said:
But 3 mirror vdevs can handle 3 disk failures (and more with more vdevs), whereas raidz2 will always be capped at 2.

True, but the chance of catastrophic failure on your mirrors quickly goes up the more drives break. There's a much bigger chance you're already screwed when two go. You have to be really lucky for only the right drives to break.

Suppose it's A0 and B0. Then there's a 50% chance the third one dying will be catastrophic.
 
@bbzz
There are real-life situations where you'll be better off with 3 pairs of mirrors than with a 6-disk raidz2, but that doesn't change the statistics (just like there are people who win the lottery, though statistically they shouldn't win in a thousand lifetimes).

What you're saying is basically that you "might get lucky", but if safety is your main concern that's not the optimal choice.

The only meaningful way of looking at the probability of failure is to work out the average on a large scale, and for reasonably sized vdevs that comes out favourably for raidz2 vs raid10.

OTOH maybe we should start with the question "how much safety is enough?". Given the 6-disk case, the chance of data loss is extremely small either way, so the problem is really whether you want to optimize for speed (of reads) or capacity :)
 
I understand that. But I'm also sure that test didn't conclude on just getting "lucky" in a few specific instances, and that several things were factored in.

Whatever the case, if the OP is serious about the data, he will need a backup anyway.
 