ZFS: building "RAID1" with a "RAID0" that has previous data

Hi!

I'm sorry if I don't use the proper ZFS jargon...

I would like to build a "RAID0" (concat) first, put some data on it, then build a second "RAID0" (concat), and finally create a "RAID1" (mirror) so that the data on the first RAID0 is mirrored to the second one, all of this using ZFS.

My question: is it possible to do this without losing the data that is in the first "RAID0"?

Thanks.
Regards.
 
Hello Zamana,

I'm afraid it's not possible: a ZFS pool can only consist of vdevs (disks, partitions), not of other pools. You can create mirror, raidz, log, and cache vdevs, each consisting of one or more devices, and combine them as you like in a single pool, e.g.:
Code:
zpool create <your_pool>      \
  mirror <dev0> <dev1>        \
  raidz  <dev2> <dev3> <dev4> \
  log    <dev5>               \
  cache  <dev6>
or
Code:
zpool create <your_pool>   \
  <dev0>                   \
  log mirror <dev1> <dev2> \
  cache <dev3>
but you can't nest pools: there is no such thing as a pool of mirrored pools, or a mirror built on top of raidz pools.
 
And what is your goal: to get both the performance of RAID0 and the fault-tolerance of RAID1?

Would you consider building a pool out of two mirror vdevs (perhaps you're short on disk space and need to migrate the data?). Something like zpool create ... tank mirror da0 da1, then zpool add tank mirror da2 da3.

Edit: You can also create the same configuration incrementally: zpool create -O mountpoint=none tank /dev/da0 will create a pool named tank, sized the same as the first drive (da0). Then expand the pool by adding another drive (da2): zpool add tank /dev/da2. Now you have a RAID0 equivalent: the size is the sum of both drives' sizes, with no redundancy:
Code:
    NAME        STATE     READ WRITE CKSUM
    tank        ONLINE       0     0     0
      da0       ONLINE       0     0     0
      da2       ONLINE       0     0     0

Copy your data over; remember, there is no redundancy at this point!

Now you can add redundancy: zpool attach tank da0 /dev/da1 and zpool attach tank da2 /dev/da3. Here you have a RAID10 (1+0, stripe of mirrors) equivalent:
Code:
    NAME        STATE     READ WRITE CKSUM
    tank        ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        da0     ONLINE       0     0     0
        da1     ONLINE       0     0     0
      mirror-1  ONLINE       0     0     0
        da2     ONLINE       0     0     0
        da3     ONLINE       0     0     0

Just to say that you might be better off using (aligned) partitions, possibly sized a bit less than the drives, instead of raw devices.
 
Hello!

Thank you natharan and Bobi B. for your replies.

Bobi B., I guess you got my idea.

Currently I have 4x4TB in a RAID5 (raidz1, actually), filled with 11TB of data:

Code:
zamana@NAS:~$ sudo zpool status
  pool: zfs0
state: ONLINE
  scan: scrub repaired 0 in 17h29m with 0 errors on Sun Mar 11 17:53:28 2018
config:

        NAME                                 STATE     READ WRITE CKSUM
        zfs0                                 ONLINE       0     0     0
          raidz1-0                           ONLINE       0     0     0
            ata-ST4000DM000-1F2168_Z301AKN2  ONLINE       0     0     0
            ata-ST4000DM000-1F2168_Z301AKZT  ONLINE       0     0     0
            ata-ST4000DM000-1F2168_Z301AQZR  ONLINE       0     0     0
            ata-ST4000DM000-1F2168_Z301CG8V  ONLINE       0     0     0

errors: No known data errors

I'm about to get 4 more 4TB disks, and I want to arrange these 8x4TB as a mirror, 4 disks on each side.

(perhaps you're short on disk space and need to migrate the data?)
Exactly!

The problem is that I don't have sufficient storage to back up my current 11TB of data...

So I was wondering whether it would be possible to build the first RAID0 (4x4TB), copy my current 11TB of data to it, then build the second RAID0 (same size), and finally arrange both as a RAID1 (4x4TB + 4x4TB), without losing the data.

And what is your goal: to get both the performance of RAID0 and the fault-tolerance of RAID1?
My main concern is fault-tolerance. Performance would be a bonus, but not required.

As I understand from your reply, it is possible to implement my idea incrementally, without losing the data, by following your advice above, right?

If not, do you recommend any other strategy?

Thanks.
Regards.
 
Yes. I believe you can migrate your data this way, by creating a ZFS pool on top of 4 separate vdevs, each using one of your new drives, giving you about 14.5 TB of usable disk space: do zpool create pool-name new-disk1, then zpool add pool-name new-disk2; zpool add pool-name new-disk3; zpool add pool-name new-disk4.
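
For example, assuming the four new drives show up as da4 through da7 (the pool and device names here are just placeholders), that would be something like:
Code:
zpool create pool-name /dev/da4   # single-disk pool, no redundancy yet
zpool add pool-name /dev/da5      # grow the pool, RAID0-style
zpool add pool-name /dev/da6
zpool add pool-name /dev/da7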

Now you copy your data to the new pool and do a scrub to be sure everything is okay.
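
A minimal sketch of that step, using the same pool-name placeholder as above:
Code:
zpool scrub pool-name        # read and verify every block in the pool
zpool status -v pool-name    # watch progress and check for errors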

At this moment you have two copies of your data: the old one on RAIDZ1 and the new one on the ZFS "stripe". So far so good.

The risk is that you'll have to destroy your RAIDZ1 pool in order to move the old hard drives to the new pool: a failure of the new drives during that window will lead to data loss. That's the point of the scrub: to gain some confidence that the copied data is readable (and hopefully will remain okay over the next few days).

Now you add redundancy to the new pool: first you destroy the existing RAIDZ1 pool, then you upgrade each single-disk vdev to a mirror vdev: zpool attach pool-name new-disk1 old-disk1; zpool attach pool-name new-disk2 old-disk2; ... you get the idea.
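
Spelled out with placeholder device names (new disks da4-da7, old disks da0-da3; zfs0 is your current pool, pool-name the new one), the sequence would look roughly like this:
Code:
zpool destroy zfs0             # the old RAIDZ1 pool -- point of no return!
zpool attach pool-name da4 da0
zpool attach pool-name da5 da1
zpool attach pool-name da6 da2
zpool attach pool-name da7 da3  # you may need -f if the old disks still carry pool labels
zpool status pool-name          # should now show four mirror-N vdevs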

Do consider the following:
  • be sure to compare physical disk sizes (old/new) with geom disk list;
  • decide whether to use raw devices or aligned partitions sized a few tens of megabytes less than the disk, in case you later need to replace a physical disk with one of a similar, but not exactly the same, size (see the sketch below);
  • be aware of the risk involved: a new-disk failure after the RAIDZ1 pool is destroyed will lead to data loss, as might running an incorrect command :)
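A rough sketch of the partition-based approach (the label, size, and device name below are examples only; adjust them to your drives):
Code:
gpart create -s gpt da4
gpart add -t freebsd-zfs -a 1m -s 3725g -l newdisk1 da4   # 1 MiB alignment, leave a little slack at the end
zpool create pool-name gpt/newdisk1                        # use the label, not the raw device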
I would strongly suggest you play with ZFS a bit first, perhaps on a VM or using image files and mdconfig(8), to test everything in advance on a smaller scale. 11 TB is a lot of data!
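
For instance, a small-scale rehearsal with file-backed devices could look something like this (file names, sizes, and md unit numbers are just examples):
Code:
truncate -s 1G test0.img test1.img test2.img test3.img
mdconfig -a -t vnode -f test0.img    # -> md0
mdconfig -a -t vnode -f test1.img    # -> md1
mdconfig -a -t vnode -f test2.img    # -> md2
mdconfig -a -t vnode -f test3.img    # -> md3
zpool create testpool md0            # start with a single-disk pool
zpool add testpool md1               # grow it...
zpool attach testpool md0 md2        # ...then turn each disk into a mirror
zpool attach testpool md1 md3
zpool status testpool
zpool destroy testpool               # clean up when done
mdconfig -d -u 0                     # detach md0 (repeat for md1 to md3)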

Well, good luck!
 
My main concern is fault-tolerance.
In which case 0+1 (a mirror of a striped set) would be a bad idea from the start. Remember, there's no fault-tolerance within a RAID0: if one disk fails in a RAID0 set, that whole set fails. By itself that wouldn't be a problem because of the other, mirrored, RAID0 set, but you won't have any fault-tolerance left; if another disk in that remaining set dies, everything is gone. This is why you typically see 1+0 (a striped set of mirrors) instead. There is still a risk if the "right" two disks die, but that risk is a lot smaller than with 0+1, where losing any disk on side A together with any disk on side B leads to complete failure.
 
I believe the end result would be a pool on top of 4 mirror vdevs, guaranteed to survive any single disk failure, and possibly more, as long as at least one disk per vdev stays alive. Or am I wrong?
 
In which case 0+1 (a mirror of a striped set) would be a bad idea from the start.

Yes. And I confess that my first approach was to consider using these 8x4TB in a RAID6 (raidz2). This way I could lose any 2 disks and still keep my data. But then I decided it would be better to have my data in two places instead of just one...

If I follow the steps in post #3, will I be building the 1+0 arrangement that you consider better?

Any other suggestion beyond this one?
 
RAID 0+1 is indeed bad, as several people have already said: It can tolerate only a single fault, and a second dead disk is pretty much guaranteed to kill it. And with 8 disks running, you are exposed to a lot more disk failures.

RAID 1+0 is reasonable. It can definitely tolerate failure of any one disk (or a single error on one disk). It can sometimes tolerate two faults, if they are not in the same pair (or two sector errors that are not overlapping). But it is still inefficient: you get only 50% of the capacity.

With this many disks, you are much better off using a double-fault tolerant RAID level, like RAID-Z2. Then you are guaranteed to survive the loss of any two disks (which includes the special case of two errors on separate disks, or one dead disk plus one sector error on another disk). In effect, with RAID-Z2 you have all your data in three places at once. You get two-fault tolerance, and you get 75% space efficiency.
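
For reference, the whole 8-disk RAID-Z2 pool would be created in a single step, something like this (pool and device names are examples):
Code:
zpool create tank raidz2 da0 da1 da2 da3 da4 da5 da6 da7
# 8 x 4TB with two disks' worth of parity -> roughly 24TB of raw usable space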

The thing to remember is this: disks have gotten much bigger (your 4TB disks are unimaginably large compared to the disks that were in use when RAID was invented about 30 years ago). But disks have not gotten more reliable; the uncorrectable error rate is still specified by disk manufacturers as roughly 10^-15 per bit (plus or minus one in the exponent). That means that with today's gigantic disks (your disk has 32 trillion bits), the probability of getting a second error (most likely a single sector error) while trying to recover from a dead disk is pretty high. For this reason, the former CTO of NetApp (you know what company NetApp is, the people who turned storage into an appliance about 20 years ago) has been going around saying that "selling single-fault tolerant RAID is professional malpractice". For disks this size, and for this many disks, please make your RAID arrays at least 2-fault tolerant.
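
As a rough back-of-the-envelope illustration (the 10^-15 per bit figure is the spec mentioned above; the rest is simple arithmetic):
Code:
# one 4TB disk ~ 3.2 x 10^13 bits
# expected unreadable sectors per full-disk read: 3.2e13 x 1e-15 ~ 0.03, i.e. about 3%
# rebuilding an 8-disk array reads ~7 full disks: roughly a 20% chance of hitting
# at least one unreadable sector while recovering from a single dead disk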

Now, how to migrate from your current layout to an 8-disk RAID-Z2: That's tough. Personally, I would either hope for a good network connection, and temporarily copy the data to the cloud and then get it back, or perhaps buy/borrow/steal a pair of 12TB drives for a temporary location. Or see whether you can get a decent tape drive.
 
Hello Sirs.

Thank you very much for all your replies. I'll think about all the options you gave me.

Thanks!
Regards.
 