Storage-Rebuild: ZFS Layout

Hello,

I want to switch from my Synology NAS + server to an all-in-one box, and for this I want to migrate the hard disks to the server. I have 5 x 2 TB HDDs (at the moment: 4 x 2 TB in RAID 5 and 1 x 2 TB as a "worker" disk). Because the available space is less than 1 TB, my wife allowed me to buy some new disks.

My plan is to buy 2 x 4 TB HDDs plus 1 x 2 TB HDD (and this is where the questions start).

The layout I have in mind is:

Zpool 1: 2 x 4 TB mirror = 4 TB
Zpool 2: 2 x (3 x 2 TB) RAID-Z1 vdevs = 8 TB

All together 12 TB. I use the storage for all kinds of data plus storage for some VMs. Is this a good layout, or should I use four mirrors without any striping? Or is there another layout which is better?
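
For reference, roughly how I would create those two pools with zpool(8); the pool and device names (ada0 ... ada7) are just placeholders:

  # pool 1: one mirror vdev of the two 4 TB disks
  zpool create tank1 mirror ada0 ada1
  # pool 2: two RAID-Z1 vdevs of three 2 TB disks each
  zpool create tank2 raidz1 ada2 ada3 ada4 raidz1 ada5 ada6 ada7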

Regards,
Markus
 
The Wikipedia article about RAID problems is a very interesting read. It advises against using RAID 5 with high-capacity drives and recommends RAID 6 instead.

The ZFS equivalent of RAID 6 is RAID-Z2. However, RAID-Z3 was developed to provide extra redundancy because of the increasing resilver times of new multi-terabyte disks like the ones you intend to use.

I do not have a lot of experience with ZFS yet, but I tend to favour mirrored setups. If I were to use a RAID-Z pool, I would choose RAID-Z3. But maybe the ZFS gurus have other suggestions ;)
 
Thank you.

Can anybody say something about the other parts of my question? Which layout would best fit my needs?

Regards
Markus
 
Instead of creating a pool with two three-disk RAID-Z1 vdevs, I'd just bung all six drives in a RAID-Z2.

That way, any two disks can fail at once and your data are safe. Otherwise if two disks in the same vdev fail at the same time, your pool is gone.
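
As a sketch (placeholder pool and device names), that would be something like:

  # one RAID-Z2 vdev over all six disks; any two may fail without data loss
  zpool create tank raidz2 ada0 ada1 ada2 ada3 ada4 ada5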
 
Ah, that's a point.

But what about the performance? Is there much of a difference between these options?
  1. 3 x mirrors
  2. 2 x RAID-Z1 with striping (2 vdevs of 3 disks each)
  3. RAID-Z2 with 6 disks
The usable capacity of options 2 and 3 would be the same; only with option 3 can two arbitrary disks fail. If the performance is not (much) worse than with options 1 and 2, this would be the way to go.

Correct?

Regards
Markus
 
I'd try and get equal sized drives (e.g., 6x 4 TB or 6x 2 TB) and put them all in the same pool to reduce complexity and improve performance.

In terms of performance difference, it all comes down to how much of your data is writes and how big they are.

Striping will multiply your write throughput, so the same number of drives (say, six) in a stripe across three 2-disk mirrors will be a lot quicker (roughly 3x the write speed) than those same six disks in a single RAID-Z2.

For any pool, write performance in terms of IOPS is roughly equal to that of a single drive per VDEV. Using 2-drive mirror VDEVs, you can fit more VDEVs in your pool for a given number of disks.

I will say that I have a personal bias towards stripes across mirrors (faster rebuilds, disk is cheap, etc.); other opinions may differ. To do a stripe across mirrors, you simply add multiple mirror VDEVs to a single pool and ZFS stripes across all VDEVs automatically.
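
For example (placeholder names again), a stripe across three mirrors is nothing more than a pool containing three mirror vdevs, and you can grow it later by adding another mirror:

  # three 2-disk mirror vdevs in one pool; writes are striped across all of them
  zpool create tank mirror ada0 ada1 mirror ada2 ada3 mirror ada4 ada5
  # later: extend the stripe with a fourth mirror vdev
  zpool add tank mirror ada6 ada7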

However, for a single user desktop, you probably won't notice a huge performance difference in either case (mirror vs. RAID-Z), and any performance problems you discover could probably be easily fixed with an SSD for cache.

And yes, a single pool with more disks will probably give you more consistent, better performance/resiliency and less administrative overhead than trying to get too tricky with multiple pools.

Murphy's law says that if you split your storage up into two pools, you'll end up running out of space under one of the pools and not the other, and continually waste time playing storage administrator to shuffle data around from pool to pool.

Essentially all you're trying to do with multiple pools is get better performance for some data set, right? Just stick it all in one pool and let the filesystem figure that out in real time. ZFS will cache hot data for you itself, and all your data will get the performance of all the disks. If you need more performance, add some SSD for cache.

If you haven't read the ZFS Best Practices Guide (properly), I would strongly suggest having a good read and making sure you understand the implications before buying/building your system, as ZFS is quite different from traditional filesystems.


edit:
Some of the info in that guide may be out of date or not FreeBSD specific (particularly the root pool restrictions), but the core concepts apply.
 
Wow, thanks for the detailed answer. As mentioned in the opening post, I currently have 5 x 2 TB disks. My case takes 8 disks plus 2 internal, so I could purchase three additional disks to expand the space. For the system I want to use my two Samsung 128 GB SSDs (an 830 and an 840).

[SSDs]
60 GB system mirror
30 GB ZIL - mirror
30 GB L2ARC - mirror

[Harddisks: 8 x 2 TB RAID-Z2 = 12 TB]
1 big pool

Is this the way to go? The guide told me not to use partitions in production, but I think for my private NAS with some VMs it is OK. Are the cache/log partitions big enough? The system has an i7-2600K CPU and 32 GB RAM.

12 TB is enough for now. What would the expansion path be? I read that I can replace the disks in a vdev with larger ones, but here I would have just one single vdev. So is there no way to expand the storage by replacing disks?

Thanks
Markus
 
Sorry for bumping, but I need to purchase the hardware next week. Does anybody have any thoughts on the questions above?

Regards
Markus
 
That setup looks perfectly fine for a home system to me. The RAID-Z2 gives a decent amount of space and additional redundancy for the time between a disk failing and it being replaced (which is often much longer at home than in a commercial setting). With RAID-Z1/RAID 5 and a failed disk, you are entirely dependent on every other disk staying perfectly intact until the rebuild is complete in order to avoid corrupt data, which is why nobody in the real world likes RAID 5 anymore (the size of disks these days means the chance of such corruption is surprisingly high).

I'm not sure the partitioning of the SSDs is optimal though. A basic FreeBSD install with a few packages is only around 1-2 GB, so unless you have other plans for the system partition, 60 GB may be over the top. Additionally, you're better off not mirroring the L2ARC.

I would probably partition the SSDs as follows:

  • 30 GB system
  • 20 GB ZIL (you could probably even get away with 10 GB)
  • 70 GB L2ARC

Create a mirrored pool for the system, add the two ZIL partitions as a mirrored ZIL to the data pool, and then add the two remaining partitions as separate L2ARC devices, giving 140 GB of total L2ARC space.
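
Assuming the data pool is called tank and the SSD partitions carry GPT labels like zil0/zil1 and cache0/cache1 (placeholder names), that boils down to:

  # add the two ZIL partitions as a single mirrored log device
  zpool add tank log mirror gpt/zil0 gpt/zil1
  # add the two L2ARC partitions as independent cache devices
  zpool add tank cache gpt/cache0 gpt/cache1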

Regarding the expansion question: with a single 8-disk RAID-Z2, there are really two recommended expansion paths (sketched below):

  1. Replace each disk with a larger one.
  2. Add a second RAID-Z2 vdev (ideally the same size: 8 x 2 TB).
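
A rough sketch of both, again with a placeholder pool name (tank) and placeholder device names:

  # path 1: swap the disks out one at a time, resilvering in between;
  # with autoexpand on, the pool grows once the last disk has been replaced
  zpool set autoexpand=on tank
  zpool replace tank ada0 ada8    # then ada1, ada2, ... in turn
  # path 2: add a second 8-disk RAID-Z2 vdev to the same pool
  zpool add tank raidz2 ada8 ada9 ada10 ada11 ada12 ada13 ada14 ada15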

There are users on here who have mismatched pools (e.g. a RAID-Z2 vdev and a mirror in the same pool). It's not generally recommended, but as far as I'm aware it works fine. I think the recommendations are based more on Sun's best practices than on actual limitations - i.e. they didn't want to find themselves having to support performance quirks from customers with weird and wonderful pool configurations.
 
Hey, thanks for the detailed answer.

The SSD partition layout looks good to me.

Can you go a little deeper into why not to mirror the L2ARC? I assume it is unnecessary because it is just a cache and no critical data is stored there, right?

Regarding your second expansion option: do you mean the two sets would then be striped? Then the speed goes up and I just need an expansion case. If it's needed, the prices for 8 x 2 TB should be affordable en bloc.


Edit: Is there any good way to migrate from a Synology RAID to ZFS? I don't think so. I would need additional storage for the migration, and if I only had one external 4 TB "migration drive" I would have to pray, right?

Thanks and regards
Markus
 
storvi_net said:
The system has an i7-2600K CPU and 32 GB RAM.

What about the other components? Particularly the motherboard and HBA (controller). I would pay attention to those, as you may hit a bottleneck there too.
 
I will purchase an IBM M1015 and flash it to the LSI firmware. As network cards I have some Intel cards.

I think this should be enough for a fast home NAS.
 
storvi_net said:
I will purchase an IBM M1015 and flash it to the LSI firmware. As network cards I have some Intel cards.

I think this should be enough for a fast home NAS.

Sure. My 2¢ here: I'd lean towards a more robust solution (RAID-Z2, or even RAID-Z3 with a spare) rather than speed. Usually you don't have a backup of your home storage (and it may hold personal data you want to keep).

Your controller is PCIe 2.0, which means it can serve ~500 MB/s per lane (disk). SATA 3 disks can go as high as 600 MB/s. Depending on your CPU/motherboard, you may hit some bottlenecks there too.
 
Yes, but my "old" disks are not that fast, and combined with a ZIL and L2ARC on the SSDs it should be enough. The backup of the most important data will be done to a server outside my home, so that in case of a fire / water ... the most important stuff can be recovered.

Thanks for your input - I will report back about my NAS once I have all the parts.

Regards
Markus
 
For a home NAS I wouldn't be massively concerned with performance. You have an i7 and 32 GB of RAM, which will handle compression fine and give a decent amount of ARC. I really wouldn't worry about PCI/mainboard bandwidth - you're obviously going to be limited by the network for its use as a NAS. (Speaking of compression, the recommendation is to pretty much always enable it.)
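
Turning it on is a one-liner; assuming a pool called tank, something like:

  # lz4 is cheap on CPU and a good default (use lzjb if your version doesn't have lz4 yet);
  # child datasets inherit the setting
  zfs set compression=lz4 tank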

Regarding the L2ARC: yes, the loss of an L2ARC device does not affect the pool at all. ZFS will just go back to the disks if you request data that was previously cached. I'm not even sure it's possible to mirror an L2ARC natively, so you'd have to do it with gmirror and then use the /dev/mirror/x device as the L2ARC, which is just messy. By using the two SSDs separately, you double the amount of data that can be cached.

If you expand by adding a second set of 8 disks then yes, the data is striped across both vdevs and you get twice the space. You probably won't actually get twice the performance, though. If you start with two vdevs, all the data is fully 'striped' across the two. However, if you start with one, fill it, and then add another, the majority of new data is going to end up on the new vdev (because there is very little space left on the original one), so you won't get the full benefit of striping across the two.

For moving data, really just do whatever works to get the job done. If the existing NAS can back up with rsync, use that and set up an rsync server on the new one. You may be able to just drag and drop data between the two using a client computer (although that's possibly the slowest option). Alternatively, copy it all off and back on using a standalone disk, as mentioned.
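
A rough sketch of the rsync route, with placeholder host, module and dataset names:

  # pull everything from the old NAS's rsync service onto the new pool,
  # preserving permissions, hard links and timestamps
  rsync -aHv --progress rsync://synology.local/volume1/ /tank/data/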
 
Thanks for the other good advice.

I am still thinking about the best way to do this, and in the process I thought about purchasing 6 x 4 TB disks for a RAID-Z2 with 16 TB. Then I could migrate without any hassle and sell my NAS with the 5 x 2 TB disks. The advantage would be that I get more capacity, and I have enough SATA ports to expand later with a second 6 x 4 TB RAID-Z2 vdev for striping, which would give me 32 TB.

Combined with the two SSDs for ZIL and L2ARC, that should be enough for home usage.

So the next step is to find good 4 TB disks and buy six of them.

Regards
Markus
 
Hello again,

I changed the plan a little: I now have 6 x 3 TB disks, which is enough for now and will make it cheaper when I later decide to double the space by buying six identical hard disks. I also got the IBM controller, which I have to flash in the next few days. I will put FreeBSD on the two SSDs in a ZFS mirror, using the partition sizes you recommended above. Is it necessary to check all six hard disks with some test tool before using them with ZFS? If yes, which one should I use before I create the vdev?

Regards
Markus
 
storvi_net said:
Is it necessary to check all six hard disks with some test tool before using them with ZFS? If yes, which one should I use before I create the vdev?

The main thing you want to do is run some I/O through each disk to see if it fails. Generally, HDDs either fail early due to defects or have a long lifespan. Doing a bit of stress testing early on lets you replace any defective drives while they are still under warranty.

You have a few options. Regardless of the method, you should check the SMART data before and after stressing the drives and look for errors. The simplest method is to use dd to fill each drive with zeros. You could also use one of the manufacturer diagnostic tools to run a stress test. And if you want to go all out, you could run badblocks, which writes patterns to the disks and checks that they were written correctly. That takes a while if you do a few passes, but it will definitely stress the disks, and any defects should show up in the SMART data.
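
For example, with da0 as a placeholder device name:

  # record the SMART attributes before stressing the drive
  smartctl -a /dev/da0
  # destructive write-mode badblocks pass (wipes the disk); dd with /dev/zero works too
  badblocks -wsv /dev/da0
  # check again afterwards for new reallocated or pending sectors
  smartctl -a /dev/da0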

Once you are satisfied, you might want to run bonnie++ on the drives and the zpool to check I/O performance.
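
Something along these lines (the dataset path and user are placeholders):

  # rough sequential/seek baseline against a dataset on the new pool
  bonnie++ -d /tank/benchmark -u nobody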
 
Run smartctl -t long on the drive first. If it passes with no new reallocated sectors, then try filling it with zeros with dd(8). Be sure to use a 64K or larger block size so it does not take forever. If there are no new reallocated sectors after that, put it into service. If it survives the first month, it is likely to keep going for years.
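
Roughly, again with da0 as a placeholder:

  # start the long offline self-test, then check the result and the reallocated sector count
  smartctl -t long /dev/da0
  smartctl -l selftest -A /dev/da0
  # zero the drive with a large block size so it does not take forever
  dd if=/dev/zero of=/dev/da0 bs=64k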
 