ZFS Recommendation for 100TB raid-z pool

Yes, but ZFS allows you to combine the storage part of the services. One pool for all.
And that is achieved by just having one dataset per service on the pool, e.g. storage/nfs, storage/smb and so on, right?
But I feel like what you're "implying" is that there is a way to have NFS and SMB just use the same dataset? Could you elaborate on that?
 
No, I don't think that is wise. Use separate datasets on the same pool for each export method.
Also consider the file-locking consequences. ZFS might not behave the way you expect with several people editing the same documents over different protocols.

I am really speaking outside my trade here. These are just the basics you need to consider.
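If it helps to see the "one pool, separate datasets" idea spelled out, here is a minimal sketch; the pool and dataset names are just the ones from the example above, and the quota value is purely illustrative:

Code:
# zfs create storage/nfs
# zfs create storage/smb
# zfs set quota=40T storage/smb

Each dataset then gets its own properties, quotas and snapshots, which is the main reason to keep the export methods separated.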
 
I need to provide a network drive for an SME with 100 TB usable capacity. I would like to use a ZFS raid-z3 pool consisting of 14x 12TB SAS or SATA drives (this will be a rare-write/many-reads situation).
I did this 5.5 years ago and wrote about it here. While some of that information is a bit dated, you may find it useful. In particular, it considers backups and replication which are often afterthoughts in these projects.

I would definitely advise you to use SAS drives. Particularly if you're going to use either backplanes with expanders or standalone expanders.

If you want to cast a wider net, this type of thing is often discussed on the Serve the Home (a bit of a misnomer these days) forums.
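For reference, creating the pool described in the OP is a one-liner. This is only a sketch with hypothetical da0-da13 device names and an assumed ashift, not a tuned recipe:

Code:
# zpool create -o ashift=12 storage raidz3 da0 da1 da2 da3 da4 da5 da6 da7 da8 da9 da10 da11 da12 da13

The per-service datasets discussed earlier in the thread are then created on top of that single pool.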
 
Terry_Kennedy I really found this comment helpful. Sorry to crosspost your comment. I linked this above.
"There's a lot of folklore around ZFS, such as using prime numbers of drives for raidz1, prime+1 for raidz2, etc."

Do you still agree with this? I see a lot of 6-drive or 10-drive vdevs in that post, but what about 7- or 8-drive arrays?
Are they really that much worse off?

Your comment seems to be what I want to hear. Lots of folklore out there.
 
Do you still agree with this? I see a lot of 6-drive or 10-drive vdevs in that post, but what about 7- or 8-drive arrays?
Are they really that much worse off?
I did various tuning when I first set up the 'zillas, but as I mentioned that was 5.5 years ago. Things have changed in the meantime (a couple of major FreeBSD kernel versions, switching to OpenZFS, etc.). I'd hesitate to say "this is the one true way".

Having said that, I'm still running the 'zilla 2.5 systems, on FreeBSD 12.3-STABLE these days, and they are capable of sustained reading from or writing to their disks at something like 700Mbyte/sec over 10GbE with Samba. I think that means I've hit the point of diminishing returns regarding further ZFS tuning.
 
I'd do some more reading before settling on it.
Yes. Just in 14 drives we are talking $6K USD
Here is a SAS drive that I might buy.
WUH721816AL5204
Ok $415x14=$5810 from this random place.
Two spares puts you up around $6.8K.
Then add in $4K chassis we are past $10K.
How much is 256GB DDR4? Zoinkers. ZFS needs lots of RAM.
Here is a nice chassis:
Looks like it uses an expander backplane though.
 
Yes. Just in 14 drives we are talking $6K USD
Here is a SAS drive that I might buy.
WUH721816AL5204
Ok $415x14=$5810 from this random place.
Two spares puts you up around $6.8K.
Then add in $4K chassis we are past $10K.
How much is 256GB DDR4? Zoinkers. ZFS needs lots of RAM.
Here is a nice chassis:

5.5 years ago when I built the RAIDzilla 2.5 systems everything except the drives came to $2815. That's with eBay parts, a few of them used. If these are being built for a business, that might not be acceptable. However, with 6 'zilla 2.5 systems running 24/7 over 5.5 years, I haven't had any failures of anything whatsoever.
Looks like it uses an expander backplane though.
That isn't necessarily disqualifying. Spinning rust media isn't going to come close to saturating a 12G SAS port. I didn't switch to expanders on my newer 'zilla builds because there wasn't enough of a cost savings to justify having different types of hardware. If I was building systems with more than 16 drives I might have reconsidered. It looks like Supermicro still sells versions of the same SC836 chassis I used, which should be more than sufficient.

As far as disks go, most of my 'zillas were built with disks purchased new from authorized distributors. The last 2 were built with surplus drives, mostly because the He8 drives I was using were no longer available new. As I said, no failures of anything (including drives).

This eBay listing is for used 10TB HUH721010AL4200 drives pulled from Cisco systems for $119.95 each. I've never dealt with that seller, but they've apparently sold over 3200 of them and have at least another 9000 on that one listing. As I said, you need to evaluate the tradeoffs between cost and potential reliability / warranty concerns.

If the customer balks at the pricing for new components, just have them configure a Dell Powervault NX ($23K+) or something similar from any other integrator.
 
Here is a nice chassis:
https://www.wiredzone.com/shop/prod...bone-dual-processor-6527?page=20&category=108
Looks like it uses an expander backplane though.

In the SuperStorage 6049P-E1CR24L sits Supermicro's BPN-SAS3-846EL1 (= expander). Together with its sibling backplane BPN-SAS3-846EL2 (containing 2 expander chips), this seems to be their standard for the (front) wall of a 4U chassis. Their TQ backplanes, like in the SuperChassis 846TQ-R900B, have individual connectors for each disk. (I wouldn't want to wade through such a cable forest in case of cable problems, though.)
 
Keep in mind that ZFS stripe width is Fraught:
That is an often-cited article (probably because of the flashy title ;)), but I found this one—dealing with the same topic—much more informative and illuminating:
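For the OP's specific numbers, the back-of-envelope math is straightforward: a 14-wide raidz3 vdev leaves 11 data drives, so roughly 11 x 12 TB = 132 TB of raw data capacity (about 120 TiB). After allocation/padding overhead and the usual advice not to run the pool anywhere near full, landing at the 100 TB usable target looks realistic.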
 
Their TQ backplanes, like in the SuperChassis 846TQ-R900B, have individual connectors for each disk. (I wouldn't want to wade through such a cable forest in case of cable problems, though.)
In addition to the expander (BE) and individual (TQ) backplanes, they have the BA multi-lane backplanes. That gets you down to 4 cables. Those may no longer be a "standard configuration" shown on their website, but do seem to be orderable from distributors (for example, CSE-836BA-R920B).
 
this seems to be their standard for the (front) wall of a 4U chassis.
Agreed. The current batch of SuperMicro 3.5" drive 24-bay chassis all seem to use an expander.
I note some used Chenbro 24-bay chassis on eBay for cheap. They are a good rack server brand.

Spinning rust media isn't going to come close to saturating a 12G SAS port.
I agree; it's just the thought of spending that much and still getting an expander.
With three SAS9400-16i cards you get 48 drives.
I see no need for an expander versus three controllers until that point.

One interesting new trend is top-loading disks in big disk packs.

I went through a phase 8 years ago in a quest to build my own megaNAS
Bought SAS expanders from Astek, Chenbro and Intel while building my own chassis from parts.

That experience brings me to the negative comments about SAS expanders.
They work, but they are not cheap and not fast. They are the throat of a funnel.
External cables cost so much too.

I also have a Chenbro 24-bay 2.5" chassis and I like the layout: 12 drives per backplane.
Running three LSI 3008-8i with SFF-8643 to SFF-8087 cables. Six cables is all. I did buy different lengths to suit.
 
That is an often-cited article (probably because of the flashy title ;)), but I found this one—dealing with the same topic—much more informative and illuminating:
There is no one generic "best" configuration - those articles serve as reasonable guidelines to start with, but if maximum performance is a requirement, various configurations will need to be tried to find out what works best on a specific set of hardware and data.

Having said that, it is quite easy to get something that is "more than good enough" pretty much out-of-the-box. As I mentioned, my 'zillas do > 700Mbyte/sec. It is possible to set up hardware that operates in the gigabytes/second range (and I have done so) but that is massive overkill for the majority of use cases.
 
I agree; it's just the thought of spending that much and still getting an expander.
With three SAS9400-16i cards you get 48 drives.
I see no need for an expander versus three controllers until that point.
The use case mentioned in the OP shouldn't need anywhere near that number of drives, fortunately. While 8TB drives were still rather cutting edge when I built my 'zillas, these days drives of up to 20TB each are available.
One interesting new trend is top-loading disks in big disk packs.
Some years ago on eBay there were a large number of 60-drive enclosures from (IIRC) HGST. Orientation in other than the "classic" label-on-top is feasible as long as the drive manufacturer explicitly supports that configuration. Not all drives do. Dell had some desktops that mounted the drives connector-on-top and depending on the brand / model of drive, it could be problematic. For any other greybeards here - Atasi 3046

There is also the issue of vibrational interference between drives, which can vary depending on the mounting orientation. There's a classic YouTube video showing that simply yelling at disk drives can slow them down.

If anyone wants to bring up Backblaze Pods, I'll just say that those are engineered for maximum capacity at lowest cost - when the data travels over the Internet, array performance isn't really relevant.
I went through a phase 8 years ago in a quest to build my own megaNAS
Bought SAS expanders from Astek, Chenbro and Intel while building my own chassis from parts.

That experience brings me to the negative comments about SAS expanders.
They work, but they are not cheap and not fast. They are the throat of a funnel.
Mounting random SAS expanders and cabling them can be an exercise in frustration (and usually is). Expander backplanes are much better-behaved.

The one caveat is SATA drives behind SAS expanders. That pushes the SAT layer from the controller into the expander, and things are a lot less well-tested there. A common problem is an expander hitting an I/O error on one SATA drive and going "Resets for everybody!!!" and all the host sees is a bunch of drives dropping offline at the same time for no apparent reason. Dell will sell SATA drives that plug into chassis with expanders, but they put a SAS interposer in the back of each individual drive tray to prevent the problem I mentioned.
 
That experience brings me to the negative comments about SAS expanders.
They work, but they are not cheap and not fast. They are the throat of a funnel.
I mostly disagree, but partially agree. For large configurations, and for efficiency, expanders are simply necessary. Buying enough HBAs and cabling to give each disk drive its own SAS lane is expensive, power-hungry and wastes PCIe slots. When engineered well, expanders are not the bottleneck.

However, expanders are yet another moving part in the system, and they add complexity. Expanders with crappy firmware are deadly, and they make debugging storage enclosures into sheer hell. Ask me how I know (or actually, don't ask please). If you want to build large, reliable systems with expanders that operate under high load with real-world errors, it really helps to get the OS, HBA, expander and disk drive engineering teams to work together. Or else, power cycle often.

There is no one generic "best" configuration ...
Absolutely! If one wants to seriously optimize performance or cost or efficiency or ... then one has to do some good planning and some trial and error.

As I mentioned, my 'zillas do > 700Mbyte/sec. It is possible to set up hardware that operates in the gigabytes/second range (and I have done so) but that is massive overkill for the majority of use cases.
My personal record for a single host was 18 GB/sec, sustained from disk drives through the CPU to Infiniband. It takes lots of work to get there.

Orientation in other than the "classic" label-on-top is feasible as long as the drive manufacturer explicitly supports that configuration. Not all drives do.
About 6 or 8 years ago, we checked with the big disk drive vendors, and at the time all drives could be mounted in any orientation that was parallel or perpendicular to gravity, while diagonal was probably OK but untested.

There is also the issue of vibrational interference between drives, which can vary depending on the mounting orientation. There's a classic YouTube video showing that simply yelling at disk drives can slow them down.
Absolutely true. And the effect is more pronounced for cheap drives (the SATA drives from 15-20 years ago), which were not very tolerant of vibration. This is why disk enclosures actually need to be engineered by people who understand mechanical engineering; you can't just bolt disk drives to a random piece of aluminum and expect them to work.

The one caveat is SATA drives behind SAS expanders. That pushes the SAT layer from the controller into the expander, and things are a lot less well-tested there.
This is a perfect example of: if you use expanders, you need to use expanders with good firmware, and you need to make sure the whole firmware/software stack works well together.
 
About 6 or 8 years ago, we checked with the big disk drive vendors, and at the time all drives could be mounted in any orientation that was parallel or perpendicular to gravity, while diagonal was probably OK but untested.
That doesn't surprise me - in order to meet modern performance specs, things need to be pretty well balanced within the drive. The days of a giant unbalanced head driven through a metal ribbon from a stepper motor are long gone.

The Atasi drive I mentioned was the only 5.25" ST506 drive ever made with a linear voice coil. That drive definitely wanted to be level.

For amusement, this is an IBM mainframe drive from 1985 (paperback book on top for scale). A whole 5GB of storage. And only 3600 pounds!
 

Attachment: PXL_20211112_204708853 - Copy.jpg (photo of the IBM mainframe drive)
That is an often-cited article (probably because of the flashy title ;)), but I found this one—dealing with the same topic—much more informative and illuminating:
"Flashy" is one way to describe it. "Authoritative" is another:
I've been working on ZFS and OpenZFS basically my entire career. I helped create ZFS at Sun Microsystems back in 2001 and established the OpenZFS open source project in 2013.
 
Where are you planning to backup those 100TB?
Some combination of replication and backup will likely be needed. Here are the sections of my 'zilla articles discussing this:
RAIDzilla II
RAIDzilla 2.5

For those who don't want to read the whole sections, the executive summary is: replication to identical hardware at another site with sysutils/zrep + BBCP, and regular backups to tape using an LTO6 library and archivers/gtar. The scripts I use are in the linked sections.
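For anyone who has not set up replication before: zrep is built on top of ordinary ZFS snapshots and send/receive, so the core of it looks roughly like the sketch below (hypothetical pool, dataset and host names, and plain ssh instead of BBCP as the transport):

Code:
# zfs snapshot -r storage@seed
# zfs send -R storage@seed | ssh replica-host zfs receive -Fu backup/storage
# zfs snapshot -r storage@daily1
# zfs send -R -i @seed storage@daily1 | ssh replica-host zfs receive -Fu backup/storage

zrep adds snapshot naming, locking and failover handling around those primitives, and BBCP can replace ssh as a faster bulk transport.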
 
I need to provide a network drive for an SME with 100 TB usable capacity. I would like to use a ZFS raid-z3 pool consisting of 14x 12TB SAS or SATA drives (this will be a rare-write/many-reads situation).

I have been looking at chassis like the SuperMicro SC846BE1C-R1K03JBOD. Given that I'll run ZFS on this I don't want a raid controller but merely an HBA. Supermicro lists the AOC-SAS3-9300-8E as a supported HBA.
I could not find this exact model number listed in the hardware notes of FreeBSD 13.0. Is that gonna be a problem?
Is there any reason to believe that an LSI HBA officially supported by FreeBSD would not work in a chassis like this?

Now, I have never worked with those JBOD chassis before. As I understand it, I slap in the HBA and tons of drives, and connect the chassis to an application server. Does something like this work out of the box? Will the application server (also FreeBSD 13.0) just see those drives as individual drives, so I can create a ZFS pool like I am used to with "local drives"?

How does the (physical) connection to the application server work? Do I just add another HBA with external SAS ports to it, connect it to the external SAS ports of the HBA in the JBOD chassis and that's it?

Anything else you'd like to share in terms of advice, experience or similar?

You can check my attempt at 1000TB here:

From today's available options I would look at the dRAID option, as it seems to get more IOPS with the same number of disks, needs fewer spares (I would use two instead of the three I'd plan for with RAIDZ3), and has faster rebuild/resilver times.
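As a purely illustrative comparison with the raidz3 layout sketched earlier in the thread, a dRAID variant of the same 14 drives might look like the line below. The draid2:4d:14c:2s numbers (double parity, 4 data disks per redundancy group, 14 children, 2 distributed spares) are an assumption of mine, not a recommendation:

Code:
# zpool create -o ashift=12 storage draid2:4d:14c:2s da0 da1 da2 da3 da4 da5 da6 da7 da8 da9 da10 da11 da12 da13

The distributed spares are where the fast rebuild comes from: every drive contributes a little spare space, so a resilver reads from and writes to all drives in parallel instead of funnelling everything through one replacement disk.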
 
First, I'm going to admit that I'm not an enterprise storage guy but a few things that you might want to think about.
Is cost a concern? If not, just buy what you need directly from a vendor (Fujitsu, Dell, etc.); it'll save you a lot of time and there's no need to worry about on-site repairs and the like.
Your workload and array "design" are also topics you need to look at.

....but if you want go down the DIY route:

Chassis: Silverstone (tek) RM316 / SST-RM316 (might be EOL), SuperChassis 836BA-R920B, iStarUSA EX3M16 - all of these have "direct connect" backplanes as far as I can tell
Motherboard: Tyan S5560GM2NRE-2T-HE or something else that uses Intel's C256 chipset
CPU: Intel E-2374 , E-2386 or similar (depending on motherboard you might need a CPU with integrated graphics)
HBA: 2x LSI2008-based HBAs (or better); you don't really need a super beefy one, as most HDDs rarely do more than ~180 Mbyte/s in practice anyway

One or two NVMe drives for boot and/or redundancy, and additionally (optional, depending on workload) 2x (Intel Optane or similar) for SLOG off a PCIe card such as the Gigabyte CMT4032
HDDs: Toshiba MG08/09 series, given your requirements I'd say 14TB / HDD or more
RAM: 64 GB or so (depending on workload)

This should provide you with some headroom for either a Z2 or Z3 array without breaking the bank, so to say :)
Keep in mind that the current version of Samba in ports does not support AES-NI, so if you plan to use that it'll be quite CPU intensive.
 
I would look at the dRAID option, as it seems to get more IOPS with the same number of disks
Yes, this seems to be an interesting new option on FreeBSD 13 and OpenZFS.


This is nice to see for the original poster. Straight from your website.

Code:
# zfs create                      nas02/nfs
# zfs create                      nas02/smb
# zfs create                      nas02/iscsi
# zfs set recordsize=4k           nas02/iscsi

Maybe seeing it in writing will help him. Trying to explain datasets is not as easy as I thought.
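To carry that example one step further for the original poster, the per-protocol split also shows up at export time. A hedged sketch, reusing the nas02 names from the quoted snippet (on FreeBSD the SMB side is normally just a path in smb.conf rather than a ZFS property):

Code:
# zfs set sharenfs=on nas02/nfs
# zfs get -H -o value mountpoint nas02/smb

With default settings that mountpoint is /nas02/smb, which is what goes into the path line of the Samba [share] definition.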
 
[...] Because of the SAS cable connectors, your physical placement options relative to the server where the HBA is located are limited. Supermicro provides cables with an SFF-8644 connector at each end, in lengths of 1, 2, and 3 meters: SAS external cables. I'm not sure if all those lengths are capable of SAS3 speeds[*].
[...]
___
[*] You could also consider contacting Supermicro Europe in the Netherlands with any technical questions you might have, and perhaps about alternative chassis. No FreeBSD-specific drivers, I'm afraid.

There seem to be various types of external SAS cables (passive Cu, active Cu & optical). Given that, the (passive Cu) cables offered by Supermicro seem capable of SAS3 speeds.
 