Some hand-holding for storage server...

jsmastering · Jan 12, 2019

I need to replace an aging synology NAS, and it would be nice to add a bit of functionality.

I don't need that much storage. 5-10 TB is fine. Partially due to noise, I'm thinking of going all-SSD or a flash pool and a rust pool. The business is a home-based music mastering studio.

My use case is similar to SOHO file server. A 10Gb networking upgrade will come at the same time, probably dedicated to storage traffic. Clients are a mix of Win 10 Pro (studio computer), FreeBSD and Linux (various dorky things), and OS X (Time Machine for my wife).

If it's possible, I'd really like to boot my studio machine (win 10 pro) off the server, mostly for snapshots and data protection. I'm under the impression that a 10Gb SFP+ card that supports iPXE along with a zvol exposed via iSCSI will enable this, but I haven't tried it before.

If that's not possible, speed isn't much of a concern (and hardware choice is much simpler). If it is, I'd like to get performance close to a SATA SSD over the network (certainly for iSCSI, close to it for NFS, etc.).

I'm torn between a whitebox build and finding something pre-built that's viable.

The WhiteBox I came up with was:

SuperMicro X11SDV-4C-TP8F (xeon d 2123IT; dual SFP+; etc.)
32 or 64 GB of their RAM
drives (10x1TB WD Blue for data, pair of 120GB for OS)
Norco 2212 Case
psu, etc.

The total cost is under 4000USD.

After reading Calomel's Article, it seems like I probably need an HBA/RAID card rather than relying on motherboard SATA ports. I do have an older m1015 sitting around. But, it's only for 8 drives. I was planning on 12. SAS Expanders seem like they're not going to solve that problem without sacrificing speed. Is it actually just a matter of multiple HBAs being necessary?

In short, it seems like I don't know what I need unless that particular SM board performs better than Calomel's test.

I'm definitely not opposed to just buying something off the shelf. But, it would have to be at a similar price point. That doesn't seem possible (maybe there's a reason) without going with aging hardware. I'm not sure I want a 10-year-old server (e.g., R720XD) by the time the drives are reaching the end of their warranty. I've never kept a computer I cared about that long.

I've contacted iX Systems, but that conversation is taking a while to even give a ballpark price. None of the other server vendors I've looked at can come close.

So...anyone want to hold my hand through this? It'd be nice if I could email a beer in thanks.

Bobi B. · Jan 12, 2019

Why WD Blues? They are desktop hard drives; not sure if they will be a good fit for a storage server. WD Reds are for up to 8-bay systems, Red Pros are for up to 24 bays. Besides you'll be wasting money for 1TB drives.

How much total disk space and how much protection you plan to have?

ondra_knezour · Jan 12, 2019

The motherboard you selected has U.2 ports for two devices, so this is your solution for fast OS SSD.

If you would insist on SAS ports, you are probably out of luck with embedded CPU and Mini-ITX factor Supermicro offer-wise, but it can be easily solved using add-on card. However with your preferred disks I would call it waste of money. Here I would at first ask myself what is optimal number/topology for chosen storage redundancy (may be for example 2 VDEVs 4 or 6 disks each) and then buy biggest disks I can afford. And I would not consider disks not designed for 24/7. Also we were said by WD sales representative that various series (or colors in their range of products) have hardware/firmware tuned for specified task, for example server grade disks have accelerometers and firmware accounting for signal jitter from vibrations coming from other disks and cooling fans in server racks, but you know sales people. I would just stick with something statistics says fail not so often.

ralphbsz · Jan 12, 2019

Buy fewer disks. You did not give an extreme throughput requirement that justified having this much disk bandwidth, or this many random IOs. Ten disks are 10x more unreliable than one. For redundancy, you need at least two. With your requirememt, the psychological sweet spot is probably 2ea 10TB drives in a 2-way mirror. Personally, I would use 3-way mirroring (single-fault-tolerant RAID is too risky today), but it's always hard to convince people of that. File system: I would definitely go with ZFS, because of checksums and built-in RAID (which has fundamental advantages that a RAID controller can't touch, mostly the RAID code being aware of file layout when doing rebuild and scrub). I like ondra_knezour's idea of using U.2 drives for the boot and OS drive.

You do not need SAS for that; there are perfectly good enterprise-grade nearline drives with SATA ports too. Matter-of-fact, the same nearline drives are available with SAS and SATA, at similar price points. And SAS expanders don't kill performance; I've worked on machines with 360 SAS drives all running full out, and the SAS expanders handle that just fine. Personally, my preference would be Hitachi nearline 7200rpm drives.

For reliability, you need good cooling. Don't skimp on case, fans, and power supplies.

As to SSD: If your capacity needs are really 10TB, then you'll need at least 20TB of flash for redundancy. That's real money for an amateur. If you have the money, go for it.

jsmastering · Jan 12, 2019

Bobi B. said:
Why WD Blues? They are desktop hard drives; not sure if they will be a good fit for a storage server. WD Reds are for up to 8-bay systems, Red Pros are for up to 24 bays. Besides you'll be wasting money for 1TB drives.

How much total disk space and how much protection you plan to have?

The WD Blue SSDs, not HDDs.

Unless I'm mistaken, there aren't the same NAS/array-based issues with using them as there are with HDDs. I picked them due to price easily fitting in the budget vs. their TBW stats and warranty. At least based on my current data usage patterns, I can expect them to last >3 years. At which point, SSDs will probably have changed, perhaps for the better or at least for the cheaper. That's also why 1TB drives. They hit the right price/space ratio. Samsung 860 Pros would double the cost of drives for significantly better write endurance but not significantly longer warranty. It didn't seem worth it. Outside of enterprise class SSDs, I'm not aware of anything better.

SSD price seems to scale linearly with capacity at the moment. So, 2TB SSDs would double the price. I don't need to spend that money, and I doubt I'll need that much space in the next 3 years.

I was planning on 10 of them (either a big z3 or a pair of 5-disk z2s) for 6-7 TBish usable, plus a 2-disk mirror for the OS...initially using the 12xSATA connection available on that motherboard (4 via SATA connectors, 8 via jumper + mini-SAS to SATA cables).

ondra_knezour said:
The motherboard you selected has U.2 ports for two devices, so this is your solution for fast OS SSD.

There is a jumper on the motherboard to use them as 4xSATA ports using a mini-SAS cable. I was not planning on using U.2 drives, and I don't really see the point of particularly-fast OS storage in this case (a mirror seems fine).

ondra_knezour said:
If you would insist on SAS ports, you are probably out of luck with embedded CPU and Mini-ITX factor Supermicro offer-wise, but it can be easily solved using add-on card. However with your preferred disks I would call it waste of money. Here I would at first ask myself what is optimal number/topology for chosen storage redundancy (may be for example 2 VDEVs 4 or 6 disks each) and then buy biggest disks I can afford.

I can afford much bigger disks than I could ever fill up in their design life, if you're talking about hard drives. I'm also under the impression they won't perform well enough to boot the studio computer off it. With SSDs, there's a tradeoff. I outlined my thought process above.
I've also read multiple articles saying that other than obvious exceptions, the number of drives in a vdev isn't a big deal. Is that not the case?

ondra_knezour said:
And I would not consider disks not designed for 24/7. Also we were said by WD sales representative that various series (or colors in their range of products) have hardware/firmware tuned for specified task, for example server grade disks have accelerometers and firmware accounting for signal jitter from vibrations coming from other disks and cooling fans in server racks, but you know sales people. I would just stick with something statistics says fail not so often.

If you have a source for that information for SSDs rather than hard drives, I'd love to read it. My personal experience with SSDs is ~12 disks since around 2006 and few brands, and shows no differences and generally better reliability than HDDs. But, that's my personal luck more than anything conclusive.

I'm not aware of any NAS-focused consumer SSDs. I absolutely cannot afford enterprise SSDs. That price jump is ridiculous. If it's not viable to use consumer SSDs in a NAS, I'm probably going to scrap the whole project and just replace the HDDs in my synology with new version of the same drives and revisit it in a year or two.

ralphbsz said:
Buy fewer disks. You did not give an extreme throughput requirement that justified having this much disk bandwidth, or this many random IOs. Ten disks are 10x more unreliable than one. For redundancy, you need at least two. With your requirememt, the psychological sweet spot is probably 2ea 10TB drives in a 2-way mirror. Personally, I would use 3-way mirroring (single-fault-tolerant RAID is too risky today), but it's always hard to convince people of that. File system: I would definitely go with ZFS, because of checksums and built-in RAID (which has fundamental advantages that a RAID controller can't touch, mostly the RAID code being aware of file layout when doing rebuild and scrub). I like ondra_knezour's idea of using U.2 drives for the boot and OS drive.

Can a 3-way mirror of 10TB disks over 10Gb ethernet rival the performance (for network boot) of a SATA SSD? If it can, I'll consider it. Otherwise, that is not what I want.

I really don't understand why U.2 drives for booting the storage server matter. Is there something going on I don't know about, or are you perhaps assuming the server will be used for anything other than storage? It won't.

ralphbsz said:
You do not need SAS for that; there are perfectly good enterprise-grade nearline drives with SATA ports too. Matter-of-fact, the same nearline drives are available with SAS and SATA, at similar price points. And SAS expanders don't kill performance; I've worked on machines with 360 SAS drives all running full out, and the SAS expanders handle that just fine. Personally, my preference would be Hitachi nearline 7200rpm drives.

ralphbsz said:
For reliability, you need good cooling. Don't skimp on case, fans, and power supplies.

I wasn't planning on it. Though actual information on what is decent and what isn't is hard to find. There are a lot of people using those norco cases with better fans with good results. I'm open to advice about a redundant power supply ideally under $500.

ralphbsz said:
As to SSD: If your capacity needs are really 10TB, then you'll need at least 20TB of flash for redundancy. That's real money for an amateur. If you have the money, go for it.

Is there some reason I'm not seeing to only use mirrors vs. z2/z3?

It would be possible to use a 3-disk mirror of 2TB SSDs for the network boot, a 3-disk mirror for server boot, and then use HDDs for other bulk storage. But it seemed simpler to just go all-flash. I'm pretty sure the total drive costs were comparable when I looked at doing that.

ondra_knezour · Jan 12, 2019

jsmastering said:
The WD Blue SSDs, not HDDs.

Oh, I somewhat omitted this and my comment was more related to the spinning HDDs, sorry.

jsmastering said:
I was not planning on using U.2 drives, and I don't really see the point of particularly-fast OS storage in this case (a mirror seems fine).

You may put not only OS on a such mirror, but also L2ARC and SLOG for faster read and write, which may make spinning disks solution acceptable speed-wise.

jsmastering said:
I've also read multiple articles saying that other than obvious exceptions, the number of drives in a vdev isn't a big deal. Is that not the case?

Here I have to admit that I am doing "as always" for years based on some blog articles about ZFS tunning by authors/Solaris developers, which are mostly long gone (as Oracle removed those blogs from internet). It is well possible, that it is not critical/very important any more, but there is some logic in it - basically you still have couple of data parts and parities which you want to spread over multiple devices in somewhat even way.

jsmastering said:
If you have a source for that information for SSDs rather than hard drives, I'd love to read it.

I am not avare of any and not because I didn't search :/

jsmastering said:
Can a 3-way mirror of 10TB disks over 10Gb ethernet rival the performance (for network boot) of a SATA SSD? If it can, I'll consider it.

Hard to say. I just googled some tables putting sustained sequential read for SATA drive at about 150-170 MBps. Let say 150 x 8 bits per byte is 1200, so you would need little more than eight of them to fully utilize 10 Gbps interface. I ignore base 1000 vs. 1024 difference etc. here because I would consider other aspects very important. How fast can really ethernet device send data out? Will OS cope with such speed? NFS/CIFS/iSCSI, what is quality of their implementation you have on hand? And what is overhead of those protocols? And so on. So I would try to setup some benchmarks, spinning disks, spinning disk + fast NVMe devices for L2ARC, all-flash and see. I case this is not available I would settle with feeling that HDDs + fast cache should do well.

jsmastering said:
I wasn't planning on it. Though actual information on what is decent and what isn't is hard to find. There are a lot of people using those norco cases with better fans with good results. I'm open to advice about a redundant power supply ideally under $500.

I don't know what your space requirements and power constraints are, but did you consider something like this? We got one loaded with 4 TB Toshiba disks and all the bells and whistles for about 5-6 000 in USD. It was about two years back?

jsmastering said:
Is there some reason I'm not seeing to only use mirrors vs. z2/z3?

Generally this. In case of failure, your may experience another one before you replace failed device and data are gone.

ralphbsz · Jan 12, 2019

jsmastering said:
Samsung 860 Pros would double the cost of drives for significantly better write endurance but not significantly longer warranty.

For most commercial users (whose time is not free, and whose data is way more valuable than the hardware they store the data on), the warranty on drives is irrelevant. If you lose a $200 drive, you can throw it in the trash, or you can spend 10 hours of your time arguing with vendors and manufacturers to get a used $100 drive as a warranty exchange, but at $200 per hour for your time that is not a good investment. What matters is that you didn't lose the $20K worth of data, because you had redundant copies elsewhere. Where warranties matter is either for consumers who have ample spare time (retirees?), and for large companies: If Dell/EMC/HP/IBM/Amazon has accumulated 20 pallets of prematurely failed disks, which have all gone through in-house post-mortem testing, and returns them in a big truck to Seagate or Hitachi or WD, they do get a few million $ back. And that money is a powerful factor in the next price negotiations with disk vendors. One time in my professional career I helped organize returning over 1000 drives (a few weeks old, all from a single manufacturing batch, with an insanely high failure rate) to vendor X, and vendor X than gave us about 2x the purchase price for those drives, so we could buy equivalent drives from vendor Y and give them to our customer. Several M$ changed hands, and I think a VP of quality control at vendor X is now unemployed (sadly, he deserves it).

Next topic: write endurance. Please measure your write traffic, and compare it to the published write endurance of the drives you are considering. Most people will NEVER get anywhere close to the write traffic that causes write endurance problems. On the other hand, if you are thinking that you might get there because you really are overwriting data that fast, then there are two solutions: either you consider your SSDs to be disposable (short-lived, just replaced them when the write endurance is exceeded, consider them a consumable), or you go for enterprise SSDs (where you pay a huge premium for not having to replace them).

There is a jumper on the motherboard to use them as 4xSATA ports using a mini-SAS cable. I was not planning on using U.2 drives, and I don't really see the point of particularly-fast OS storage in this case (a mirror seems fine).

It's not about fast, it's about convenient. If you were building a hard-disk based server, then having your OS and boot on SSDs is still a good idea, because your machine comes up much faster. And the U.2 drives are physically tiny, use little power, and not very expensive.

There are two viewpoints on whether to mirror the boot/OS drive. One is: don't bother; if they fail, you can just reinstall the OS in a few hours. Nothing on the OS disk is valuable, it only costs time. The problem with this theory is: In reality, the OS install never goes quite as flawless, unless you have OS backups on other media (with local customization!), or you are really organized about recording all local customization so they are easy to redo. The other viewpoint is: mirror them, because then if one fails, you don't have to waste time reinstalling, and then you buy a new one for re-mirroring. Clearly, the balance between these viewpoints depends on the cost of your time, versus the cost of downtime.

I've also read multiple articles saying that other than obvious exceptions, the number of drives in a vdev isn't a big deal. Is that not the case?

It is a giant deal.

Let's start with data reliability. You need some redundancy, meaning at least 2 copies of the data. With the size of modern drives, as ondra_knezour already said, single-fault tolerance is no longer sufficient, since the probability if finding a second fault when repairing the first dead drive is now high. So you need at least 3 copies, or the equivalent number of copies spread over more drives (using parity-based codes, such as RAID-Z2 and Z3). Personally, I'm actually still running with just a 2-way mirror, but I also have ZFS which is really good at recovering after a double fault (it will typically destroy only one file, not the whole RAID array), and I have another 2 copies in backups, one of which is never older than two hours, and one is off-site for disaster recovery. For a good reliability server, you need at least 3 drives.

Now, if you have exactly 3 drives, you will be storing 3 identical copies of the data (mirroring), and your efficiency overhead will be 200% (for every byte stored, you have another 2 bytes of redundancy). If you have 12 drives and use RAID-Z2, then it will store 10 drives' worth of capacity (the extra two are redundancy), and your overhead is just 20%. So more drives gives you better space efficiency, at the same redundancy.

But: more drives also increases the probability that you will have drive failures. Clearly, 12 drives will have a failure about 4x more often than just 3 drives (but the will have individual read errors at the same rate, since that rate is per byte, not per byte). At 2-fault tolerance, drive failure no longer dominates data reliability, but every drive failure is a big hassle (you need to temporarily survive with less redundancy, identify and remove the bad drive, add a new one, and do a rebuilt). All these processes involve humans, which are error prone, and are the greatest cause of data loss. So having fewer drives is good, both in saved effort and in reliability.

The other side is the performance argument. Each hard disk is typically capable of 100 MByte/s and 100 random seeks/s (a.k.a. IOps). In a redundant system, all writes have a performance cost (for example 3-way mirroring the writes cost 3x more, and RAID-Z2 over 12 drives the writes cost 20% more), which does not apply to reads. And small updates in place (which ZFS doesn't do right away) have an even higher write cost. Still, more drives in parallel run faster. With 3 drives, your read/write speed will probably top out at 300/100 MB/s; with 12 drives, it's probably 1200/1000 MB/s (assuming the rest of the system is capable of it, which is very hard to predict, and even harder to achieve in the real world).

BUT: Do you have any workload that really needs that speed? More on that below,

My personal experience with SSDs is ~12 disks since around 2006 and few brands, and shows no differences and generally better reliability than HDDs.

Better than HDD, yes. Good, no. I've seen first-hand and heard too many horror stories about SSDs that brick themselves, or lose data. Still, HDDs fail so often that by comparison SSDs look good. At one point, I was working with one installation that was replacing on average 5 spinning disks per week, and each replacement was "a bit" of work.

I absolutely cannot afford enterprise SSDs. That price jump is ridiculous. If it's not viable to use consumer SSDs in a NAS, ...

Well, they are expensive for a reason. They have better write endurance (which means more internal flash, and better-grade flash chips, there is a complicated science to MLC), and much better quality control. And that doesn't just mean an extra hour on the burn-in stand before shipping them, but for example much better auditing of their internal firmware (of which they have zillions of lines). You pay for that.

Personally, if you need SSD speed or want SSD noise/power, I would go with good brand consumer SSDs (Crucial, Intel, Samsung, many others), and keep the redundancy up. Maybe even with a mix of drives from different brands, so common firmware faults are not a single point of failure.

Can a 3-way mirror of 10TB disks over 10Gb ethernet rival the performance (for network boot) of a SATA SSD? If it can, I'll consider it.

The first step in performance is: You need to specify what you need. Not what you want, but what you need to operate. Ideally, you should put a $ figure on how much extra performance is worth to you.

For example: You seem to think that the 10Gb network will be the bottleneck. That means your server needs to deliver roughly 1GByte/second. With spinning disks, that requires 10 disks, and good system tuning. With SSDs, the answer is trickier, I think real-world delivered bandwidth for large reads on SATA SSDs is about 300-500 MByte/s each, so you need 2-3 SSDs to accomplish that. But do you actually have any system that is capable of consuming a GByte/second sustained? Or do you have any tasks where dropping the speed to a mere 100 MByte/s (one tenth of your goal) will be a significant slowdown? Are your clients really all connected via 10gig? Or do you have many clients running in parallel (which opens another whole can of worms)?

All performance engineering has to start with "speeds and feeds": What's the speed of your devices, and how fast can your users feed or consume data. If you really will lose money by having less than 1Gbyte/s, then you need roughly a 10-dozen spinning disks.

There are a lot of people using those norco cases with better fans with good results. I'm open to advice about a redundant power supply ideally under $500.

Redundant fans are good, fans will fail. For example a good-quality push fan, and another pull fan. Then put in some monitoring of fan rotation if possible, and/or some monitoring of disk temperature (for example with smartctl). If the disks get significantly warmer than 40deg C, raise alarms.

Redundant power supplies I think is overkill for amateurs. It really helps if your data center is wired with redundant power distribution and redundant power sources. Then the electrician can work on one breaker box, or the hydro-electric power source can go offline during a drought while the nuclear power plant is present. Most households are not wires that way (data centers often are). But in my personal experience, power supplies themselves are quite reliable, and being able to hot-swap them is rarely useful.

It would be possible to use a 3-disk mirror of 2TB SSDs for the network boot, a 3-disk mirror for server boot, and then use HDDs for other bulk storage.

The ideal solution is to segregate your data. For each file, decide whether it needs to be on fast storage, on reliable storage, and/or on cheap storage. Make the good/fast/cheap tradeoff every single time. Then aggregate your files into "storage classes", and provision different types of storage for each (like non-redundant fast SSD for OS boot and a few high-frequency temp files, redundant high-quality SSD for valuable yet frequently read files, high-quality disk for bandwidth-intensive but not IO-intensive files, and finally redundant but cheap disks with frequent maintenance for archival storage). For amateurs, this is impractical on a file-by-file basis, but you can do it coarsely.

jsmastering · Jan 12, 2019

ondra_knezour said:
You may put not only OS on a such mirror, but also L2ARC and SLOG for faster read and write, which may make spinning disks solution acceptable speed-wise.

That makes sense. In talking to a few other people, it seems like all-flash will still perform better than flash for L2ARC/SLOG. Obviously, it would depend on cache hits/misses and size. I'm not aware of a good way to predict that. Considering that I can afford all-flash (just not all enterprise flash), it seems like it's the safer choice.

ondra_knezour said:
Here I have to admit that I am doing "as always" for years based on some blog articles about ZFS tunning by authors/Solaris developers, which are mostly long gone (as Oracle removed those blogs from internet). It is well possible, that it is not critical/very important any more, but there is some logic in it - basically you still have couple of data parts and parities which you want to spread over multiple devices in somewhat even way.

I've seen it both ways. The way it made sense was that zfs distributes based on blocks after compression, not necessarily discs, for z2/z3. So, the number of discs isn't as much of an issue as with other more strict RAID implementations. I could easily be wrong.

This is one such article. It's focused on database storage, but I'm not sure how lz4 compression with basically anything else would be significantly different. I'm open to arguments as why it would be.

ondra_knezour said:
I am not avare of any and not because I didn't search :/

Same. It's a shame. Backblaze did a lot of good for research buying hard drives. There just doesn't seem to be something similar for SSDs.

ondra_knezour said:
Hard to say. I just googled some tables putting sustained sequential read for SATA drive at about 150-170 MBps. Let say 150 x 8 bits per byte is 1200, so you would need little more than eight of them to fully utilize 10 Gbps interface. I ignore base 1000 vs. 1024 difference etc. here because I would consider other aspects very important. How fast can really ethernet device send data out? Will OS cope with such speed? NFS/CIFS/iSCSI, what is quality of their implementation you have on hand? And what is overhead of those protocols? And so on. So I would try to setup some benchmarks, spinning disks, spinning disk + fast NVMe devices for L2ARC, all-flash and see. I case this is not available I would settle with feeling that HDDs + fast cache should do well.

I have no issues spending 1500USD or so on disks, which covers either my plan or the 3x10TB + a reasonable but not crazy cache.

Why would you default to spinning drives + fast cache as opposed to all flash?

ondra_knezour said:
I don't know what your space requirements and power constraints are, but did you consider something like this? We got one loaded with 4 TB Toshiba disks and all the bells and whistles for about 5-6 000 in USD. It was about two years back?

7-10TB (3TB fast, the rest can be slower) gives room to grow. 5TB would be a minimum unless I get more out of lz4 than my tests indicate and if it's all fast.

The whitebox I listed above is very similar. this is the motherboard I'm looking at. Other than SoC SATA vs. Broadcom SAS, it looks a lot like it's almost a newer version of that motherboard. The xeon d-2100 is apparently a significant improvement, and pricing is similar. It's a different case, and I haven't decided on a PSU yet.

Your server would work...it's about the same price as my whitebox list. Apparently, it's available diskless a couple places. The CPU isn't as nice, but I'm pretty sure that's a very minor trade-off.

How loud is yours? Noise is a bit of a concern...it's in a different room (technically a closet), but I don't want to be able to hear it. And I can sometimes hear my synology through the door. Sealing that closet better would involve installing more active cooling for the room...an option, but I don't want to do it unless it actually becomes necessary. It seems like it would be easier to make a 2u or 4u quiet than a 1u.

ondra_knezour said:
Generally this. In case of failure, your may experience another one before you replace failed device and data are gone.

I'm aware of that issue. It's why I was using raid 6 on my current NAS full of 2TB drives, plus hot spares (and of course backups).

My thought was that I could avoid the issue by going with smaller disks and more parity, even if it's a larger array. Replicating my current array with consumer SSDs is possible, but it's a little over what I'd hoped to spend.

ralphbsz said:
For most commercial users (whose time is not free, and whose data is way more valuable than the hardware they store the data on), the warranty on drives is irrelevant.

Without actual numbers for real-world endurance, what the manufacturer quotes as the TBW warranty figure seems like a decent metric for when to expect to preemptively replace them, i.e., design life for my use. It's been a long time since I bothered with a warranty on anything but a laptop. Without a business case either way, it's just not worth the hassle.

ralphbsz said:
Next topic: write endurance. Please measure your write traffic, and compare it to the published write endurance of the drives you are considering. Most people will NEVER get anywhere close to the write traffic that causes write endurance problems.

I don't believe I will, based on looking at SMART data and vastly overestimating how it'll scale in the coming years. That's part of why I don't think I need higher-endurance SSDs.

ralphbsz said:
It's not about fast, it's about convenient. If you were building a hard-disk based server, then having your OS and boot on SSDs is still a good idea, because your machine comes up much faster. And the U.2 drives are physically tiny, use little power, and not very expensive.

I'm planning on mirroring the OS drive to 2 or 3 SSDs regardless of the rest of the storage, as stated above. I'm questioning whether it's worth using U.2 drives.

Given some of those comments, are you thinking about m.2 drives? U.2 are still 2.5" drives, unless I'm really mislead. Are there cheap U.2 DOMs that I'm not aware of?

ralphbsz said:
Let's start with data reliability....

We're only talking about raid-z3 for the larger array or 3-way mirrors, for everything but the OS (2 or 3-way mirror). I'm asking about raid-z3 with 1TB drives, which seems plenty safe in terms of read errors and okay in terms of disk failures. Some here are suggesting 3-way mirror of 10TB drives, which is wildly different from what I've been lead to consider.

I probably should have stated that there are 3 other copies of the data in a backup schedule already, plus cloud storage for the very important stuff.

ralphbsz said:
Personally, if you need SSD speed or want SSD noise/power, I would go with good brand consumer SSDs (Crucial, Intel, Samsung, many others), and keep the redundancy up. Maybe even with a mix of drives from different brands, so common firmware faults are not a single point of failure.

That's exactly what I'm talking about doing and people ITT are trying to steer me away from.

Mixing brands is a very decent idea. I overlooked that, having only worked with RAID that wanted identical disks in the past.

ralphbsz said:
The first step in performance is: You need to specify what you need. Not what you want, but what you need to operate. Ideally, you should put a $ figure on how much extra performance is worth to you.

The $ figure is the cost of the server. So, around $4000, give or take some details. I'm satisfied with the current performance of a single SSD. I'm not satisfied with having to trust Windows 10. Even good backups take time to restore. Most of the desire is snapshots on the boot volume for that computer. And data rot protection on the archive would make me feel safer than finding silent corruption and hoping it's readable and intact on one of the backups.

I am not willing to significantly lose performance compared to what I have now...hence, the desire for performance booting over the network similar to a single SATA SSD in the box.

ralphbsz said:
For example: You seem to think that the 10Gb network will be the bottleneck. That means your server needs to deliver roughly 1GByte/second. With spinning disks, that requires 10 disks, and good system tuning. With SSDs, the answer is trickier, I think real-world delivered bandwidth for large reads on SATA SSDs is about 300-500 MByte/s each, so you need 2-3 SSDs to accomplish that.

Fair enough. I'm not opposed to doing a hybrid array with tiers of storage. And I'm open to suggestions of how to lay it out.

My guess would be a 3x6-8TB HDD mirror for bulk storage and some number of SSDs for fast storage. But, I'm starting to be confused about what I'd need for that side of it. Assuming you're talking about only data disks above and wanting 3-disk redundancy, that would be 6x1TB Raid-Z3. That's doable.

My eyes are crossing trying to figure out how much better/worse that would be compared to all-flash in terms of resiliency.

ralphbsz said:
But do you actually have any system that is capable of consuming a GByte/second sustained?

Probably not. Half that would be on par with what I'm benchmarking now. And that's fine, of course depending on network/protocol overhead.

ralphbsz said:
Or do you have any tasks where dropping the speed to a mere 100 MByte/s (one tenth of your goal) will be a significant slowdown?

Probably not. I'm not sure how to determine at what point slowing down would be noticeable. It's somewhere between 500MB/s and 150MB/s.

ralphbsz said:
Are your clients really all connected via 10gig?

If I do this build (or anything like it), the one that needs to will be. My FreeBSD and Linux machines are running 1Gb for shared folders just fine, considering how I use them. It's just booting that one Windows 10 machine that I don't want to do over 1Gb. That would be a noticeable slowdown compared to just using a SATA SSD in the box. Backups over 1Gb are fine. Restoring from them is....okay.

ralphbsz said:
Or do you have many clients running in parallel (which opens another whole can of worms)?

Not with any consistency. I was planning on a small 10Gb switch to connect shared folders to other computers (off the second SFP+ port; first one direct to the win10 machine). It's not a priority.

ralphbsz said:
All performance engineering has to start with "speeds and feeds": What's the speed of your devices, and how fast can your users feed or consume data. If you really will lose money by having less than 1Gbyte/s, then you need roughly a 10-dozen spinning disks.

It's just me using it. Fast tier needs to be ~3TB and not noticeably slower than a single direct attached SSD.

ralphbsz said:
Redundant fans are good, fans will fail. For example a good-quality push fan, and another pull fan. Then put in some monitoring of fan rotation if possible, and/or some monitoring of disk temperature (for example with smartctl). If the disks get significantly warmer than 40deg C, raise alarms.

I was planning on monitoring them. I'm not sure how to do redundant fans in such a small case. It seems like all of the 1u/2u cases I've looked at (even much nicer ones) only have midwall fans. I'm not opposed to going with a tower case, which would make that easier. I just don't know where to look for drive enclosures that are actually good.

ralphbsz said:
Redundant power supplies I think is overkill for amateurs. It really helps if your data center is wired with redundant power distribution and redundant power sources. Then the electrician can work on one breaker box, or the hydro-electric power source can go offline during a drought while the nuclear power plant is present. Most households are not wires that way (data centers often are). But in my personal experience, power supplies themselves are quite reliable, and being able to hot-swap them is rarely useful.

That's nice to know. FWIW, I haven't had a PSU fail. It just seemed like a good idea.

ralphbsz said:
The ideal solution is to segregate your data. For each file, decide whether it needs to be on fast storage, on reliable storage, and/or on cheap storage. Make the good/fast/cheap tradeoff every single time. Then aggregate your files into "storage classes", and provision different types of storage for each (like non-redundant fast SSD for OS boot and a few high-frequency temp files, redundant high-quality SSD for valuable yet frequently read files, high-quality disk for bandwidth-intensive but not IO-intensive files, and finally redundant but cheap disks with frequent maintenance for archival storage). For amateurs, this is impractical on a file-by-file basis, but you can do it coarsely.

That isn't significantly different from what I'm doing now, except that I'm not currently using ZFS. And, of course, only doing it coarsely. Archive storage and first line backups are on the NAS. Frequently accessed stuff and boot volumes are on single SSDs. Everything is backed up to external hard drives on a rotation schedule.

If I could run a real OS on my studio machine, I wouldn't be doing this at all...I'd just be using root on ZFS and backing up to a cheap NAS, then externals on a rotation schedule. But, I flat out can't run a real OS for a studio machine, and what I need to do doesn't work well under virtualization the last time I tried it.

Phishfry said:
I have a beef with your reluctance to use enterprise drives.
In my opinion we have free software, cheap used LSI SAS controllers on ebay, So spend your money on quality drives.

I'm not reluctant to use enterprise drives, I'm slihghtly reluctant to use a couple TB worth of enterprise SSDs. Depending on the size of the array, something like the Samsung 860DCT would be doable. I'm just not convinced I actually need it compared to treating drives about half the cost (less for literally everything else out there) as disposable.

Phishfry said:
I found myself in the same spot. Not rich but wanting 24 drives.
So I got real lucky and found 2.5" SAS2 drives but only 450Gb. They were zero hours for only 10 bucks each.
I also bought some used SanDisk/Optimus SAS SSD's. These have 2M hours MTBF. So I feel confident with those.
Checkout Pliant drives. Some use SLC and will last forever. Cheap but only SAS1.

If you are building bulletproof storage you need to think about the medium first.

I don't want 24 drives. I want as few as possible for the amount of storage I need.

I've never heard of Pliant drives. Quick searching shows they're orders of magnitude more than I can afford. SAS1 seems like a downgrade, unless I'm really mistaken.

Phishfry · Jan 13, 2019

Yea you wanted cheap flash storage better than consumer grade and Pliant was the first that popped in my mind.
Pliant is old and was a bad recommendation. But they were some of the first SAS SSD. Bought up by SanDisk.

I opted for spindles because the fans make more noise than any hard drive.
The lack of heat is where SSD rule along with speed and power draw.
There is quite a premium for this though. I plan on using 3 NVMe as cache and SLOG instead.

ondra_knezour · Jan 14, 2019

jsmastering said:
Why would you default to spinning drives + fast cache as opposed to all flash?

Available space was priority regarding server linked above and I believe that given our usage patterns we will not get any advantage with all flash.

jsmastering said:
How loud is yours?

Can't remember exactly, I saw it only in datacenter last two years and it is loud there, but definitely not something I would put i the next room. Basement, why not, but no next room.

jsmastering said:
Given some of those comments, are you thinking about m.2 drives?

In fact in ours we have one M.2 NVMe for OS and L2ARC, but you have only one slot on the board, so no redundancy/mirror unless you add another one to the PCIe slot. No big deal here as I don't use it for write cache because of missing redundancy and I can live with hours offline in case of failure.

LVLouisCyphre · Jan 1, 2020

jsmastering said:
After reading Calomel's Article, it seems like I probably need an HBA/RAID card rather than relying on motherboard SATA ports. I do have an older m1015 sitting around.

That's a common misconception; well commented and documented on the FreeNAS/iXSystems forum. For ZFS to perform optimally needs raw access to the drives which you don't get with a hardware RAID controller. Bottom general rule of thumb, don't use a hardware RAID controller with ZFS. ZFS performs the RAID functionality. This doesn't apply to just ZFS but RAID typically in general; either run a hardware or software RAID.

You want the SATA ports ideally in AHCI mode. I just bought a used Lenovo TS430-0388 specifically as a generic FreeBSD 12.1 server. The factory four port 3.5" hot swap drive cage in this configuration has four motherboard SATA ports connected to into the drive cage SFF-8087 port. The motherboard SATA ports are going to be in AHCI mode for ZFS or for recovery of a RAIDZ6 of another system running FreeNAS. If you're going to use the M1015, it should be flashed in IT mode for ZFS which gives you 8 raw SAS ports. The only advantage I see of using the M1015 is port density and ability to use SAS drives. Everyone is buying up SATA drives most likely for a NAS which has left an abundance of SAS drives due to excess supply and lower demand.

jsmastering said:
So...anyone want to hold my hand through this? It'd be nice if I could email a beer in thanks.

It's legal to ship alcohol to my area.

Lamia · Jan 1, 2020

jsmastering said:
So...anyone want to hold my hand through this? It'd be nice if I could email a beer in thanks

That is how SM works. You will be buying it via her distributor and at a higher price.

You can get that model and much more on eBay if you don't mind. There are several sellers of chassis SM MoBos across the globe on eBay. No luck with AliExpress though.