Deploying Multiple Systems ==> Drives, Filesystems, Imaging, Etc.

I'm assuming this is the same basic idea as mirrors on ZFS?
The concepts are the same, but I'm sure the implementations are vastly different.

Clonezilla is superfantastic for migrating Windows drives. I boot from a CD or USB drive, copy the data over to a new drive, and Windows is none the wiser 'cause it wasn't even booted up when I did the clone. I've never felt a need to use it on Unixy machines.
 
Clonezilla is superfantastic for migrating Windows drives. I boot from a CD or USB drive, copy the data over to a new drive, and Windows is none the wiser 'cause it wasn't even booted up when I did the clone. I've never felt a need to use it on Unixy machines.
I'm not tied to it in any way.
Looking forward to learning more about FreeBSD and finding new and better ways to do things.
 
Does Clonezilla do a bit-for-bit copy of the source onto the destination device? If your source device was, say, 128MB but the destination was 500MB, would booting from the destination make it look like 128MB?
Of course toss in copying all the "bad" bits too.
 
It's smarter than that. I clone NTFS filesystems to larger drives all the time, and Clonezilla has always done the Right Thing(tm). Ditto with boot drives. It feels like magic.
 
I've used clonezilla on Linux at $JOB. But my approach predates clonezilla by about 10-15 years. I conceived it when I switched from MVS (IBM mainframe) to UNIX (Solaris, HP/UX, DG-UX, OSF/1), when patching would keep a server down for 1-3 hours instead of mere minutes. What I did was pretty much what Sun did when they implemented UFS boot environments. I shared the approach with Sun in 1995.

On the mainframe we'd patch the inactive disk, then reboot from the inactive disk during a change window. Patching would take many weeks of research, planning, and implementation while the reboot took less than 30 minutes. When I started work on UNIX in 1992 the first thing that came to mind was how backwards the UNIX patching and system install process was.

Stories like yours amaze me.

It's an incredible thing to have access to such knowledge and experience.
I am very thankful for the experts on this site who are willing to share their expertise and patient enough with greenhorns like myself who are starting from nothing and trying to learn.
 
So let's say we have RAID-Z2 - we can lose 2 disks. But it takes a minimum of 4 disks.
...
Unless you go with mirrors. They are much easier (less stressful on the system) to resilver.
With a 4-way mirror, you can lose 3 disks.
True. But 4 disks using RAID-Z2 have the capacity of 2 disks worth. A 4-way mirror has the capacity of 1 disk.

In the tradeoff game (more redundancy <-> more capacity), there is no free lunch. With modern disks, and excluding failure modes that introduce correlated failures (more on that below), the sweet spot today is being able to tolerate two faults. In a well-designed system (which ZFS is), in such a situation you will recover back to a single-faulted system pretty quickly after the spare drive is put into service.
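To make the tradeoff concrete, here is a rough sketch of the two layouts (one or the other, not both); the pool name "tank" and the device names ada1-ada4 are just placeholders:

  zpool create tank raidz2 ada1 ada2 ada3 ada4   # ~2 disks of usable capacity, survives any 2 failures
  zpool create tank mirror ada1 ada2 ada3 ada4   # 4-way mirror: ~1 disk of usable capacity, survives any 3 failures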

Never underestimate the danger of replacing the wrong drive when one (or more) spindles fail in a RAID set.
...
I'm paranoid in dealing with this situation. It's why the IBM procedures for RAID maintenance walk the engineer through a process that eventually lights a bulb on the broken drive. However your drives probably won't have lights, and if you have multiple sites, you may have to rely on hired help.
When I worked at IBM, this was actually measured: the #1 source of data loss was ... field service engineer pulling the wrong disk out. Really, it beat disk failure hands down.

That's why well-designed disk enclosures have a combination of the following: indicator lights that say "you are allowed to remove this disk"; another indicator light that says "please remove this specific disk right now"; battery-backup for those indicator lights so even if the field engineer cuts power, they remain on for about an hour; solenoids that lock the disks in place so the good disks can not be removed without cutting power to the system; and finally a loud alarm beeper that sounds if you use the remove handle on a disk that is not supposed to be removed. This is how you build systems that get good reliability in the real world.

On the matter of using a UFS root, I too did that for ages because at the beginning ZFS had no boot option. Back then I actually had space allocated on the root mirror for two completely separate bootable root file systems. And I used them for upgrades because I had to have a fallback if something went wrong.
Again, having two independent copies of the boot environment is vital for real-world reliability. If your computer costs many M$, and there is risk that an upgrade might break something, you keep the current configuration on one disk, and only upgrade the second copy.
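On current FreeBSD with a ZFS root you get the same safety net from boot environments, without dedicating a second disk to it. A minimal sketch with bectl(8) (the environment name is arbitrary):

  bectl create pre-upgrade      # clone the current boot environment before touching anything
  # ...run the upgrade on the live system...
  bectl list                    # confirm both environments exist
  bectl activate pre-upgrade    # if the upgrade misbehaves, boot the untouched clone on next reboot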

Would it be impossible to "get into trouble" by pulling the wrong drive while using mirrors?
Absolutely - you can still get into trouble.
If I did pull the wrong drive - any single drive would have a complete set of data on it and doesn't "need" any other drive to "reconstruct" the data like RAIDz would?
As long as I had 1 complete and working drive from the mirror, could I not reconstruct the whole mirror from that one drive?
Not if there are writes occurring while you are pulling drives. You can easily end up with a situation where every bit of data is on at least one drive, but no single drive has a complete copy of all data.

Another comment: This whole discussion ignores the "dead moose on the table". We're all talking about data loss that's caused by disk failures (or cable or connector failures), and how to address that using RAID. It ignores that most data loss is caused by users. The joke example is "rm -Rf /", but much more common is "I overwrote that file". Really good backups are in reality more important than RAID.
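ZFS itself helps a lot with the "I overwrote that file" case: snapshots are cheap, and they can be shipped to another pool or machine as a real backup. A minimal sketch, assuming a dataset named tank/home and a second pool named backup (both placeholders), for a first full copy:

  zfs snapshot tank/home@snap1                          # instant, read-only point-in-time copy
  zfs send tank/home@snap1 | zfs receive backup/home    # replicate the snapshot to another pool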
 
Not if there are writes occurring while you are pulling drives. You can easily end up with a situation where every bit of data is on at least one drive, but no single drive has a complete copy of all data.

So here's where I stand:

I've decided to get an LSI card so I can run 8 drives, the limit of my hot-swap cage.
That gives me a little more to work with.
Not sure what the lights indicate on the hot-swap cages, may just be on/off for disk activity, not yet sure.
This is a small-budget operation. Not a million$+ situation.

If I have to pull a drive, this is a small business, so I would *always* shut the server down before pulling any drive.

So, my big question now, is: Mirrors, or RAIDz, or a combination of both?
I'm thinking mirrors sound a lot easier for a greenhorn and safer to work with, also more flexible so I can adapt to future needs, as I learn.
I'm thinking RAIDz scares me on some level, mostly because it has to "reassemble" the data to get a complete set. Pulling the wrong drive, being locked into your RAIDz setup and not able to change it without starting over, etc.

Questions... Questions...
 
I've decided to get an LSI card so I can run 8 drives, the limit of my hot-swap cage.
I like the 8-drive hot swap case. That will make working with the system easier. For example, if you have a disk that gets "sick" (not dead, but unwell and needing to be replaced soon), then you can get a spare disk, add it to the ZFS pool, mark the sick disk as needing to be drained of data (re-replicated onto the spare disk), and when the drain is finished, remove the sick one and throw it away. Having more physical slots than disks allows you to add spare disks temporarily, without doing perverse things with extension cables and disks mounted haphazardly.
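A rough sketch of that workflow, assuming a pool named tank, a sick disk ada3, and a spare disk ada7 sitting in one of the empty slots (all names are placeholders):

  zpool status tank              # identify the ailing disk and check pool health
  zpool replace tank ada3 ada7   # resilver onto the spare while the sick disk is still in service
  zpool status tank              # when the resilver finishes, ZFS drops ada3 and it can be pulled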

On the other hand: While I love the LSI cards, I wonder whether getting one is a little bit overkill. You already have 6 perfectly good SATA ports on the motherboard. You are probably not planning to run SAS (SCSI) disks anyway, because they are harder to find and sometimes more expensive. It might make more sense to just get a 2-port SATA card instead; that gives you 8 ports for your hot swap cage, and will likely be cheaper and easier.

Not sure what the lights indicate on the hot-swap cages, may just be on/off for disk activity, not yet sure.
Typically disk enclosures (including hot-swap cages) have one indicator light, often green, that shows that the disk is powered and/or shows disk activity. In some cases, those are separate (one power light, which indicates that a disk is present and is getting power, and one activity light). That is the "blinking" light. Those one or two lights are controlled purely by the enclosure and the disk drive itself.

All other indicator lights (and crazy features like disk locking solenoids, beepers) are controlled by the computer, usually using an interface called SES (SCSI Enclosure Services). This is not a science, but somewhere between voodoo and magic. There are two problems you need to solve here: (a) If a disk is called /dev/ada5 or has serial number Hitachi 12345 or has WWN 5000cca228xxx, what physical slot of the enclosure is it in? (b) How do I turn the red/yellow/blue/... light for that slot on and off? Doing this reliably is not easy.
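On FreeBSD the usual starting point is sesutil(8), which talks SES to the enclosure - provided the enclosure actually implements SES; a dumb SATA hot-swap cage often does not. A minimal sketch:

  sesutil map             # list enclosure slots and which device (if any) sits in each
  sesutil locate da5 on   # light the locate LED for the slot holding da5
  sesutil locate da5 off  # and turn it off again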

If I have to pull a drive, this is a small business, so I would *always* shut the server down before pulling any drive.
Good. That automatically removes many possible failure modes. Here is one piece of advice: Use good human-readable labels on your disk partitions (using the gpart command). For example, my little server at home has two internal spinning disks, and if you do "gpart show -l /dev/adax", the partition name is: "hd14_home". That means it is the disk named hd14 (meaning it is a Hitachi and I bought it in 2014), and on that physical disk it is the home partition. If you open the server, you will see two physical disks, and one has a big paper label attached, which says HD14 (the other one is HD16). That means that if I need to remove or replace HD14, it's pretty obvious which disk this is.
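A sketch of how such a label gets set and then used; the disk ada1, the partition index 2, and the label names are all placeholders:

  gpart show -l ada1                    # list the partitions on ada1 with their GPT labels
  gpart modify -i 2 -l hd14_home ada1   # label partition index 2 as "hd14_home"
  # the label appears as /dev/gpt/hd14_home, so the pool can be built on labels instead of adaX names,
  # e.g. zpool create tank mirror gpt/hd14_home gpt/hd16_home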

So, my big question now, is: Mirrors, or RAIDz, or a combination of both?
I'm thinking mirrors sound a lot easier for a greenhorn and safer to work with, also more flexible so I can adapt to future needs, as I learn.
I'm thinking RAIDz scares me on some level, mostly because it has to "reassemble" the data to get a complete set. Pulling the wrong drive, being locked into your RAIDz setup and not able to change it without starting over, etc.
From the sys admin point of view, RAID-Zx and mirrors work nearly the same. The "reassemble" problem you refer to is all internal to the ZFS implementation. If you pull the wrong drive, you're screwed in either case. Really the only two advantages of mirroring are: You get more redundancy and therefore reliability (a 4-way mirror can lose 3 drives, while a 4-disk RAID-Z2 can only lose 2), and better read/write performance (which you may not care about). The cost of 4-way mirroring is a loss of capacity, in this case by a factor of two (which you may also not care about).

Questions... Questions...
How to use RAID efficiently is a super complex question.
 
On the other hand: While I love the LSI cards, I wonder whether getting one is a little bit overkill. You already have 6 perfectly good SATA ports on the motherboard. You are probably not planning to run SAS (SCSI) disks anyway, because they are harder to find and sometimes more expensive. It might make more sense to just get a 2-port SATA card instead; that gives you 8 ports for your hot swap cage, and will likely be cheaper and easier.

I can get an LSI 9210-8i *new* on ebay for between $49.88 and $59.99, with the (2) SAS cables which look like 4 SATA connectors at the other end of each cable = 8 SATA connectors.
Can also set the hardware RAID to "passthrough" so it won't interfere with whatever RAIDz setup I'm running.
Might be hard to pass up for $50.00.
Specs said you can get an "expander" for that card so it supports up to 256 drives (?)
If I get really crazy my case will take 1 more hot-swap cage for a total of 12 hot-swap drives. Worst-case I can just add another $50 LSI card to handle the additional 4 drives, or run them off my motherboard.

Example:

I ordered a couple 4-port SATA cards to play around with, *new* on ebay for about $20.00 ea.

All my current 2TB drives are new Western Digital Enterprise class.
Do you know anything about the "WL" brand of hard drives?
They're said to be made as white label drives then rebranded and sold by many OEM's, etc.
Supposed to be really good enterprise-class drives. Haven't done any research yet.

Example: [eBay listing]

Notice the seller specializes in selling drives, has sold 6,129 of this one drive model, and overall feedback is still 100%.
Assuming all the numbers are legit, if the drives weren't any good then someone would be squawking about it;
in fact someone would be really ticked off and likely throwing a real hissy-fit.

I've also heard that it's a good idea to mix in different hard drive brands.
And someone said (in this thread) they mix old and new for reliability.
Of course with my small servers, I'm thinking maybe all new, with two different brands.

Thoughts?
 
For example, if you have a disk that gets "sick" (not dead, but unwell and needing to be replaced soon), then you can get a spare disk, add it to the ZFS pool, mark the sick disk as needing to be drained of data (re-replicated onto the spare disk), and when the drain is finished, remove the sick one and throw it away. Having more physical slots than disks allows you to add spare disks temporarily, without doing perverse things with extension cables and disks mounted haphazardly.

Could I do this using an external hard drive in an external drive enclosure attached to a USB port?

This one sounds really interesting. Do you have the steps listed somewhere that you're willing to share, so I could experiment?
 
I can get an LSI 9210-8i *new* on ebay for between $49.88 and $59.99, with the (2) SAS cables which look like 4 SATA connectors at the other end of each cable = 8 SATA connectors.
Verify the source. Is it coming from China? There are a lot of counterfeits (some of which actually work OK, but YMMV).
 
From the sys admin point of view, RAID-Zx and mirrors work nearly the same.
Agreed, but... RAID-Z is always slower. And striped mirrors are always much faster...
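For reference, striped mirrors are nothing more than several mirror vdevs in one pool (the classic "RAID 10" shape); a sketch with placeholder device names:

  zpool create tank mirror ada1 ada2 mirror ada3 ada4   # two 2-way mirrors, striped: ~2 disks usable

Reads and writes get spread across both mirrors, which is where the speed advantage over RAID-Z comes from.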

Your point about spare hot-swap slots is a really good one. It really does help with risk mitigation. My internal "cold swap" stack is augmented by a small (3-spindle) (normally empty) hot swap cage that facilitates rotation of 12TB off-site backup disks, and RAID re-silvering prior to any shutdown to remove "problem" drives (so the RAID set can then survive removal of the wrong drive). [The ideas behind this approach were seeded by discussions on this list.]
 
Supposed to be really good enterprise-class drives.
No. If they were, they would have a 5-year warranty. They are white-label drives.
They might last a year, maybe longer.
No one in an enterprise would risk their job on cheap drives. This is just marketing terminology.
I'm thinking maybe all new, with two different brands.
Not needed. Just buy quality drives.
Look at Backblaze stats.
 
Your point about spare hot-swap slots is a really good one. It really does help with risk mitigation. My internal "cold swap" stack is augmented by a small (3-spindle) (normally empty) hot swap cage that facilitates rotation of 12TB off-site backup disks, and RAID re-silvering prior to any shutdown to remove "problem" drives (so the RAID set can then survive removal of the wrong drive). [The ideas behind this approach were seeded by discussions on this list.]

How many hot-swap slots would I need (of the 8 available) to pull this off, *if* it's even possible with my small rig?
Could I use an external hard drive or two, in an external drive enclosure, attached w/usb cable?

Would the "EMPTY* slots below be enough to pull it off?

I'm starting to think along these lines for my final setup:
Rack server case has maximum of 3 hot swap cages @ 4 drive bays/cage = 12 hot swap drive bays:

CURRENT SETUP
Slot 1 - WD 4TB - mirror #1 - OS + LOCAL_BACKUP
Slot 2 - WD 4TB - mirror #1 - OS + LOCAL_BACKUP
Slot 3 - WD 4TB - mirror #1 - OS + LOCAL_BACKUP
Slot 4 - *EMPTY*

Slot 5 - WD 4TB - mirror #2 - DATA
Slot 6 - WD 4TB - mirror #2 - DATA
Slot 7 - WD 4TB - mirror #2 - DATA
Slot 8 - *EMPTY*

FUTURE EXPANSION:
Slot 9 - WD 4TB - mirror #3 - DATA --> [Pool with mirror #2]
Slot 10 - WD 4TB - mirror #3 - DATA --> [Pool with mirror #2]
Slot 11 - WD 4TB - mirror #3 - DATA --> [Pool with mirror #2]
Slot 12 - *EMPTY*
 
Specs said you can get an "expander" for that card so it supports up to 256 drives (?)
SAS expanders are not typically something one buys separately; they are usually built into larger disk cages or enclosures. In some cases, large disk enclosures will have multiple levels of expanders (if you want to pack 100 disks into an enclosure, a single expander chip won't do).

Do you know anything about the "WL" brand of hard drives?
They're said to be made as white label drives then rebranded and sold by many OEM's, etc.
There are only 2.5 manufacturers of disk drives: Seagate (which includes Samsung), Western Digital (sometimes known as WD, with some disks still branded Hitachi), and Toshiba (which I count as 0.5 because they are small). So where do off-brand drives come from? Typically rejected drives from one of the big vendors. They usually arrive by one of two paths: either the manufacturer sold them to a big user (90% of all enterprise disks are sold to a dozen big companies, like Amazon/Apple/Baidu/Facebook/Google/Microsoft/Tencent or Dell/HP/Amazon/Oracle), they failed QA testing there or showed errors early on, and were then sold to unscrupulous resellers who hide their true history; or they are drives that failed QA testing at the manufacturer itself, though I don't think the big manufacturers would be willing to sell those.

In all this, one has to remember that the disk manufacturers have very strict QA systems. And they grade the quality of disk drives: the best ones go to preferred customers, who also get access to internal QA information on a per-drive basis and use the drives in-house, and who pay a premium (typically the cloud hyperscalers); the decent ones go to price-conscious customers (such as Dell) who resell the drives as part of systems; and the not-so-good ones go into the retail channel. The joke used to be that the worst drives are sold at Fry's (a chain of electronics supermarkets on the west coast of the US, in particular in Silicon Valley, infamous for selling junk and having impossible-to-navigate return policies), but that's no longer true since Fry's has gone out of business.

If I had nothing useful to do, I would waste $50 on one of those drives, find out exactly what kind it really is (the firmware will give it away), and then perhaps bring it to some friend who works for one of the drive makers and we take it apart together. But I have too many useful things to do.

Summary: STAY AWAY.

I've also heard that it's a good idea to mix in different hard drive brands.
Opinions on that differ. At the consumer level, where you have no information about the reliability of individual disks, it might be a good idea, just to guard against the unfortunate coincidence that you buy all drives from one manufacturer/model made roughly at the same time and place, and that kind happens to be unreliable. But modern enterprise-grade drives are so good, with a decent RAID layer on top, they will be reliable enough. At the large-scale user level (the customers who buy a million disks at a time), there are large groups of people who study, measure and forecast drive reliability, and who consciously adjust data placement to maximize reliability and minimize cost. This is one of the reasons that small users simply can't compete with cloud providers: You can't afford the group of 5 PhDs and 10 software engineers that perform such optimizations, but you can rent disk space in the cloud from companies that do.

(about draining a disk that is about to be removed)
This one sounds really interesting. Do you have the steps listed somewhere that you're willing to share, so I could experiment?
Look at the "zpool remove"command.

Not needed. Just buy quality drives.
Look at Backblaze stats.
THIS. For the consumer who buys small quantities of drives from the retail environment, looking at Backblaze is the best idea, because that's where their disks also come from. The problem with this approach is: by the time Backblaze has good high-statistics data (like having used 10,000 disks for 4 years), the disk model is probably obsolete, and can no longer be found in the retail channel, except for used or rejected drives. So the idea here is to look for patterns, like all the disks from manufacturer "Elephant" with model names that start with "Dumbo" are very good, and then follow that pattern. I'll give you a hint: My spinning drives are (WD) Hitachi HGST enterprise-grade drives, with model numbers starting with H.

How many hot-swap slots would I need (of the 8 available) to pull this off, *if* it's even possible with my small rig?
Could I use an external hard drive or two, in an external drive enclosure, attached w/usb cable?
One spare (empty) slot is enough. Sure, you could do it with external enclosures, but that's a hassle: You have to put the new/spare/old disk in the external enclosure, USB is probably slower, then remove it again and put it into its real location. Easier to have a spare slot.

I'm starting to think along these lines for my final setup:
Rack server case has maximum of 3 hot swap cages @ 4 drive bays/cage = 12 hot swap drive bays:
Given that you are planning to use 6 drives (and your assignment looks reasonable), I think two cages (eight bays, leaving two spare slots) would be adequate. And cheaper.
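If it helps, here is roughly what that assignment might look like as pools; device names are placeholders, and in practice the OS pool is created by the installer rather than by hand:

  zpool create zroot mirror ada0 ada1 ada2   # 3-way mirror: OS + local backup
  zpool create data mirror ada3 ada4 ada5    # 3-way mirror: data
  zpool add data mirror ada6 ada7 ada8       # future expansion: stripe a second 3-way mirror into the data pool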
 
There are only 2.5 manufacturers of disk drives: Seagate (which includes Samsung), Western Digital (sometimes known as WD, with some disks still branded Hitachi), and Toshiba (which I count as 0.5 because they are small). So where do off-brand drives come from? Typically rejected drives from one of the big vendors. They usually arrive by one of two paths: either the manufacturer sold them to a big user (90% of all enterprise disks are sold to a dozen big companies, like Amazon/Apple/Baidu/Facebook/Google/Microsoft/Tencent or Dell/HP/Amazon/Oracle), they failed QA testing there or showed errors early on, and were then sold to unscrupulous resellers who hide their true history; or they are drives that failed QA testing at the manufacturer itself, though I don't think the big manufacturers would be willing to sell those.

In all this, one has to remember that the disk manufacturers have very strict QA systems. And they grade the quality of disk drives: the best ones go to preferred customers, who also get access to internal QA information on a per-drive basis and use the drives in-house, and who pay a premium (typically the cloud hyperscalers); the decent ones go to price-conscious customers (such as Dell) who resell the drives as part of systems; and the not-so-good ones go into the retail channel. The joke used to be that the worst drives are sold at Fry's (a chain of electronics supermarkets on the west coast of the US, in particular in Silicon Valley, infamous for selling junk and having impossible-to-navigate return policies), but that's no longer true since Fry's has gone out of business.

If I had nothing useful to do, I would waste $50 on one of those drives, find out exactly what kind it really is (the firmware will give it away), and then perhaps bring it to some friend who works for one of the drive makers and we take it apart together. But I have too many useful things to do.

Summary: STAY AWAY.

Fascinating. Very good advice.
 
NVME is not an option at this point.
Do you mean NVMe devices are not supported on FreeBSD? I use NVMe cards in my TrueNAS devices and that's FreeBSD.
Maybe I'm not understanding something, because I'm trying to enable a PCI adapter with an NVMe SSD installed to use as a log device, but cannot find anything on how to enable it.
 
Is virtualization out of the question? It makes backups/snapshots and such much easier to automate, and to script out.
Maybe in the future.
But for now, I don't know much about it, and have enough on my plate.
Would have to set up an alternate test environment and play around with it.
Maybe some day...
 
PROBLEM:

GPT partition tables and gmirror both write metadata at the end of a hard drive, so the two can overwrite each other and corrupt data.
The only recommended solution is to use MBR partitions, at least for now.

Does ZFS somehow get around this issue?

Does RAID-Z have this same problem?
 