Soliciting Opinions/Assistance on a Home-Use ZFS Setup

First, I want to give props to this thread which really got me rolling in a better direction and onto this site.
Second, I don't have great UNIX creds, but I have done a little UNIX admin over 20 years (not recently), and cracked a WD NAS last year to work with non-WD drives (the main reason I'm doing a DIY NAS). I also "grew up" with the pre-Windows command line-based system co-developed by a gentleman named William Gates, so command lines don't scare me at all.

I've been planning and searching for quite a while (since at least July 2011), trying to do this all on my own between many other things, but I'm stuck and not finding a close enough match for my situation, so I am hoping some of you kind people can help me with your thoughts. There's a lot I could start with; I'll try to compress (and expand later, as needed).

CURRENT USE CASE & SETUP:
  • 1 WinXP 100Mbps desktop is master for finances, pictures, and music (iTunes)
  • 1 WinXP gigabit desktop is master for home HD videos
  • 1 WD NAS contains backups of all data above (mirrored [so all content exists on 3 HDDs min], 2TB usable, almost full now)
  • 1 Win7HP wireless-N (300Mbps) HTPC (on the only HD TV) is currently sole location for all owned disc rips and recorded TV (esp. set-top box to Hauppauge device); "ripped" 1990s SD videos live here, but are backed up to the mirrored WD NAS
  • file cabinets of paperwork to be mass-scanned are waiting for disk space
  • piles of owned DVDs and BluRays to be mass-ripped for HTPC service are waiting for disk space
FUTURE POSSIBLE CHANGES/USE CASES:
  • place HTPC on wired gigabit network
  • recorded TV files processed to cut commercials (on the HTPC or ZFS box?)
  • viewing of recorded content/rips on SD TV, 1 level up
  • additional HD TV, 2 levels up
  • recorded TV files served to other HD TV in near-realtime or after commercials cut
DATA CLASSIFICATIONS:
A) irreplaceables: finances, pictures, SD+HD home videos, paperwork (after scanning/shredding) [this includes almost all available images of our toddler since birth, and most images of my now-teenagers]
B) difficult to replace ("d2r"): recorded TV I use for reference, cannot buy on disc yet, and/or am at the mercy of providers to play again or make available in a non-downloadable commercial format
C) ripped content: self-explanatory; can rip again, if necessary, but it would be a big, time-consuming pain

GOALS:
  • irreplaceables MUST NOT be lost under any circumstances
  • irreplaceables' media must hold near-future data and be expandable beyond that [we added ~1TB across 1 year with the toddler]
  • d2r content SHOULD not be lost, but it won't be the end of the world if it is; one step less-protected than irreplaceables is acceptable
  • d2r media must hold near-future data and be expandable [d2r content is expected to grow also, but rate is unknown]
  • ripped content media must be expandable [is estimated at 4TB and grows at a slow rate, mostly around the holidays :) ]
  • run AnyDVD HD with a BluRay drive (ASUS BC-12B1ST; both newly acquired) on the ZFS host for ripping
  • keep the cost as low as possible while still meeting these other goals
  • no ZFS dedupe, but expect to use copies=2 or 3 and/or raidz in places (see below)
Like most, I initially thought this would be a hardware RAID system, maybe 2x RAID1s for data classes A & B above, especially for the sake of recovering a whole disk, and a RAID5 for class C. That is, until I discovered ZFS and that it can do just such a configuration, only better. I have a few HDDs already, but not enough: the pair of 2TB Samsung HD204UIs in the WD NAS (available after I move that content), and a pair of 1TB WD10EADS (one is the HTPC data drive for now, almost full).

Previously, I had thought I'd need a 3TB RAID1 (to start), maybe add a second one later, and use the 2 Samsungs to start a 3-drive RAID5 and grow it later (I know you can't do that, exactly, in ZFS). I had also thought this might be 2 DIY enclosures (the hardware mirrors + the hardware RAID5) so they could move around, if needed, and have duplicate hardware for troubleshooting, if needed ... but that seems to be off the table with ZFS. I'm still entertaining hot-swapping hardware mounted into the case for ease of use, if I can swallow the price at the end (my budget is variable at this point).


Here is where my choices and dilemma begin; there are too many variables that I'm not 100% familiar with. My basic question is on direction ... is it better to:

1) Upgrade my XP desktop:
This is the one holding the home HD movies and it does some movie editing. Quick Specs:
  • MOBO: Gigabyte GA-P55-UD4P
  • CPU: Core i5-750
  • RAM: 4GB DDR3/1333 (PC3 10600)
  • VID: Radeon HD 6850 1GB 256-bit GDDR5 PCI Express 2.1 x16
  • HDDs: 640GB for OS/programs/docs/etc., 1TB for HD movies, 640GB for dual-boot Ubuntu Linux
  • PS: CORSAIR CMPSU-650TX 650W
  • CASE: Gigabyte Triton 180 (5x 5.25" bays, 2x 3.5" external bays [taken], 3 internal 3.5" bays).
  • OPTICAL: Sony AD-7240S-0B 24X DVD/CD RW SATA
I have yet to use the 2nd NIC on that board, and it seems like plenty of board to run ZFS in terms of RAM upgradability (16GB max), 8x SATA ports, PCIe slots for HBAs, dual NICs. My impression is also that it's common(?) to run FreeBSD in a VM on Windows. So maybe all I need is a RAM upgrade to 8GB, Windows 7, the needed HDDs + hardware, install the BR drive, FreeBSD and other free bits (like a VM), and maybe then I'm good to go? Or is that short-sighted? Might it just make the box harder to also use as a PC somehow?


2) Donate my mobo to a new build, get a new mobo for myself:
Here, things get very complicated. I can forestall the Win7 desktop upgrade and run a standalone ZFS box. I could get a lesser (cheaper) board for myself that holds all the same other hardware (factor that into the costs). Or I could move the CPU with it (seems overkill for ZFS) and upgrade the key parts of my PC to improve the video editing (but to what mobo & CPU?). Then ZFS needs a video card, too. I've been eyeing the following to fill out the ZFS box from there:

  • OS: Run WHS 2011 or Win7 for AnyDVD+BluRay to rip directly to the volumes? Or straight-up FreeBSD?
  • CPU: if I don't move the i5-750, I'll need a "cheap" LGA1156 chip
  • VID: unknown (as cheap as possible)
  • HDD for OS: either an old Seagate ST3120814A 120GB IDE (should still work) OR a USB 2.0 flash drive, maybe mounted to an internal header with an adapter
  • RAM: G.SKILL Low Voltage 8GB (2x4GB) DDR3/1600 (PC3 12800)
  • PS: Thermaltake TR2 W0070RUC 430W
  • CASE: unknown (help?)
  • BACKPLANE (optional): previously looked at the SUPERMICRO CSE-M35T-1B 5x3.5" Hot-swap SATA
Or are any savings not worth the trouble of moving all that stuff around? I also thought about using my wife's LGA775 board to give her a possibly more-needed upgrade, but that board has a 4GB RAM cap and few PCIe slots.

BTW: I've never seen anyone write about building an external HDD box for ZFS over eSATA. Is it possible? And with what hardware? (I don't trust USB 3.0 yet.)


3) Build new ZFS box from zero:
I've been eyeing the following hardware, to add to those in #2 above ...

  • MOBO: GIGABYTE GA-Z68A-D3H-B3 or GA-Z68XP-UD3
  • CPU: Intel Pentium G620
This would be a single-NIC box on a much lower, but apparently adequate, CPU, and no IDE means the Seagate OS disk is out of the picture. Everything else is the same as #2, I think.


So, what would you do in this situation? Is there a better/best way? Should I consider different hardware than I chose? What hardware could fill in the blanks I still have? Do you have favorite places to procure these items in the U.S.? Any other general thoughts or pointers here?


To summarize, in the end, all I'm looking for is:
  • unfailing storage of irreplaceable data; speed is not a concern
  • low possibility of failure for difficult to replace data; since this is remotely-served video content (basically being a DVR HDD) write speed could be an issue upon recording, and read speed is a bit of an issue for playback ... of course, the HTPC could record the whole files and just send them to the ZFS box (but see my future use cases for near-realtime playback)
  • moderately fallible storage for ripped content; unrestrained playback is the speed goal there
  • the best places/ways to transcode and serve out all this data for my needs

Please note that, otherwise, speed really does not factor in here much for me. Many people use RAID for speed; I am not one of them. It basically only needs to be fast enough to play back BluRay content, and maybe record 1080i content without interruption. Regarding multiple users, there may be times when a backup is running while we want to watch a movie, and perhaps later running content to 2 TVs simultaneously.
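A rough way to sanity-check those speed needs is to add up the stream bitrates and compare the total with the network link. This is only a minimal sketch: the bitrates (about 54 Mbit/s peak per BluRay stream, about 19.4 Mbit/s for a 1080i broadcast recording), the 0.7 usable fraction, and the function name are all assumptions for illustration, not measurements from this setup.

```python
BLURAY_PEAK_MBPS = 54.0   # assumed peak rate of one BluRay stream
ATSC_1080I_MBPS = 19.4    # assumed rate of one 1080i broadcast recording


def link_can_carry(link_mbps, stream_mbps, usable_fraction=0.7):
    """Return True if the summed stream bitrates fit in the usable part of the link.

    usable_fraction is a guessed allowance for protocol overhead, not a measurement.
    """
    return sum(stream_mbps) <= link_mbps * usable_fraction


# Two TVs playing BluRay rips while one 1080i recording is written
streams = [BLURAY_PEAK_MBPS, BLURAY_PEAK_MBPS, ATSC_1080I_MBPS]
for name, mbps in [("100Mbps", 100), ("gigabit", 1000)]:
    print(name, link_can_carry(mbps, streams))
```

Under these assumed numbers the 100Mbps link is the bottleneck for that worst case, while wired gigabit has plenty of headroom.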
 
Build a completely separate FreeBSD box. Don't try to run it in a VM, especially if it's your storage box. Do you really want FreeBSD to be at the mercy of Windows drivers, and Windows stability? It's a storage box, it needs to be separate.

Don't bother with your "irreplaceable", "difficult to replace", etc breakdowns. Just put it all on one FreeBSD ZFS box. Make separate ZFS filesystems for each category, if you must, and have separate snapshot schedules for them. But put them all on one box. Centralise your storage.
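As a rough illustration of the one-box, separate-filesystems idea, the sketch below only builds the `zfs create` command lines and runs nothing. The pool name `tank`, the dataset names, and the `copies` values are hypothetical choices loosely based on the categories and the copies=2 idea in the first post.

```python
# Hypothetical pool and dataset names; the copies values are assumptions.
DATASETS = {
    "tank/irreplaceable": 2,   # finances, pictures, home videos, scans
    "tank/d2r": 2,             # difficult-to-replace recorded TV
    "tank/rips": 1,            # ripped discs, can be re-ripped
}


def create_commands(datasets):
    """Build one `zfs create` command line per dataset (nothing is executed)."""
    return [f"zfs create -o copies={copies} {name}" for name, copies in datasets.items()]


for cmd in create_commands(DATASETS):
    print(cmd)
```

Each dataset can then be snapshotted on its own schedule. Note that `copies` guards against bad blocks on a disk, not against losing the whole disk.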

And don't use it as a "backups" box. Use it as a storage box. Put all your files on it. Share them out via Samba to the Windows machines. And use the harddrives in the Windows machine as scratch space, not as primary storage.

Don't store anything on your HTPC boxes, unless you are recording direct-to-disk (recording over the network might give you issues, so record to local disk, then move the file to the storage box).

Once you have the storage box up and running, put all your files on it, create a bunch of snapshots, then look into using your existing NAS as a backups box for the storage box.

Keep things simple, and centralised, and don't try to overthink things. :) This is a home setup, not a fancy Fortune-500 enterprise.

Find the biggest case you can afford, with the most harddrive bays (12 minimum), and find someplace out-of-the-way to put it (with good ventilation) so you don't have to listen to it.

If you want raw speed and the most versatility, use mirror vdevs. Start with 1 mirror, add mirrors as you need space or more speed (pool acts like a giant RAID10 array).

If you need raw storage space and aren't concerned with speed, then look into raidz2 or raidz3.

If you really must, you can create two separate pools, one using mirror vdevs, one using raidz2 vdevs and split your data across the two pools based on your needs.
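To make the space trade-off between mirrors and raidz concrete, here is a minimal sketch of the arithmetic. It ignores metadata overhead and TB/TiB differences, and the function names and the six-disk example are hypothetical.

```python
PARITY_DISKS = {"raidz1": 1, "raidz2": 2, "raidz3": 3}


def vdev_usable_tb(layout, disks, disk_tb):
    """Approximate usable space of one vdev, ignoring metadata overhead."""
    if layout == "mirror":
        if disks < 2:
            raise ValueError("a mirror needs at least 2 disks")
        return disk_tb  # every disk in the mirror holds the same data
    parity = PARITY_DISKS[layout]
    if disks <= parity:
        raise ValueError(f"{layout} needs more than {parity} disks")
    return (disks - parity) * disk_tb


def pool_usable_tb(vdevs):
    """A pool's space is the sum of its vdevs."""
    return sum(vdev_usable_tb(*v) for v in vdevs)


# Six 2TB disks arranged two ways
print(pool_usable_tb([("mirror", 2, 2.0)] * 3))   # three 2-way mirrors
print(pool_usable_tb([("raidz2", 6, 2.0)]))       # one 6-disk raidz2
```

With the same six disks, the three mirrors give roughly 6TB and the single raidz2 roughly 8TB; the mirrors buy speed and easy growth in pairs, the raidz2 buys space.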
 
backups

I appreciate that. Probably the kind of slap on the head I needed. Yeah, overthinking is what happens when you spend 8 mo off and on researching tech you've never used while only talking to your non-techie wife about it (who is kind enough to at least feign a mild interest). It also happens while trying not to make any costly mistakes, like the All-in-One that doesn't scan legal size I just bought because I forgot that secondary "requirement". Ah, well; didn't spend nearly as much time thinking about that one.

If I do what you suggest, and I am certainly considering that, then my problem would be the backups.

The WD NAS is probably off the table because it's the genesis of this DIY for me. It came with 2x 1TB drives. I spent 3 weeks through Christmas vacation banging my head against that thing, doing little else but trying to make it take two non-WD 2TB drives without losing my data. Though ultimately successful, it was the worst tech experience I ever had (and I'm in the tech business, though another branch), and made me hate the words "Western Digital" and "proprietary." With all that work to make it take just 2TB (or even 4TB non-mirrored), I don't see how it could be a backup for all I want to store.

So if I need a backup for my ZFS, that sounds like even more hardware I'd need to invest in. Which leads me to this question:

I've been reading for quite a while the mantra that "RAID is not backup", but I've long wondered what people use for said large backups nowadays? I used to have a tape drive in my desktop way back in the day, but is anyone using tape anymore? People don't back up to stacks of BluRay discs, do they? If RAID is no one's backup, what is? What's typical now?
 
RAID is not a backup because it does not protect against the following situations:

  1. System failure. For example, one time my power supply in a computer blew a fuse and lit the box on fire. Freak incident, but RAID did not protect against flaming hard drives. Another example is when I lost a box due to flooding (pipe in my apartment's ceiling burst and flooded the apartment).
  2. Data Corruption. Think a computer virus corrupting your data. In a RAID setup, the virus has corrupted multiple copies of the same file.
  3. User Error. How many times have you accidentally deleted a critical file and needed to restore it? Mirrored, striped, and RAID 3/5/6 do not protect against accidental deletion. All copies are modified simultaneously.

RAID is a solution to one particular problem: failure of a hard drive. RAID, except for RAID-0, will allow your system to run with no data loss if a hard drive fails. Nothing more, nothing less. In order to have a true backup, you need to have your data stored on a physically separate medium (e.g. external hard disk, pile of DVDs, backup server, etc.). Preferably, that physically separate medium should be at another location. In my case, I have one of those portable external hard drives that I keep in my desk at my office. Every few weeks, I bring it home, update the backup, and then bring it back to my office where it sits in my desk. It's not an enterprise-level solution, but works well for the home user.
 
large backup hardware/media

Thanks for those thoughts. Maybe I need to clarify a few things, though.

I totally understand that RAID is not backup, if that's your only storage. But by my (limited) logic, if I've got three main, standalone, Windows PCs with different important data on each of their HDDs, then I at least need a 2nd copy as a backup, preferably in a separate box, as a first step. That could be 3x external drives or, even better/easier/cheaper(?), a single-drive NAS to share. But then why not fortify that and get a RAID1 NAS, so I not only have 2 copies, but essentially 3 copies of all important data. In fact, that is what I've been doing for the last 2-3 years after "flying without a safety net" and being lucky for too long. So, I've already got RAID as the backup, to increase my chances of successfully restoring from it if I lose a drive anywhere.

I also understand that solution has some inherent problems, e.g., what if my backup to the RAID1 NAS corrupts a file in the write process and it writes the wrong thing to the 2 backup drives, the usual hardware RAID risks, plus I can't expand it easily, etc. That's why I'm going for ZFS (and a few more HDDs), which I see as protection from data failure and HDD failure, more flexible, and perhaps get more storage for my money in some places (via raidz).

I totally agree that fire/flood needs an off-site solution, or at least a fire/flood-proof safe. That's another step I'm working toward with my backup question. I understand that ZFS should be able to deal with data corruption and accidental deletion via snapshots.

Your comments about your mobile external hard drive get to the heart of what I was looking for, and I could do that, but then how big is it? I'm looking for a backup solution for what may be 5TB+ of data very soon. I didn't think externals that big were available yet, and if so they must be expensive.

I figure there are others in my position, so what do folks do re: hardware & media to backup several TBs off their RAID/raidz#s today?
 
As in the post above, but be advised that the offsite backup can (in my case, did) also experience sudden hard drive failure during the restore. I'd advise multiple backup disks...
 
WiiGame said:
So if I need a backup for my ZFS, that sounds like even more hardware I'd need to invest in. Which leads me to this question:

I've been reading for quite a while the mantra that "RAID is not backup", but I've long wondered what people use for said large backups nowadays? I used to have a tape drive in my desktop way back in the day, but is anyone using tape anymore? People don't back up to stacks of BluRay discs, do they? If RAID is no one's backup, what is? What's typical now?

We replicate (rsync + snapshot) the data from our main storage box to an identical off-site storage box.

It all depends on the size of your data store. If it's only a TB or so, you can pick up an NAS box and use that as the backup storage. If it's more than 10 TB or so, you'll probably have to create a second ZFS box and use "send/recv" or rsync to replicate the data to it. Ideally, you'd want to put the backup box at another location to guard against the house burning down. :)
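For the second-ZFS-box case, one replication pass with "send/recv" might look roughly like the sketch below. It only builds the command lines and executes nothing; the dataset names, snapshot names, and the `backupbox` host are hypothetical, and it assumes an earlier snapshot already exists on both machines.

```python
def replication_commands(dataset, prev_snap, new_snap, backup_host, backup_dataset):
    """Build (not run) the commands for one incremental replication pass."""
    # Take a new recursive snapshot on the main storage box
    take = f"zfs snapshot -r {dataset}@{new_snap}"
    # Send only the changes since prev_snap and receive them on the backup box
    send = (
        f"zfs send -R -i {dataset}@{prev_snap} {dataset}@{new_snap}"
        f" | ssh {backup_host} zfs recv -F {backup_dataset}"
    )
    return [take, send]


for cmd in replication_commands("tank/irreplaceable", "week1", "week2",
                                "backupbox", "backup/irreplaceable"):
    print(cmd)
```

Because only the difference between the two snapshots travels over the wire, the pass stays small once the first full copy is in place.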

When you get into the multi-TB storage arena, tape, Blu-Ray, DVD, etc just don't cut it as a backups/archiving solution. Dump to disk is the only way to go.
 
Backups:

For work: tape auto-loader, tapes shipped to off-site location. We may need to retrieve data in the state it was in 7 or more years ago for legal reasons; having data replicated to a remote location is useless unless we have enough disk to keep snapshots for 7+ years, and we currently don't. Some might, however...

For home:

I don't *need* all x tb of media I have on my storage. I just back up the subset of data that is mission critical (to an external drive and/or other machines), and the rest, well it generally breaks down as:

- software: re-download from internet
- non-self created movies/other media: re-download from internet or re-rip from DVD

Losing a few tb of home media would suck yes, but if it is purchased, it can be re-downloaded. If it is pirated, it can be re-downloaded. If it is ripped, it can be re-ripped (and if I lose my original DVDs in a fire, etc home contents insurance pays out).

Essentially, I only back up stuff that I can't easily re-retrieve or re-create. I suspect for most people, that isn't actually a hell of a lot, home-user wise.
 