Thanks for prompting this discussion, Andy.
AndyUKG said:
Hmm, well you are definitely making a lot of effort to minimise risk in the scenario you describe. I was certainly imagining ZFS replication being made to other online systems.
If we are dealing with critical systems where there is no other record being made of the important data that accumulates between offline backup runs, it's probably not a bad idea to replicate online in addition to keeping the offsite, offline backups. Doing that is more of an interim measure: it mitigates the loss of the primary server by minimizing the data lost, for little ongoing effort (as it's automated).
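The replication itself is trivial to automate from cron. Roughly this sort of thing is what I mean (a sketch only; "tank/data", "backup/data" and "backuphost" are placeholder names, and it assumes an initial full send/receive has already been done):

Code:
#!/bin/sh
# Minimal sketch of automated online ZFS replication. All names here are
# placeholders: tank/data is the source dataset, backuphost the second
# online box, backup/data the dataset it receives into.

NEW="repl-$(date +%Y%m%d%H%M%S)"

# Most recent existing snapshot of tank/data (the common base on both sides).
PREV=$(zfs list -H -t snapshot -o name -s creation -d 1 tank/data | tail -1)

zfs snapshot "tank/data@${NEW}"

# Send only the changes since the previous snapshot to the remote box.
zfs send -i "${PREV}" "tank/data@${NEW}" | \
    ssh backuphost zfs receive -F backup/data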
However, a strictly online "backup" does not guard against an online disaster, e.g. if both boxes are rooted and the cracker decides to securely shred both of your copies. See this
post. And read the thread, especially monkeyboy's posts. My aim was to resolve a lot of those issues with my attempt at a solution. To quote monkeyboy:
monkeyboy said:
it ain't a "backup" unless it is 1) full, 2) on removable media, 3) offline, 4) offsite, 5) tested for actual restore...
AndyUKG said:
But an offline system using a different system vs an offline system using the same, I would say different is clearly better (without getting into details about each system and assuming both systems are considered production ready).
I can see that in the case where you have two backup systems (e.g. a straight ZFS system as in my example, plus some sort of backup to tape that is not ZFS), it has the potential to be even more reliable. This is of course provided that you have allocated the extra funds to things like testing that the restore actually works properly, documenting all your procedures, and so on. Often IRL there are compromises made. Maybe you use fewer tapes/HDD pools than you would with a single solution. Maybe you don't document them well. Maybe one doesn't get tested properly. In the process of attempting to eliminate that risk, you can end up introducing more risk.
But then again, I've always been one to put all my eggs in as few baskets as possible, individually wrapped in bubble wrap, 100 feet under concrete under high ground, with castle walls, a moat, pillboxes with overlapping fields of fire... you get the picture.
And if we compare, say, ZFS on the regular system plus LTFS on the tapes vs ZFS on both the regular system and the backups, then we also conceivably have two things that can go wrong in the former, where only one thing can go wrong in the latter. I realize the thinking is probably "LTFS is tape, it can't go wrong", but really it's just another filesystem. If anything, because something like ZFS is used on live systems, any bugs are going to be noticed and corrected that much sooner.
Filesystems would have to be among the most extensively tested software in existence, because:
- Every computer uses at least one filesystem
- The filesystem is in use all the time, every time the computer does virtually anything.
- When there are bugs, particularly data destroying bugs, people get very, very mad. They WILL let someone know about the bug, and if it's much of a problem they will quickly use something else.
- Designers know this, and particularly with filesystems designed to be used on servers, especially servers at the more reliable end of the scale (e.g. ZFS), they are going to be more conservative, do more testing, etc.
Once a filesystem has been used reliably in the field for a reasonable period of time on a reasonable install base, the probability of some showstopping, data-destroying bug, especially one that would somehow not show itself after repeated successful zpool status checks, imports, exports and the like, and then simultaneously render all of your backup pools unusable, even those a few months old or so, while functioning perfectly on your primary system up until that point, is in my estimation remote. So remote that (/me puts on the dogbert hat) a hand-crafted company-destruction script that makes it look like you are making backups when in actual fact you are shredding them, right up until the fateful day when your primary system is destroyed, might be more probable than that scenario. Or, even if you use tape, a sysadmin with an axe to grind deciding to surreptitiously destroy all your tape archives that are theoretically only written to once, along with all other copies of the organization's data.
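For what it's worth, the kind of check I have in mind before a backup pool goes offsite is nothing exotic. A sketch, with "backup01" and the mountpoint as placeholder names:

Code:
#!/bin/sh
# Hypothetical spot check of a removable backup pool before it goes offsite.
# backup01 is a placeholder pool name; -R keeps its mounts out of the way.
zpool import -R /mnt/check backup01 || exit 1

# Walk every block and verify checksums, then confirm the pool is healthy.
zpool scrub backup01
while zpool status backup01 | grep -q "scrub in progress"; do
    sleep 60
done
zpool status -v backup01

# A real restore test goes further (actually read files back and compare),
# but even this catches a pool that won't import or that has checksum errors.
zpool export backup01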
I guess the thing to realize with risk management is that try as you might, you can never get the risk to zero. Even if you decide to nuke it from orbit, maybe the aliens are already on board.
AndyUKG said:
Also in the case of backups, where people are doing for example 1 backup a day or more, I think in the instance of disk to disk backup you won't commonly see people taking those disks offline each day.
Where I used to work they would take a tape backup each day, and take the tapes offsite each day in a rotation. It would have made no difference to the ease with which that procedure was done to use HDDs instead. Even something like 12 disks in a (padded) camera bag can still be carried by one person. The way I suggest doing it (if you read all the articles I wrote, you'll find it in the preface articles) is to use standard internal HDDs and put them in the cheap $4 silicone HDD cases, which provide some small shock protection, stop them sliding around your desk, allow stacking them on said desk as high as you'd want (they interlock), and leave the data and power ports accessible. Excuse the PATA HDD in the image, we'd use SATA of course.
e.g.
You connect the HDDs (still in the cases) to SATA data + power extenders (otherwise you'd have to remove the silicone cases).
You use dual e-SATA HDD docks, which you connect the above extenders to.
You connect the docks via e-SATA to your e-SATA backplates, which are in turn connected to regular internal SATA ports on your motherboard or SATA card.
Making a backup is as easy as:
- Put HDDs on a flat surface.
- Connect them to the SATA extenders coming from each HDD dock (two per dock, obviously).
- Turn on each HDD dock.
- Wait 10-20 seconds for your HDDs to spin up and be detected.
- Execute the backup script (a sketch of the sort of thing I mean follows this list).
- Flick off the HDD dock switches when the script finishes.
- Remove HDDs from extenders.
- Stack HDDs in padded bag (e.g. camera bag) to be taken offsite.
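The backup script itself is nothing magic either. Roughly this shape (a sketch only; "tank", "offsite01" and the snapshot naming are placeholders, and it assumes the removable disks hold their own ZFS pool that receives a replicated snapshot of the live pool):

Code:
#!/bin/sh
# Hypothetical offline-backup script. tank is the live pool, offsite01 the
# pool on the removable disks sitting in the docks; both names (and the
# snapshot naming scheme) are placeholders.
SNAP="offsite-$(date +%Y%m%d)"

# Bring the removable pool online and snapshot the live pool.
zpool import offsite01 || exit 1
zfs snapshot -r "tank@${SNAP}"

# Full replication stream of every dataset, property and snapshot in tank.
# (Later runs to the same pool would use an incremental send, -I, from the
# previous offsite snapshot rather than a full one.)
zfs send -R "tank@${SNAP}" | zfs receive -Fdu offsite01

# Export so the pool is cleanly offline before the docks are switched off.
zpool export offsite01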
AndyUKG said:
As I mentioned, I am using ZFS as backup on some systems too, I'm not saying it's bad. But in summary my opinion would be that if the data is really critical, then putting all your eggs in one basket is a sub-optimal backup solution. But then when you are designing a backup solution you are always making choices, based on a requirement which will differ from situation to situation, all of which will have pros and cons and a price...
Exactly.