Backup solution for ginormous ZFS pool?

Terry_Kennedy · Jun 4, 2010

First, this isn't a potshot at ZFS - I've been stress-testing it heavily for some weeks now and it is the greatest thing since sliced bread. But... it is somewhat lacking in native tools for disaster recovery.

I collected a bunch of parts to make a nice large NAS: 32TB of disk and a 48-tape LTO4 robotic tape library. After test-filling the 32TB to about 25% capacity, I tried to use dump to make a test backup to the tape library. Result: whaddaya mean, unknown file system? x(

After extensive searches for any remotely relevant keywords, I found that people use a number of solutions:

Use zfs snapshots
Use zfs send/receive
Use amanda
Copy small pieces at a time to another filesystem and use dump
Just hope nothing bad happens :\

ZFS snapshots and send/receive don't address the underlying issue - having one or more complete copies of the data in another location for disaster recovery. For a similar perspective on this, I suggest reading the SmallNetBuilder article Smart SOHOs Don't Do RAID. Despite the unfortunate title, the article is sound - RAID != backup.

I installed the Amanda port and was overwhelmed by the configuration options. I contacted Zmanda and asked for a quote for configuring Amanda Community Edition, and got back a quote for building a Zmanda-supported port of the Amanda Enterprise client to FreeBSD and having it send dumps to a supported server platform (such as Linux). I'm not sure whether I was unclear about what I wanted, or whether they feel that things are best done with Enterprise on another platform; in any event, it was a non-starter. I'll attach my original message to them so you can see what (I think) I was asking for.

So, I've got this large pile of data that needs to go to tape in some relatively sane manner. I'm certainly not committed to using Amanda; just about any open-source solution will do, and if configuring it turns out to be beyond me, I'm willing to pay for some reasonable consulting to get it done. Any suggestions?

Here's the relevant piece of the message I sent to Zmanda:

I am setting up a new fileserver using FreeBSD 8.1. The system has 32TB of disk in a multi-level ZFS structure. A single mount point of 22TB is exposed by ZFS.

I also purchased a Dell TL4000 48-tape LTO4 robotic library (this is the same unit as the IBM TS3200). This library uses barcodes on the tapes to identify media in the library.

Everything was going well (I'm seeing over 500MB/sec read or write to the ZFS array) and I have successfully backed up the operating system (non-ZFS) partitions to tape. MTX sees the drive and can move tapes, read the tape barcodes, and so forth. I then tried to use "dump" on the ZFS partition and got the "unrecognized filesystem" message.

After searching, I found that a) dump doesn't support ZFS and b) nobody has a simple ZFS tape backup solution. Everyone seems to say "use ZFS snapshots or ZFS send/receive", as if they were under the misconception that RAID == backup.

The few people who are doing tape backup with ZFS seem to be using Amanda, so I installed it (2.6.1p2, latest stable, from FreeBSD ports). However, there are a zillion config options, and I find it very confusing. I don't want to make a trivial mistake that could cost me my data in the event of a ZFS failure.

What I'm looking for:

I'd like to get an Amanda configuration that lets me back up the entire ZFS partition, located on the same system as the tape drive and Amanda, to the tape library, with automatic tape changing as needed, on an on-demand basis. There will be no scheduled backups, no incrementals, no clients, and no on-disk holding area - just a simple partition-to-tape.

Successful completion of this goal will be when I test:

1) Restoration of a single file from the backup set
2) Restoration of a directory and its contents from the backup set
3) Restoration of the complete ZFS partition from the backup set
4) A subsequent full backup which successfully re-uses some or all of the tapes from the first backup run
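Tests 1 to 3 boil down to checking that what comes back from tape matches what went out. A minimal Python sketch of that check, assuming the restore is written to a scratch directory rather than over the live data (tree_digests and compare_trees are hypothetical helpers, not part of Amanda):

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each regular file under root (as a relative POSIX path) to its SHA-256."""
    root = Path(root)
    digests = {}
    for path in root.rglob("*"):
        if path.is_file() and not path.is_symlink():
            h = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MiB chunks so large files never have to fit in memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[path.relative_to(root).as_posix()] = h.hexdigest()
    return digests


def compare_trees(original, restored):
    """Return (missing, extra, changed) lists of relative paths."""
    a, b = tree_digests(original), tree_digests(restored)
    missing = sorted(a.keys() - b.keys())
    extra = sorted(b.keys() - a.keys())
    changed = sorted(p for p in a.keys() & b.keys() if a[p] != b[p])
    return missing, extra, changed
```

Three empty lists mean the restored tree matches. The same call works on one directory (test 2) or the whole mount point (test 3), though hashing 22TB on both sides will take a long time.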

carlton_draught · Jun 4, 2010

Terry_Kennedy said:
After doing a number of extensive searches for any remotely relevant keywords, I found people use a number of solutions:

Use zfs snapshots

Use zfs send/receive

Use amanda

Copy small pieces at a time to another filesystem and use dump

Just hope nothing bad happens :\

ZFS snapshots and send/receive don't address the underlying issue - having one or more complete copies of the data in another location for disaster recovery. For a similar perspective on this, I suggest reading the SmallNetBuilder article Smart SOHOs Don't Do RAID. Despite the unfortunate title, the article is sound - RAID != backup.

I realize you are looking at a tape solution. I just wanted to address the "complete copies of data in another location" point. I completely agree that RAID, ZFS RAID-Z, or mirrors alone are not in any way, shape or form a backup. But there is nothing wrong with using zfs send/receive (along with snapshots), provided that you are sending to at least two disks (or pools) that are either already off-site or regularly taken off-site.
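The snapshot-plus-send/receive cycle described above is short enough to sketch. This Python helper only builds the command lines and does not run them; the pool names (tank, offsite1) and the date-based snapshot names are hypothetical:

```python
import shlex


def snapshot_command(fs, snap):
    """Build: zfs snapshot <fs>@<snap>"""
    return shlex.join(["zfs", "snapshot", f"{fs}@{snap}"])


def replication_pipeline(src_fs, dst_fs, new_snap, prev_snap=None):
    """Build the pipeline copying src_fs@new_snap into dst_fs on the backup pool.

    With prev_snap, zfs send -i emits only the blocks changed since
    src_fs@prev_snap, so prev_snap must already exist on both sides.
    Without it, a full stream is sent (the first run onto a fresh disk).
    """
    send = ["zfs", "send"]
    recv = ["zfs", "receive"]
    if prev_snap is not None:
        send += ["-i", f"{src_fs}@{prev_snap}"]
        # -F rolls dst_fs back to its latest snapshot first, discarding any
        # stray changes made on the backup copy since the last receive.
        recv.append("-F")
    send.append(f"{src_fs}@{new_snap}")
    recv.append(dst_fs)
    return f"{shlex.join(send)} | {shlex.join(recv)}"
```

The first run onto a fresh backup disk uses the full form; after that, each run sends only what changed since the last snapshot that disk received. With two disks in rotation, each disk keeps its own incremental base.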

I've been working for a month or more now on a script to make the above disk-based backup much, much easier and more routine. It is not far from release. However, I can't help you with the tapes. I will be interested to hear how this plays out, though. Do tapes have anything corresponding to a ZFS checksum? In other words, when there is the tape equivalent of a bad sector, is anything going to let you know? And is there any redundancy?

magickan · Jun 4, 2010

Depending on how ginormous this is, how big the deltas are, and your networking capabilities, is building another ginormous ZFS pool elsewhere and syncing between the two not an option?

jalla · Jun 4, 2010

You could possibly set up a front-end to handle your tape robot, mount your ZFS filesystems over NFS, and use a backup app that supports dumping NFS-mounted partitions.

A quick search for "amanda backup nfs" indicates it should be doable with amanda.

Terry_Kennedy · Jun 4, 2010

magickan said:
Depending on how ginormous this is, and how big the deltas are, your networking capabilities, is building another ginormous zfs pool elsewhere and syncing between the two not an option?

There actually will be an off-site replication server a few blocks away, connected via GigE. That doesn't cover a disaster affecting this whole city, though, and it also doesn't help with recovering files that were deleted for good reason but later turn out to be needed, which tape can restore. Even assuming no compression, a full 22TB ZFS setup will fit onto 25 or so LTO4 tapes at a total cost of $850 or so. I can keep a very large number of backup sets for the cost of just the disk drives, let alone the whole file server.
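The tape count is easy to check against LTO-4's 800 GB native capacity. A small Python sketch, assuming no compression and ignoring label and filemark overhead:

```python
import math

LTO4_NATIVE_BYTES = 800 * 10**9  # LTO-4 native (uncompressed) capacity


def tapes_needed(data_bytes, tape_bytes=LTO4_NATIVE_BYTES):
    """Cartridges needed for one full, uncompressed backup set."""
    return math.ceil(data_bytes / tape_bytes)


print(tapes_needed(22 * 10**12))  # 22 TB in decimal units -> 28
print(tapes_needed(22 * 2**40))   # 22 TiB, the binary unit zfs list reports in -> 31
```

Either way it comes out a little above "25 or so" (roughly 28 to 31 cartridges), which still fits comfortably in a 48-slot library.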

Terry_Kennedy · Jun 4, 2010

jalla said:
You could possibly setup a front-end to handle your taperobot, mount your zfs volumes over nfs, and use a backup app that supports dumping nfs-mounted partitions.

A quick search for "amanda backup nfs" indicates it should be doable with amanda.

As far as I can tell, Amanda can back up from ZFS directly, so I'm not sure what I'd gain by going with a front-end server. But this is the same thing the Zmanda folks came up with, so if I'm missing something, please let me know.

carlton_draught · Jun 4, 2010

Terry_Kennedy said:
and also doesn't deal with recovering files that were deleted for good reason, but later found to be needed and restored from tape.

That's the purpose behind a regular snapshotting regime: so that you can restore files that have been deleted for good reason. Snapshots only take up as much space as your data changes over time, and they also work hand in hand with sending incremental updates to your backup pool.
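A "regular snapshotting regime" mostly comes down to deciding which snapshots to keep. A minimal Python sketch of a keep-the-newest-N policy, assuming hypothetical date-stamped snapshot names such as tank/data@2010-06-05:

```python
from datetime import date


def snapshots_to_destroy(snapshots, keep):
    """Return the snapshots older than the newest `keep`, oldest first.

    Assumes names of the form '<fs>@YYYY-MM-DD'. The newest snapshot is never
    returned, so it stays available as the base for the next incremental send.
    """
    if keep < 1:
        raise ValueError("keep at least one snapshot as the incremental base")
    ordered = sorted(snapshots, key=lambda s: date.fromisoformat(s.split("@", 1)[1]))
    return ordered[:-keep]
```

Each returned name would then be removed with zfs destroy (for example, zfs destroy tank/data@2010-06-01). Never pruning the newest snapshot matters because the next zfs send -i needs it as its base.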

Even assuming no compression, a full 22TB ZFS setup will fit onto 25 or so LTO4 tapes at a total cost of $850 or so in tapes. I can keep a very large number of backup sets for the cost of just the disk drives, let alone the whole file server.

True. The HDDs alone would be about $1400 or so, assuming USD.

magickan · Jun 4, 2010

Terry_Kennedy said:
also doesn't deal with recovering files that were deleted for good reason, but later found to be needed and restored from tape.

I guess it depends on what you're using to sync. rsync, for example, would keep the deltas, but I would look more at the built-in snapshotting of ZFS. Sending the snapshots across is TCP-based and single-streamed, though, so performance isn't great, but we do see it being used for this. If the deltas aren't too bad, it can work OK.

Terry_Kennedy said:
That doesn't solve the case of there being a disaster in this city, though. Even assuming no compression, a full 22TB ZFS setup will fit onto 25 or so LTO4 tapes at a total cost of $850 or so in tapes. I can keep a very large number of backup sets for the cost of just the disk drives, let alone the whole file server.

Fair enough, the media is pretty cheap, but you do have ancillary costs, like where the data is going to be housed. To be honest, though, I can't see how you're going to get around spending a fair bit of money to have full incremental backups off-site and in a different city.

If you can spend the money: get a connection, either point-to-point or via a provider that doesn't charge usage rates, plug GigE in at your end, put the other end in a datacenter, rent half a rack, and then use snapshots with syncing?

Terry_Kennedy · Jun 5, 2010

magickan said:
Fair enough, the media is pretty cheap, but you do have ancillary costs, like where the data is going to be housed. To be honest, though, I can't see how you're going to get around spending a fair bit of money to have full incremental backups off-site and in a different city.

It isn't nearly as expensive as it seems - a bunch of LTO4 tapes and some Imation DataGuard cases, and I'm all set.

If you can spend the money: get a connection, either point-to-point or via a provider that doesn't charge usage rates, plug GigE in at your end, put the other end in a datacenter, rent half a rack, and then use snapshots with syncing?

I do have multiple GigE fiber links to a site a few blocks away, where the replication will happen. As I mentioned in an earlier reply, that isn't cost-effective for multiple complete sets of backup. Hence, the tape.

I also need to guard against a software failure taking out both the primary and replicated ZFS pools. I tried to explain this to DEC when they were developing the VAXft system - software failure is more likely than hardware failure, and all that VAXft achieved was to have a couple, arm in arm, walking off the same cliff at the same time. That did not earn me many friends in that product group. The only way to achieve that level of software fault-tolerance is with an N-way voted system with different software implementations on each member. In the past, I worked on one such system, a 5-way voted system. And even then, during development and testing we had some 3-to-2 votes where 3 of the systems were wrong.
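The 3-to-2 anecdote is easy to reproduce with a toy majority voter. In this Python sketch (a generic illustration, not the system described above), the replicas' outputs are compared and the majority wins, which means three replicas sharing the same bug outvote two correct ones:

```python
from collections import Counter


def vote(outputs):
    """Majority vote over the outputs of N replicas.

    Returns (winner, dissenters), where dissenters are the indices of replicas
    that disagreed with the majority. Raises if there is no strict majority.
    """
    if not outputs:
        raise ValueError("no replica outputs to vote on")
    winner, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise RuntimeError("no strict majority")
    dissenters = [i for i, out in enumerate(outputs) if out != winner]
    return winner, dissenters
```

Voting only masks faults that are independent across members, which is why the members need different software implementations in the first place.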

Terry_Kennedy · Jun 5, 2010

carlton_draught said:
I've actually been working on a script to make the above disk-based backup much, much easier and routine for a month or more now. It is not far from release. However, I can't help you with the tapes.

I'd be interested in this when it is ready - as I mentioned elsewhere in the thread, I will have an off-site system with identical hardware for redundancy.

I will be interested to hear how this plays out though. Do tapes have anything corresponding to a ZFS checksum? i.e. When there is the tape equivalent of a bad sector, is anything going to let you know? And is there any redundancy?

I haven't actually run into a tape read error (on a restore) since the days of 9-track open reel drives. Modern drives (at least DLT and LTO) compute their own checksums and are capable of decent error recovery. That's good, because at least with DLT8000, a hard read error would usually make the rest of the tape inaccessible.

The only errors I've had with modern tapes have been the drive rejecting brand new media - my first SDLT600 drive + tapes came with a 10-pack of defective tapes, and I'd say I have run into about a 10% out-of-the-box failure rate on SDLT600 media. That, combined with Quantum declining to honor their "lifetime media warranty", is why I switched to LTO. Once the tapes are written, though, I've never had a problem restoring from them.

carlton_draught · Jun 6, 2010

Terry_Kennedy said:
I also need to guard against a software failure taking out both the primary and replicated ZFS pools. I tried to explain this to DEC when they were developing the VAXft system - software failure is more likely than hardware failure, and all that VAXft achieved was to have a couple, arm in arm, walking off the same cliff at the same time. That did not earn me many friends in that product group. The only way to achieve that level of software fault-tolerance is with an N-way voted system with different software implementations on each member. In the past, I worked on one such system, a 5-way voted system. And even then, during development and testing we had some 3-to-2 votes where 3 of the systems were wrong.

This is an interesting concept, one I had not considered before. Thanks for bringing it up.

I've been thinking about this, though... are you talking about a failure of the filesystem itself, e.g. a bug in UFS, ZFS, ext3, etc.? If so, the probability of failure would be something like a/x, where a is the number of reported errors due to a bug in the filesystem and x is the estimated total number of installations. You might get a feel for a by googling; I'm not sure how to get x, maybe by adding up the Solaris and FreeBSD install estimates and dividing by 2, for example. Anyway, as the filesystem matures, this should approach zero.

I guess there is also the possibility for corruption of the operating system components that are responsible for filesystem maintenance. Which is why I run a ZFS mirror for that too.

And there is the backup software, e.g. if you use UFS, dump/restore, amanda, rsync, whatever. Though I suspect they have squeezed the critical bugs to near zero by now as well.

Thanks for the info on the tapes btw. Will let the forum know when the script is ready for release.

danbi · Jun 7, 2010

The output of zfs send is just a stream of bytes, like a file.

I haven't used tapes in many, many years, but in the old days, one would do something like

# dd if=file of=tape

and be done with it.

I suspect modern tape drives are much smarter than older ones and a robotic system would permit you to just send a (large) file to the tape drive and not worry about filling up the tape etc.

Think of zfs send/receive as the UFS dump/restore tools and life will be much easier for you.

What is the speed of your tape system read/write (real life)? If it is not much faster than the Gbit connection to an archive server, you might indeed consider building a backup/archive server, even if initial costs seem high. For such a setup, I would be more concerned about power consumption/heat dissipation of the backup solution anyway.

Restoring from tape is only really useful when you restore everything. With a disk-based archive server, you can restore individual files way, way faster.
Of course, with ZFS you do not need to restore 'deleted' files from any external media if you use snapshots.

Terry_Kennedy · Jun 8, 2010

danbi said:
I haven't used tapes in many, many years, but in the old days, one would do something like

# dd if=file of=tape

and be done with it.

I suspect modern tape drives are much smarter than older ones and a robotic system would permit you to just send a (large) file to the tape drive and not worry about filling up the tape etc.

The backup won't fit on a single tape in any currently-existing tape format - the LTO roadmap shows LTO-8 storing 12.8TB in 2017, but even that won't hold the whole filesystem on one tape.

Tape robots aren't that smart - each tape will report EOT to the host system, and then the host does whatever it needs to close the tape and command the robot to load the next tape.
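The host-side end-of-tape handling described above can be sketched as a loop that hands each record to the current tape and moves to the next cartridge when the current one is full. In this Python sketch a fixed byte budget per volume stands in for the drive's real end-of-tape signal, and the record and volume sizes are illustrative:

```python
def records_by_volume(stream, volume_bytes, record_bytes=64 * 1024):
    """Yield (volume_number, record) pairs from a binary stream, starting a new
    volume whenever the current one reaches volume_bytes (the host's cue to
    have the robot unload this tape and load the next one)."""
    volume, used = 1, 0
    while True:
        if used == volume_bytes:
            volume, used = volume + 1, 0  # "EOT": switch to the next cartridge
        record = stream.read(min(record_bytes, volume_bytes - used))
        if not record:
            return
        used += len(record)
        yield volume, record
```

Restoring means reading the volumes back in exactly this order. A stream such as zfs send output is only usable as a whole, since zfs receive expects the complete stream, so one missing or unreadable volume loses the entire set.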

Also, a simple dd or similar doesn't meet the requirement of being able to restore a single file or the contents of a directory; it only allows a complete restore of the entire filesystem.
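Single-file and directory restores need a catalog recording where each file landed, which is roughly what the indexes kept by tools such as Amanda are for. A minimal Python sketch with a hypothetical catalog layout (path, tape barcode, tape file number); the barcodes are made up:

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class CatalogEntry:
    path: str     # file path as it appears in the backup
    volume: str   # barcode of the tape holding the file (hypothetical)
    file_no: int  # tape file (filemark) number to position to


def entries_to_restore(catalog, target):
    """Select catalog entries for a single file or a whole directory tree,
    ordered so each tape is loaded once and read front to back."""
    t = PurePosixPath(target)
    hits = [e for e in catalog
            if PurePosixPath(e.path) == t or t in PurePosixPath(e.path).parents]
    return sorted(hits, key=lambda e: (e.volume, e.file_no))
```

Sorting by barcode and file number lets the robot load each tape once and position forward through it, rather than shuffling cartridges for every file.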

What is the speed of your tape system read/write (real life)? If it is not much faster than the Gbit connection to an archive server, you might indeed consider building a backup/archive server, even if initial costs seem high. For such a setup, I would be more concerned about power consumption/heat dissipation of the backup solution anyway.

I'm seeing about 77MB/sec (from a slower source drive, a 7200RPM UFS gmirror):

Code:

16840130560 bytes transferred in 216.809384 secs (77672517 bytes/sec)

That's using only a single drive - the library has 2 (and can hold up to 4).

While slower than GigE, you can't beat the price (and portability) of tape media.
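The numbers above translate directly into a backup window for the 22TB set. A quick Python calculation, assuming the source can keep the drives streaming at the measured rate the whole time (optimistic, since it ignores tape changes):

```python
def hours_to_write(data_bytes, bytes_per_sec, drives=1):
    """Wall-clock hours for a full backup, assuming every drive streams at
    bytes_per_sec the whole time (ignores tape changes and positioning)."""
    return data_bytes / (bytes_per_sec * drives) / 3600


measured = 16840130560 / 216.809384  # the dd run above, about 77.7 MB/s
gige = 125 * 10**6                   # raw GigE line rate; real throughput is lower

for label, rate, drives in [("1 LTO-4 drive", measured, 1),
                            ("2 LTO-4 drives", measured, 2),
                            ("GigE link", gige, 1)]:
    print(f"{label}: {hours_to_write(22 * 10**12, rate, drives):.1f} h")
```

That works out to roughly 79 hours with one drive, about 39 with both, and about 49 at the raw GigE line rate, which real transfers won't quite reach.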

AndyUKG · Jun 14, 2010

If you want to use tape, then clearly you need something roughly equivalent to Veritas NetBackup. I don't have experience with any open-source solutions, but from a complexity point of view I think it is much simpler to implement something using zfs send/receive in conjunction with zfs snapshots. The only drawback you mentioned is that your secondary system is in the same city, implying that it isn't far enough away geographically from your primary system.
Ignoring that minor detail:

zfs send/receive plus snapshots gives you a fully replicated environment, minimises network traffic (an incremental zfs send transfers only changed blocks), and gives you pretty much as many historical point-in-time snapshots as you want for recovery (for those legitimately deleted files you mentioned). Obviously, from a restore perspective, it completely removes the need to recall tapes from off-site storage, put a tape in the drive, wait for it to position, etc.
I haven't actually got this fully running and tested myself, but I'm working on it now....

thanks Andy.

mix_room · Jun 15, 2010

I have no idea whether it does what you need, but this might help you:
http://blogs.sun.com/ako/entry/tape_backup_for_zfs

JohnDC · Sep 5, 2010

Terry, any progress there?

I've already spent too much time on Backup Exec just getting daily jobs to run, let alone consistent or reliable ones, so I'm looking for options ASAP.

I'm new to BSD but need to get a working FreeBSD server backing up six Windows Server 2003 clients.

I was wondering whether you had worked through the setup options successfully, and what you think the level of difficulty would be for a guy who is an admin but new to Unix.

My setup is fairly simple: I plan to back up to disk on my FreeBSD server at night, then to ArcVault 12/LTO tape during the day for retention and off-site storage, since we already have the library and media.

I really need to replace the crap Backup Exec server (Server 2003) and run an Amanda or Bacula backup server on my new FreeBSD server, then back up the six Server 2003 machines (two running Notes and a couple running SQL).

Any good progress with your related work?
Thank you,
jc

da1 · Sep 5, 2010

carlton_draught said:
Do tapes have anything corresponding to a ZFS checksum?

Basically, no. They just take the data and store it, period.
There are ways of verifying the integrity of the data (a CRC, for instance), but that is a feature of the software, not of the tape.
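An end-to-end check in software is independent of whatever error correction the drive does internally. A minimal Python sketch, assuming a hypothetical record format in which a CRC-32 is appended to each record before it is written and checked again on read:

```python
import zlib


def with_crc(record: bytes) -> bytes:
    """Append a 4-byte big-endian CRC-32 of the record before writing it."""
    return record + zlib.crc32(record).to_bytes(4, "big")


def check_crc(stored: bytes) -> bytes:
    """Strip and verify the trailing CRC-32; raise ValueError on a mismatch."""
    if len(stored) < 4:
        raise ValueError("record too short to carry a CRC")
    payload, crc = stored[:-4], int.from_bytes(stored[-4:], "big")
    if zlib.crc32(payload) != crc:
        raise ValueError("CRC mismatch: record is corrupt")
    return payload
```

CRC-32 is good at catching accidental damage such as flipped bits, but it is not a cryptographic hash; a digest such as SHA-256 can be used instead if deliberate tampering is also a concern.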

When there is the tape equivalent of a bad sector, is anything going to let you know?

You will know the instant you cannot back up/restore to/from the tape, or when you do some internal housekeeping like migration, stg backup, reclamation, move data, audit vol, etc. Once a "sector" on a tape goes bad, it's bye-bye, and that's pretty much it. Sure, you can recover the data, but you would need to send the tape to a lab (costly as hell and lengthy, usually ~1 month), and by the time you get the tape and data back, you may well be past your data retention period. It's soooooo cool sometimes, lol.

carlton_draught said:
And is there any redundancy?

Sure, if and only if you have a copy pool/tape. This of course means that only half (normally way less than half) of your total tape storage capacity holds unique data.
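With a copy pool, a restore can fall back to the next copy when one tape is unreadable or fails verification. A minimal Python sketch, where each copy is a (name, reader) pair and the reader callables are hypothetical stand-ins for loading and reading a tape:

```python
import hashlib


def restore_first_good(copies, expected_sha256):
    """Return (name, data) from the first copy that reads cleanly and matches
    the SHA-256 recorded at backup time. `copies` is a list of
    (name, reader) pairs, primary pool first."""
    for name, read in copies:
        try:
            data = read()
        except OSError:
            continue  # unreadable tape: try the next copy
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return name, data
        # Readable but wrong contents: also fall through to the next copy.
    raise RuntimeError("no intact copy found")
```

The digest has to be recorded somewhere other than the tape itself (for example, in the backup catalog), otherwise a damaged tape could carry a damaged reference value too.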

Lovely ain't it ?

Terry_Kennedy · Sep 5, 2010

JohnDC said:
Any good progress with your related work?

Not yet - summer is racing season, so I've been out all over the country in my race car. [If my profile picture/avatar would show, you'd see it 8-]

I'm almost positive that Amanda can do what I want, I just have to sit down and figure out how to turn off the parts I don't need. I found this article which may be helpful.

Terry_Kennedy · Sep 5, 2010

da1 said:
Basically, no. They just take the data and store it, period.
There are ways of verifying the integrity of the data (a CRC, for instance), but that is a feature of the software, not of the tape.

Modern tape drives that anybody is going to use (LTO, DLT, and so on) have extensive error detection and correction logic.

Users of operating systems that have been around for 30+ years (like VMS), which had extensive facilities for recovering data from bad tapes, are starting to question the conventional wisdom of leaving those facilities enabled. Some discussion here.

AndyUKG · Sep 6, 2010

Terry_Kennedy said:
Modern tape drives that anybody is going to use (LTO, DLT, and so on) have extensive error detection and correction logic.

Users of operating systems that have been around for 30+ years (like VMS), which had extensive facilities for recovering data from bad tapes, are starting to question the conventional wisdom of leaving those facilities enabled. Some discussion here.

If it's critical to have a good copy of your backup data, you will make multiple copies to tape (just like storing critical data on disk). You have no idea when a tape might go wrong, break, or whatever...

da1 · Sep 6, 2010

AndyUKG said:
If it's critical to have a good copy of your backup data, you will make multiple copies to tape (just like storing critical data on disk). You have no idea when a tape might go wrong, break, or whatever...

My point exactly.

Due to my line of work, I see it every day. The latest and greatest fail too, no matter what super-mega-ultra $#it technology they use. The best way is to have a copy of the copy.

At work we have 3 copies (1 disk + 2 tape pools (primary/copy pool)). And even so, some tapes go to hell.

Bottom line: better safe than sorry when it comes to keeping backups.

Terry_Kennedy · Sep 7, 2010

da1 said:
Bottom line: better safe than sorry when it comes to keeping backups.

Indeed. That's why I was so surprised to discover that there's no simple backup solution for large ZFS pools - I would have thought that Sun (at least) would have had a proper backup utility.

da1 · Sep 7, 2010

AFAIK, SUN has no such thing.

For all our SUN machines we have TSM (IBM/SUN libraries)

AndyUKG · Sep 7, 2010

Terry_Kennedy said:
Indeed. That's why I was so surprised to discover that there's no simple backup solution for large ZFS pools - I would have thought that Sun (at least) would have had a proper backup utility.

People who pay for Sun kit generally also pay (a lot) for their backup solution. Sun will be very happy to resell you Veritas NetBackup or Legato NetWorker, etc. As for free or open-source solutions, Sun doesn't have anything in that space.

phoenix · Sep 7, 2010

Terry_Kennedy said:
Indeed. That's why I was so surprised to discover that there's no simple backup solution for large ZFS pools - I would have thought that Sun (at least) would have had a proper backup utility.

They do have a solution, it's called "a second pool configuration" that you "zfs send/recv" data to. And you create as many "secondary pools" as you need, for off-site, redundant backups.