Is it foolish to use zfs snapshot as a backup plan?
I have a friend who told me that ZFS snapshots are a very poor way of doing backups and that I would be better off using something like rsync.
How paranoid you need to be about backups depends a lot on what is being backed up, and what the impact of losing that data forever is.
* If the data is random "media files" downloaded off the Internet, you can probably replace them (or get different ones) without too much trouble, so the cost of even a single backup might be considered excessive. If the original and all backups are lost, you just have to go download some more "media files" from the Internet.
* If the data is of personal value to you, then an occasional copy to an encrypted external drive you keep at your cousin's house might be sufficient. If the original and all backups are lost, you may no longer have pictures of your wedding, kids, etc.
* If the data is something like the source code to your best-selling utility, you want multiple copies at multiple locations, presumably encrypted. If you lose the original and all backups, your company is either out of business or your brand image is seriously tarnished.
* If the data is personal info (health records, material usable for identity theft, etc.), then after a loss your company is probably out of business, and you have government investigators asking questions, threatening fines, and so on.
Once you've established how important it is to have one or more backups, you now need to assign priorities to various scenarios and see what sort of recovery can be attempted with the backups you have.
* You have a single ZFS pool and rely on snapshots for backups. The power supply in your system fails and decides to zap all of your disk drives. Whoops, no backups.
* Same as above, but a software bug or power failure corrupts the pool. Maybe you can recover some data, maybe not.
* Taking the above scenarios to heart, you splurge on a second system with an equal-or-larger ZFS pool. Maybe you use a different operating system, different brand of hard drives, etc. to try to make the systems as different as possible to eliminate the possibility of a fault on one system taking out the second system. But a tree falls on your house, or your server room gets flooded during a storm.
* You decide to get a very expensive data circuit (or string your own fiber through the woods) to a second site, and set up the second system there. But a hurricane comes through and damages both sites. Or some bug, malicious activity, or simple mistyped command damages your "master" pool and your replication faithfully copies the damage to your backup site.
* Same as above, but substitute "cloud backup" for "second site".
* You decide the best way to be safe is to have the data in a universal format that can be understood by most *BSD/Linux systems, regardless of whether they support ZFS. You make tape backups of the entire pool (not incrementals) and store them offsite in a secure facility.
The above is designed to provoke thought and thoughtful discussions on what an appropriate backup (and recovery - never forget recovery!) strategy is.
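One way to work through the scenarios above is to write down where each copy lives and ask which copies a given failure takes out. The following is only a toy sketch: the pool, machine, and site names, the `Copy` class, and the four failure rules are all made up for illustration, and real failures are messier than a one-line rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    name: str
    pool: str           # ZFS pool holding this copy ("none" for tape)
    machine: str        # chassis the copy lives in
    site: str           # physical location
    live_replica: bool  # True if changes on the master are copied automatically


# Hypothetical rules: each failure destroys every copy for which the rule is True.
FAILURES = {
    "pool corruption": lambda c, m: c.pool == m.pool,
    "power supply zaps disks": lambda c, m: c.machine == m.machine,
    "fire or flood at site": lambda c, m: c.site == m.site,
    "damage replicated from master": lambda c, m: c.pool == m.pool or c.live_replica,
}


def survivors(copies, master, failure):
    """Return the names of the copies that are still usable after `failure`."""
    destroyed = FAILURES[failure]
    return [c.name for c in copies if not destroyed(c, master)]


# Example layout (all names are assumptions, not anyone's real setup).
master = Copy("master", "tank", "nas1", "home", False)
snaps = Copy("snapshots", "tank", "nas1", "home", False)
local = Copy("local replica", "tank2", "nas2", "home", True)
remote = Copy("remote replica", "tank3", "nas3", "remote", True)
tape = Copy("offsite tape", "none", "none", "vault", False)

snapshots_only = [master, snaps]
everything = [master, snaps, local, remote, tape]

if __name__ == "__main__":
    for failure in FAILURES:
        print(f"{failure}:")
        print("  snapshots only ->", survivors(snapshots_only, master, failure))
        print("  all tiers      ->", survivors(everything, master, failure))
```

With snapshots alone, every failure in this model leaves nothing behind, because the snapshots share the pool, the machine, and the site with the original. Only the offline, offsite copy survives all four rules.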
In my case, on my home RAIDzilla 2.5 storage appliances, I have multiple tiers of backup:
o Daily local replication to a second, identical system in the same cabinet (useful for recovering from "oops" errors).
o Daily replication via 10GbE private fiber to a second, identical system located a few miles away, on a different electrical substation and on the other side of the hill I'm on. That site can serve as a disaster recovery site if needed.
o Weekly backups of the entire pool (in GNU tar format) to LTO-6 tape media via a robotic library. The media is then removed, placed in special hermetically sealed tape transport cases with a seal applied, and held until the courier picks it up and stores it at a secure facility several states away. Some backup sets stay there "forever", while others come back and are re-used.
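For anyone wondering what a daily replication step like the first two tiers might look like, here is a rough sketch using the standard `zfs snapshot`, `zfs send`, and `zfs receive` commands over ssh. It is not the exact procedure used on the systems described above; the dataset, snapshot, and host names are placeholders, and the helper functions are hypothetical. The code only builds the command line so it can be reviewed before anything is run.

```python
import shlex


def replication_commands(dataset, prev_snap, new_snap, remote_host, remote_dataset):
    """Build the three commands for one incremental replication pass.

    Nothing is executed here; the caller decides how and when to run them.
    """
    # Take a recursive snapshot of the dataset and all of its children.
    snapshot = ["zfs", "snapshot", "-r", f"{dataset}@{new_snap}"]
    # Send everything that changed between the previous and the new snapshot
    # as a replication stream (-R), incrementally (-i).
    send = ["zfs", "send", "-R", "-i",
            f"{dataset}@{prev_snap}", f"{dataset}@{new_snap}"]
    # Receive on the backup host; -F rolls the target back to its most recent
    # snapshot first, discarding any local changes made there.
    receive = ["ssh", remote_host, "zfs", "receive", "-F", remote_dataset]
    return snapshot, send, receive


def as_shell(snapshot, send, receive):
    """Join the commands into a single shell pipeline string for inspection."""
    return f"{shlex.join(snapshot)} && {shlex.join(send)} | {shlex.join(receive)}"


if __name__ == "__main__":
    # Placeholder names: pool "tank", snapshots "daily-1"/"daily-2", host "backup1".
    cmds = replication_commands("tank", "daily-1", "daily-2", "backup1", "tank")
    print(as_shell(*cmds))
```

Note that a pipeline like this faithfully copies whatever is on the source, mistakes included, which is exactly why the tape tier exists as well.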