My recovery plan is to reinstall, then replicate the backup to /tmp, then cherry-pick what I need to restore. The server is mostly just a jail host though, so it's easy. (The backup drives contain data from multiple machines, so the backup pools are named backup/<host>/zroot.)
For a personal machine, I would be curious to see what other people do.
- Is there a reasonable way to back up to an external drive and then swap drives and boot from the backup?
- Or can I replicate the backup on top of a new currently-running zroot to restore it to the previous state?
- Or do people make an effort to separate the system files from the data to make recovery easier?
What are some of the options?
Here's what I do (and I do not recommend following it). My backup software is home-written, and I'm the only user. It is intended to protect me against (a) physical destruction of my server at home (such as fire or water) and (b) deletion or modification of files. It doesn't need to protect against single-disk errors, since the file system being backed up already uses mirroring (and yes, I know I should move to dual-fault-tolerant RAID, but there are bigger problems on my to-do list at home). It also (c) protects against a catastrophic bug in the file system software stack (ZFS and FreeBSD), though recovery from that would be inconvenient.
It sweeps the whole file system hourly, looking for files that have changed since the last backup (by comparing mtime and size), for new files, and for files that have vanished (been deleted). It then stores all the backups on a separate ZFS file system, on a separate disk about 2m (6 feet) away from the server in a fire-resistant safe. The backup is deduplicated (using whole-file dedup) and never deletes any backups, so for files that change all the time, there tend to be hundreds of versions stored; I sometimes clean those up manually. The backup is only about 1.5x larger than the real file system, which shows that my file system at home is mostly used in an archival fashion, and rapidly changing files are manually excluded from backup.
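Stripped down, the sweep logic looks roughly like this (a simplified sketch, not the actual program; the real one also handles exclusions, the attribute database, and lots of error cases):

```python
import hashlib
import os

def sweep(root, index):
    """One backup sweep: find files that are new or whose (mtime, size)
    changed since the last sweep, and files that have vanished.
    `index` maps path -> (mtime, size) as seen by the previous sweep."""
    changed, seen = [], set()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished mid-sweep
            seen.add(path)
            sig = (st.st_mtime, st.st_size)
            if index.get(path) != sig:  # new file, or mtime/size changed
                changed.append(path)
                index[path] = sig
    vanished = [p for p in index if p not in seen]
    for p in vanished:
        del index[p]
    return changed, vanished

def store(path, dedup_dir):
    """Whole-file dedup: store the content under its hash, so identical
    versions (across sweeps or across files) are kept only once."""
    with open(path, 'rb') as f:
        data = f.read()
    digest = hashlib.sha256(data).hexdigest()
    dest = os.path.join(dedup_dir, digest)
    if not os.path.exists(dest):
        with open(dest, 'wb') as f:
            f.write(data)
    return digest
```

Whole-file dedup also means a file that reverts to earlier content costs nothing extra to store, which is part of why the backup is only ~1.5x the live data.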
This "local" backup is then supposed to be automatically replicated to a remote backup (which used to be a machine physically running in my office, and is now a cloud server). That remote replication has sadly been broken since last winter (nearly a year ago), when we had weeks of network outages at home. So for now, remote replication is done with a pair of USB-connected disks stored far from home in a secure location; every week or two, one of them is brought home and manually refreshed using rsync. I need to put a weekend of work into re-engineering the remote backup to (a) be up to date again and run automatically, and (b) be resilient against long network outages.
The remote backup intentionally does not use FreeBSD and ZFS; it used to run on Linux with ext4, and now uses macOS and APFS. That way, if my server at home were destroyed by a failure of FreeBSD or ZFS, the remote copy would survive; restoring from it would still be very inconvenient, though, since it would require a giant cross-platform rsync.
One of the design principles of my backup system is: I don't bother backing up the OS install itself. So the backup only contains /home and, to make re-installation easier, /etc and /usr/local/etc. If my server were physically destroyed, I would first have to install the OS and get it working (using old copies of /etc as a guide), then copy /home back.
I used to have a system where every hour I would take a copy of the non-home file systems (initially by dd'ing the boot SSD to a second SSD every hour, later by using rsync to update from the boot SSD to a small spare area on the data disk). Since my recent re-install (when the root file system moved from UFS to ZFS), that has been abandoned too. I need to bring it back to life, and that's pretty high up on my to-do list.
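The rsync variant of that hourly copy amounts to a cron job around an invocation like the one below, sketched in Python. The paths, flags, and exclude list here are illustrative examples, not my exact setup:

```python
import subprocess

# Example excludes; the real list depends on the machine's layout.
EXCLUDES = ["/home", "/tmp", "/dev", "/proc"]

def rsync_cmd(src="/", dest="/backup/rootfs/"):
    """Build an rsync invocation that mirrors the root file system
    (minus /home and pseudo-filesystems) into a spare area.
    -a = archive mode, -H = preserve hard links,
    -x = don't cross file system boundaries."""
    cmd = ["rsync", "-aHx", "--delete"]
    for path in EXCLUDES:
        cmd += ["--exclude", path]
    cmd += [src, dest]
    return cmd

def run_hourly_copy():
    # Called from cron once an hour; check=True so failures are loud.
    subprocess.run(rsync_cmd(), check=True)
```

With a ZFS root split into many datasets, -x alone would stop at each dataset boundary, so the explicit excludes are belt-and-braces.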
So to your questions:
- No, I have not yet accomplished being able to boot from my local ZFS backup disk, but that hasn't been a goal. It would be nice to have, and I should invest a weekend of work into getting there, but there's always something more urgent to work on first.
- Replicating the backup on top of the currently running system is not easy. The backup file system contains lots of "deleted or modified" files, which are marked by having a "#" in their file names, so a simple "zfs send ... | zfs receive ..." would not work. Also, some metadata (file attributes such as owner and permissions) is not kept in the backup file system itself but in a separate database. So the restore script works by issuing a huge series of copy commands, and a full restore is not automated. I've never had to perform a full restore, and it would be a multi-day ordeal if I ever did. But given the next answer, that's probably not too bad.
- I deliberately do not back up the system files, meaning everything that can instead be restored by doing a new OS install, followed by ports and packages, and then non-FreeBSD software (such as Python modules from pip, and my own software from my own Mercurial server). This immediately implies that a full restore will be a massive multi-day ordeal anyway.
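The restore path in the second answer above is basically a long loop of copies plus attribute fix-ups. Roughly, in a simplified sketch (the database schema and function names here are illustrative, not my actual script, which also handles the "#"-marked versions and more metadata):

```python
import os
import shutil
import sqlite3

def restore(db_path, backup_root, target_root):
    """Replay a restore: copy each file out of the backup area and
    re-apply owner/permissions from the separate metadata database.
    Illustrative schema: files(relpath, uid, gid, mode)."""
    db = sqlite3.connect(db_path)
    for relpath, uid, gid, mode in db.execute(
            "SELECT relpath, uid, gid, mode FROM files"):
        src = os.path.join(backup_root, relpath)
        dst = os.path.join(target_root, relpath)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copyfile(src, dst)
        os.chmod(dst, mode)
        try:
            os.chown(dst, uid, gid)  # needs root to change ownership
        except PermissionError:
            pass  # running unprivileged: keep the content, skip the owner
    db.close()
```

Run against a full backup, that loop is exactly the "huge series of copy commands" mentioned above, which is why a complete restore takes days rather than hours.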
So far, the only restores I've had to do were for files that were unintentionally deleted, and then only a few files or directories at a time. I know the old saying "it's not a backup until you have done a restore", and I should really do a fire drill of attempting a full restore from the remote sometime. Maybe after I retire and before I have to look after grandkids ...