ZFS: Need advice on how to back up ZFS datasets immutably to a remote machine

Currently my backup server is running a different OS (OpenBSD) which does not support ZFS, so the backup is simply an rsync of the current ZFS filesystem (not of a specific snapshot, but the active filesystem). In a few weeks I will be moving to a backup server running FreeBSD. I would like to find a way to actually replicate datasets from the file server to the backup server along with all existing snapshots. I want the structure of the data and snapshots on the backup server to be identical, and I would like nothing outside the backup process to be able to touch the duplicated datasets (which is what I mean by immutable). Most likely the destination zpool will either be exported or mounted read-only during normal operation, and only imported or made read-write when a backup is happening. If anything has happened to the destination datasets (for example, some change made since the last snapshot) I need that to be discarded the next time the backup happens.
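Roughly what I picture the receive side looking like, as a sketch (the pool and dataset names here are made up):

```sh
# Receive-side workflow sketch -- backuppool/home is a hypothetical name.
zpool import -N backuppool            # -N: import without mounting anything

# ...replication happens here, e.g. piping into `zfs receive -F`, where
# -F rolls the destination back, discarding any changes made since the
# last received snapshot...

zfs set readonly=on backuppool/home   # belt and suspenders while imported
zpool export backuppool               # nothing can touch it until next run
```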

On the fileserver, I have a python script which handles all of the snapshots (including purging old snapshots). Each dataset has yearly/monthly/weekly/daily/hourly snapshots, which I would like to replicate to the backup server. The backups only run once per day, so it would be acceptable to simply replicate each dataset up to the most recent snapshot (whatever it is), because any additional snapshots or changes that happen during the day should be replicated the next time it runs. The source data is heavily snapshotted, so this is intended solely to protect against complete pool loss on the file server. It's acceptable for up to 24 hours of data to be lost if this happens.
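For a single dataset, I picture the plain zfs send/recv version of this looking roughly like the following (all names are placeholders, and `@lastcommon` stands in for real bookkeeping of the last snapshot both sides share):

```sh
# Snapshots of one dataset, oldest first (-d 1 limits listing to this
# dataset's own snapshots, no descendants).
oldest=$(zfs list -H -t snapshot -o name -s creation -d 1 tank/home | head -1)
newest=$(zfs list -H -t snapshot -o name -s creation -d 1 tank/home | tail -1)

# One-time seed: full send of the oldest snapshot, then -I fills in every
# intermediate snapshot up to the newest (no -R, so no child datasets).
zfs send "$oldest" | ssh backuphost zfs receive -u backuppool/home
zfs send -I "$oldest" "$newest" | ssh backuphost zfs receive -u backuppool/home

# Daily run afterwards: incremental from the last common snapshot;
# -F rolls back any stray changes on the destination first.
zfs send -I tank/home@lastcommon "$newest" | \
    ssh backuphost zfs receive -uF backuppool/home
```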

I can replicate the zfs datasets manually on the backup server and continue to replicate via rsync, and simply have the backup server itself handle snapshotting and purging using the same script via cron. It feels like there has to be a better way to do this, though (possibly using zfs send/recv). If anyone has any great ideas on how to accomplish this, I'm all ears. I'm pretty good at scripting things, so it doesn't matter (up to a point) how complicated the method would be to do this. I've looked into syncoid, but at first glance it doesn't look like it does quite what I want as-is. However, if someone's got a set of options they know of that would enable me to use it (as either the entire solution, or as part of the solution which can be wrapped in a shell script to accomplish the rest), that's just fine with me.
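If syncoid can be bent to this, I'd guess the invocation would be something along these lines, though I haven't verified it does exactly what I want, and the flags vary by version (check syncoid(1)):

```sh
# Hedged sketch. --no-sync-snap replicates only snapshots my own script
# creates instead of making a syncoid-specific one; --exclude takes a regex
# of datasets to skip during a recursive run (deprecated in favor of
# --exclude-datasets in newer versions, I believe).
syncoid -r --no-sync-snap --exclude='home/backup' \
    tank/home backupuser@backuphost:backuppool/home
```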
 
You can delegate `zfs receive` (but no other zfs commands) to a user of your choice. That is one way to make backups immutable: an attacker taking over that user can sabotage the creation of new backups, but cannot destroy existing snapshots.
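For example, on the backup server, with a hypothetical `backupuser` (`create` and `mount` are needed alongside `receive` so that received child datasets can be created):

```sh
# Delegate only what receiving a stream requires -- no destroy, no rollback.
zfs allow backupuser receive,create,mount backuppool/home
```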
 
I have probably tested all the open source tools out there. In the end, I settled on zrepl: it supports both push- and pull-based zfs replication, and offers a good scheme for keeping and pruning zfs snapshots.
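A rough sketch of a push job, written from memory, so verify the exact syntax against the zrepl documentation for your version:

```yaml
# Sketch of a zrepl push job -- field names from memory, check the docs.
jobs:
  - name: home_backup
    type: push
    connect:
      type: tcp                  # tls and ssh+stdinserver transports also exist
      address: "backuphost:8888"
    filesystems:
      "tank/home<": true         # '<' = this dataset and its whole subtree
      "tank/home/backup": false  # explicit exclude
    snapshotting:
      type: manual               # keep letting an external script snapshot
    pruning:
      keep_sender:
        - type: regex
          regex: ".*"            # don't let zrepl prune the sender at all
      keep_receiver:
        - type: grid
          grid: 24x1h | 30x1d | 12x30d
          regex: ".*"
```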
 
After testing all of these out, none of them seems to work for my personal use case. Feel free to let me know if I'm incorrect and there's something out there that I missed.

1) Must sync all of a dataset's snapshots without also recursively copying child datasets. For example, it must be able to back up '/home', '/home/zaragon', and '/home/user2' without backing up '/home/backup'.
2) Must be able to back up datasets by default without me having to remember to add them to a config file. Following the above example, if I add '/home/user1', it needs to replicate that dataset without any additional configuration from me (because I will forget, and then that user will not be backed up). It is OK to require explicit configuration to exclude datasets (i.e. assume every child of '/home' will be backed up unless I explicitly configure otherwise); there's a rough sketch of what I mean right after this list.
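Something like this is the shape of script I have in mind: walk the tree myself and send each dataset individually (never `-R`), skipping an explicit exclude list. All names are made up, and `@lastcommon` again stands in for real last-common-snapshot bookkeeping:

```sh
EXCLUDE="tank/home/backup"

zfs list -H -r -o name tank/home | while read -r ds; do
    case " $EXCLUDE " in
        *" $ds "*) continue ;;          # explicitly excluded dataset
    esac
    newest=$(zfs list -H -t snapshot -o name -s creation -d 1 "$ds" | tail -1)
    [ -n "$newest" ] || continue        # no snapshots yet, nothing to send
    zfs send -I "$ds@lastcommon" "$newest" | \
        ssh backuphost zfs receive -uF "backuppool/${ds#tank/}"
done
```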

Personally, I would prefer to use custom zfs user properties to control the replication, because that way I never need to remember to modify a config file, and inheritance is automatic (that's how my zfs snapshot script determines which datasets get snapshotted and how frequently). But it's not a hard requirement.
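Concretely, something like this (the property name `com.example:backup` is made up; any name containing a colon works as a zfs user property, and children inherit it automatically):

```sh
zfs set com.example:backup=on tank/home
zfs set com.example:backup=off tank/home/backup    # explicit opt-out

# Replicate every dataset whose effective (inherited) value is 'on'.
zfs list -H -r -o name tank/home | while read -r ds; do
    [ "$(zfs get -H -o value com.example:backup "$ds")" = "on" ] || continue
    echo "would replicate $ds"   # plug in the send/receive loop from above
done
```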
 
I use zxfer, which is a shell script. It can do non-recursive transfers, and you can probably autogenerate the list of datasets the same way you do for snapshotting.
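Something like this, though the flags are from memory and should be double-checked against zxfer(8):

```sh
# -N transfers one dataset non-recursively, -d deletes destination snapshots
# that no longer exist on the source, -F forces a destination rollback,
# -P copies properties, -T pushes to a remote target over ssh.
zxfer -dFPv -T backupuser@backuphost -N tank/home/user1 backuppool/home
```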
 