ZFS Simple backup using rsync

Hi folks,

I am very new to FreeBSD. I am a long time Linux user (since 1997). I support SuSE professionally and that used to be my choice for the home server.
I have recently decided to switch to FreeBSD because I like how "clean" this OS is (compared to the "systemd hell", "wicked" network interface management, etc.) and because I wanted to take advantage of ZFS.

I am trying to decide on how to set up my backup. I used to run rsync using cron every night to sync my directories to an external hard drive and I would like to continue doing that in FreeBSD. I read about fantastic features like "zfs send" and "zfs receive", but I don't want to back up the complete pool, just selected directories.
My question is: does it make sense to use ZFS on the external USB drive? If yes, do I always need to import the pool before and export it after rsync is done with its job? Or should I use a different filesystem for my backups?
Perhaps there is even a better solution that I didn't think of?
I searched the forum for answers, but I could not find anything that would match my scenario.
 
I have recently decided to switch to FreeBSD because I like how "clean" this OS is (compared to the "systemd hell", "wicked" network interface management, etc.) and because I wanted to take advantage of ZFS.
Well meant advice: be careful that you don't treat FreeBSD as if it were Linux. It's not and treating it as such can easily come to bite you in the behinds in the longer run. There's tons of good documentation available (definitely bookmark the FreeBSD handbook) and maybe this can also be a good read for you:

https://forums.freebsd.org/threads/10-dos-and-dont-for-freebsd.65618/

I am trying to decide on how to set up my backup. I used to run rsync using cron every night to sync my directories to an external hard drive and I would like to continue doing that in FreeBSD. I read about fantastic features like "zfs send" and "zfs receive", but I don't want to back up the complete pool, just selected directories.
Well, just because you're using ZFS doesn't mean that you can't do this anymore. Of course there might be more efficient ways but.. one step at a time.

rsync isn't included in the base system, but it is available in the ports collection as net/rsync. You can easily install it with # pkg install rsync, or build the port yourself (don't do both: either use binary packages (recommended) or build from ports, but not both).

But you might be able to do this more efficiently. As SirDice already pointed out, a ZFS pool can host different (virtual) filesystems. This allows you to keep data separated while it still uses the same storage space. There are several advantages to this (ranging from security settings to storage control) and one of those is snapshots.

If you dedicate a filesystem to /home (for example zroot/home, mounted on /home) then making a "live" backup is easily done: # zfs snapshot zroot/home@backup. After that you can continue to use /home and make changes, but the current state will be stored as a backup, accessible through /home/.zfs/snapshot/backup.

Although it's definitely a usable form of backup it's still 'live', meaning that if your disk suddenly crashes then your backup is also gone. That's where zfs send can come into play: it allows you to grab that snapshot and store it somewhere. Of course using a RAID or mirror system reduces this risk as well.

Example: # zfs send zroot/home@backup > /opt/backups/home_061118.zfs. This would create the file /opt/backups/home_061118.zfs, which contains your home directory at the time of making the snapshot. You could then store that file somewhere offsite; heck, maybe even using rsync.

My question is: does it make sense to use ZFS on the external USB drive?
I'd stick with UFS myself. ZFS is resource hungry and at its best when you're using multiple disks as a mirror or RAID. On a single disk, and in this case an external one, you won't gain many advantages. As mentioned above: one of ZFS's main advantages is the ability to create different filesystems / datasets. I seriously doubt that you'd need that on such an external disk (a backup disk, I assume?).

With UFS it's simply an issue of mounting & dismounting and you're done. ZFS is more demanding.

Hope this can give you some ideas.

But summing up: I'd start with rsync and a remote (UFS) backup disk for now. This will make things easier on you, especially now that everything is still quite new. Then I'd start experimenting with ZFS filesystems and snapshots, perhaps eventually move to (automatically?) creating snapshots and then storing those on your backup disk.
 
Yes, you can use ZFS on a USB-connected disk drive. I have that at home; my external backup drive is a 2TB USB 3.0 connected Seagate disk (the one you get for $60 at Costco when they are having a sale). It has a ZFS file system on it, and is used for backup. Works excellently ... except naturally, the disk drive itself is pretty slow.

In my opinion, two big advantages of ZFS exist even when you are not using RAID: checksums and scrubbing. Checksums protect against undetected I/O errors, which are not as rare as one commonly thinks (in particular on today's huge disks, and in the presence of vibration and off-track writes). And scrubbing gives you advance warning when disk hardware is failing, in some cases before the data is lost. Clearly, combining those with RAID is even better.

Do you need to zfs import and export every time you run rsync? No. My external disk goes to sleep when it is unused for a while, which ZFS interprets as a "disconnection". When the disk is needed again, it wakes up, and ZFS automatically reconnects. The only thing I see is a "vdev changed state" message in my system log every time the disk wakes up or goes to sleep. Now, I have not tried actually physically disconnecting the disk and putting it away for a "long" time (more than a few hours); if you want to do that, then a ZFS export/import might be a good idea, or it might not; that's outside my experience.

Don't worry too much about ZFS's reputation of being "resource hungry". That's true, but for a relatively modern machine with reasonably sized file systems it does not present any problem. At home I have an Intel Atom (meaning 32-bit x86, not 64-bit), with 3GiB of memory, running at roughly 1 GHz. It has three ZFS file systems, one with RAID-1 (mirroring), a total of about 6TB of file system space, and works excellently. It's just that with a CPU this slow, ZFS is no speed demon. I think I get about 50 MB/s writing and 100 MB/s reading, which is good enough for my home server (and a factor of 2-4 slower than the disk hardware).

So while ShelLuser's advice isn't wrong, I think he's being a bit too cautious: You could go try ZFS directly on your external disk, and it will probably work fine.
 
Thanks, guys, for the great advice and links! I already read the handbook, but I didn't know about the "dos and don'ts" post. Very informative!
I will certainly NOT treat FreeBSD as if it were Linux :) The whole goal was to actually move away from Linux for my home needs. I have enough Linux at work.

I forgot to mention my setup. I use two mirrored vdevs (2 striped mirrors with 4 x 4TB drives). That gives me ~7.2 TB of storage space. I only have 8GB of RAM, however, so I like the advice to actually format my USB drive with UFS instead of ZFS. It will be easier to manage everything that way and this setup will be lighter on resources. I still feel like my data is relatively safe, but I always back up everything religiously anyway.

ralphbsz: I will actually be disconnecting my drive quite often and I am not sure how ZFS would react to that. Thanks for the advice though! It's good to know that such a setup is doable.
ikbendeman: I don't have a second machine. My wife wouldn't let me use another "ugly and noisy box" for the purpose of making the data on the first ugly box safer :)
 
I do that very thing: rsync to a NAS for my users' /home directory data backup, and I use dump(8) to back up the OS to my NAS.

EDIT - and as usual, I didn't read the topic heading...:rolleyes: using UFS, not ZFS...sorry...
 
I use ZFS on an external USB disk. After connecting it I run 'import', then copy using rsync, then 'export' and disconnect. Works fine.
One thing to be aware of: you may need to set the max memory usage for the ZFS ARC, otherwise it might consume all the memory you have and won't relinquish it to other processes. At least that is my consistent experience with ZFS.
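For reference: on FreeBSD the ARC ceiling is set with the vfs.zfs.arc_max loader tunable. A sketch (the 2 GiB value is just an example; pick one to suit your RAM):

Code:
```shell
# /boot/loader.conf -- cap the ZFS ARC at 2 GiB (value in bytes)
vfs.zfs.arc_max="2147483648"
```

The setting takes effect after a reboot.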
 
You may use UFS on the external (USB) drive and back up a ZFS dataset (ZFS filesystem) onto it as a file:
Code:
# zfs send zroot/dataset@snapshot > /mnt/usb/backup.today

You may create as many datasets as you wish, for any part of the filesystem.

For example, if you used ZFS in the installer, then you already have the dataset zroot/usr and other datasets/filesystems.

On the other hand, if you have a backup pool (a secondary pool), then you can:
  1. create a first recursive snapshot A on the source dataset
  2. replicate the dataset snapshot with all sub-datasets onto backup pool
  3. at some later time, create another snapshot X (possibly with K snapshots in between)
  4. send a delta-stream, replicating all snapshots recursively between A and X onto backup pool
This way the backup pool will keep the entire history and content of the source dataset (note: you can do non-recursive operations, if only a single dataset must be replicated without children).

ZFS is a volume manager and a filesystem, thus it knows which blocks have changed between A and X, so replication is much faster than rsync, which works at the filesystem level and thus scans all folders recursively.
 
skhal : I tried the approach with the UFS and "zfs send" on a test system that I setup.

The first snapshot that I copied to the external drive using "zfs send" was as large as the original filesystem (that was expected). The second one, though, was exactly the same size. So, I guess, without creating a backup pool (your second suggestion), I cannot really send any delta to the external device by playing with snapshots and "zfs send" (unless I am doing something wrong). That means I will always be sending the complete filesystem to the USB device, correct?

The second solution is very interesting; however, I would need to learn a little bit more about ZFS snapshots and replication. I was looking for a good manual, but most of the info I was able to find is from Oracle, and I am not sure if the implementation of ZFS on FreeBSD is exactly the same (I think I found one command that did not work on FreeBSD; something about getting the dataset properties). Is there a manual that covers ZFS on FreeBSD with regards to snapshots and replication? Or can perhaps the one from Oracle be used, if the implementation is the same?
 
The first snapshot that I copied to the external drive using "zfs send" was as large as the original filesystem (that was expected). The second one, though, was exactly the same size. So, I guess, without creating a backup pool (your second suggestion), I cannot really send any delta to the external device by playing with snapshots and "zfs send" (unless I am doing something wrong).
ikbendeman is fully right: look more carefully into zfs(8) (specifically the zfs send section):

Code:
         -i snapshot
                 Generate an incremental stream from the first snapshot (the
                 incremental source) to the second snapshot (the incremental
                 target).  The incremental source can be specified as the last
                 component of the snapshot name (the @ character and
                 following) and it is assumed to be from the same file system
                 as the incremental target.

                 If the destination is a clone, the source may be the origin
                 snapshot, which must be fully specified (for example,
                 pool/fs@origin, not just @origin).
See: a snapshot is nothing more than a reflection of your filesystem (the ZFS dataset) at that specific moment (the moment of snapshot creation). But in order to create an incremental stream you'll need to tell ZFS which differences it should send (so: the differences between two snapshots).

You might get confused when checking zfs list because, when first created, a snapshot doesn't take up much space. But that's not the data you're sending when you back up a snapshot! A snapshot's "physical" size is only determined by the amount of changes which got applied to the filesystem; a snapshot basically only keeps track of those.

But when you dump a snapshot (zfs send) you basically make a backup of all the data in the state it was in at the time of snapshot creation.

Example:

Code:
peter@zefiris:/home/peter $ zfs list -rt all zroot/home
NAME                USED  AVAIL  REFER  MOUNTPOINT
zroot/home         33.6G  18.9G  30.8G  /home
zroot/home@011118   222M      -  26.4G  -
zroot/home@021118   518M      -  27.0G  -
zroot/home@031118   150M      -  26.7G  -
zroot/home@041118   153M      -  26.8G  -
zroot/home@051118  42.8M      -  26.9G  -
zroot/home@061118  76.9M      -  28.9G  -
zroot/home@071118  1.37G      -  30.3G  -
Even though 031118 only takes up 150M for now, its "physical" size is 26.7G. That's what I'd be sending if I dumped that individual snapshot. But the changes between 031118 and 041118 are a completely different story.
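In command form (a sketch; snapshot names taken from the listing above, /mnt/usb being a hypothetical mount point for the UFS backup disk), a full dump plus an incremental one would look like:

Code:
```shell
# Full dump of the oldest snapshot (everything it references, ~26.4G here)
zfs send zroot/home@011118 > /mnt/usb/home_full.zfs
# Incremental stream: only the changes between 011118 and 021118
zfs send -i @011118 zroot/home@021118 > /mnt/usb/home_011118-021118.zfs
```

To restore, you'd replay them in order with zfs receive: the full stream first, then each incremental stream.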

Hope this can give you some better ideas on how this works :)
 
Thanks!

That was an excellent explanation. So, in practice I would need to create a snapshot, send it to the external device and then keep sending the incremental streams. This also means that I cannot afford to lose the first snapshot (for example due to a disk error), because then I will not be able to restore anything. Since the snapshot sent via "zfs send" is basically a single file, I would need to worry about its consistency.
The size of the dataset that I want to protect is about 4TB and it will keep growing. It mostly consists of photos. I believe it will be easy for that large file (the first snapshot) to become inconsistent over time on the external drive with UFS. I think I will set up the simple backup for now, with the external drive formatted as UFS and rsync. If I lose one or two files, not a big deal :)
I can always change that in the future when I have a second machine and can play more with backup pool on ZFS.
 
zfs send has two options:
  • -i creates an incremental stream between two snapshots
  • -I includes all intermediate snapshots
If the other side is another ZFS dataset, -I is probably the best option, as it keeps everything on the receiving side.

Consider a dataset zroot/source/data and destination parent dataset tank/backup.

I'd do the backups in the following way (assuming the commands are run as root or the user is granted backup privileges using zfs allow -- which is the preferred way):
  1. snapshot the dataset recursively, including children, for the first time:
    Code:
    zfs snapshot -r zroot/source/data@A
  2. replicate the entire dataset with sub-sets and snapshots:
    Code:
    zfs send -R zroot/source/data@A | zfs receive tank/backup/data
    the destination snapshot should not exist
  3. at some later time, add snapshots B and C to the source dataset, and then replicate everything between A and C:
    Code:
    zfs send -I zroot/source/data@A zroot/source/data@C | zfs receive tank/backup/data
Speaking of documentation, I tried different online manuals and the handbook. But the best documentation turned out to be FreeBSD Mastery: ZFS by Michael W Lucas. It takes time to go through the book, but it is worth it.
 
Isn't it also possible to do an NFS mount inside of a ZFS filesystem (i.e. import, not the export function), and set that to be the default location for snapshots? Then you'd just have to specify, first, a full snapshot, then incrementals from that point on? I've not had to do that, but I believe I read about it a long time ago. There are ways to mount UFS inside of ZFS and do it, I am certain, and since NFS is fairly well integrated, I would assume both are possible.
 
skhal : Thank you so much for this solution. This makes perfect sense.
I created a ZFS pool on my USB drive for backups and used it to replicate snapshots. It works very well.
I still need to write a script so that it happens automatically every night. I also need to figure out a way to manage snapshots, but that's a minor thing. If I start taking snapshots every night, I will end up with plenty of them soon.. :)

I also need to adjust the max memory usage for the ARC, as tankist02 said. I only have 8GB of RAM and the combined pool size is about 15TB.
 
Isn't it also possible to do an NFS mount inside of a ZFS filesystem (i.e. import, not the export function), and set that to be the default location for snapshots?
Yes and no. ZFS behaves just like any other filesystem so you can mount whatever you want in your tree. Heck, FreeBSD also has an automounter (see autofs(5)) which could make this easier to access. However, the zfs command doesn't know anything about a default location. But that's something which a shell script can easily handle.

But I wouldn't really advise this setup, because although NFS is pretty lightweight it also has some nasty quirks. If, for whatever reason, your connection to the server gets disrupted, then you're looking at some heavy timeouts which can seriously disrupt a backup service.

Therefore it's easier (also for the service itself) to rely on other protocols, SSH for example, and use them on a per-task basis. An example:
# zfs send zroot/home@backup | ssh backup@server "dd of=/opt/backups/home.zfs".

This would create a dump of zroot/home@backup and send the data stream into ssh, which in turn starts dd on the remote server to direct the stream into a file (/opt/backups/home.zfs).

The advantage this has over NFS is that the connection is used on a per-task basis, and SSH is much more efficient at connecting than NFS. You also don't have to suffer massive timeouts if you don't want to (either through ConnectTimeout in /etc/ssh/ssh_config, with the -o parameter for ssh, or both: creating a dedicated ssh_config which is specifically used when connecting to backup servers).
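For example, a dedicated client config (the host name and file name below are hypothetical) could be as small as:

Code:
```shell
# ~/.ssh/backup_config -- use with: ssh -F ~/.ssh/backup_config backup-server
Host backup-server
    HostName server.example.net
    User backup
    ConnectTimeout 10
```

That way the backup job fails fast after 10 seconds instead of hanging when the server is unreachable.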

(edit) I am aware of the NFS timeout options in mount_nfs(8), which could also be used either through or with auto_master(5). I'd still argue that SSH is more reliable and, better yet, also has much better access control.
 