Restoring ZFS snapshots

ZFS snapshots seem very useful because they are quick to create and can be sent to a remote location.

I'm interested in using them for backup purposes: periodically taking ZFS snapshots of my NAS box and sending them to another computer for storage.

However, I'm a bit confused when it comes to deleted and moved files.
Say on day X I have a filesystem in a certain state. I take a snapshot and send it to another computer.
Then on day Y I decide to rename/move/delete some files/directories. I then take another snapshot and send it to the same computer.


Now suppose I do something unwanted, for instance delete an important folder. AFAIK I can either browse the ZFS snapshot (and manually recover the affected folder) or restore the snapshot itself.

If I restore the snapshot from day Y, then everything should be exactly the same as on day Y, meaning the old versions of the files renamed/moved/deleted between day X and day Y won't be there. Or will the renamed files be duplicated and the deleted files restored (even though I don't want that)?

If I restore the snapshot from day X < Y, then everything should be exactly the same as day X, right?


Maybe the example above isn't clear enough, but my concern is that restoring a recent backup might bring back files I deleted 2 years ago (and 700 snapshots ago, if taken daily) that I no longer want.

Please ask if not clear enough.
 
luckylinux said:
Maybe the example above isn't clear enough, but my concern is that restoring a recent backup might bring back files I deleted 2 years ago (and 700 snapshots ago, if taken daily)

This wouldn't happen.

Say you had snapshots on the 1st Jan 2012, 1st Jan 2013 and 1st Feb 2013, eg:

storage/data
storage/data@01-02-2013
storage/data@01-01-2013
storage/data@01-01-2012

Each of those snapshots is an EXACT representation of how the file system looked on those dates. If you rolled back to the 1st Feb, your file system would look exactly like it did on that date, you wouldn't be restoring files that were present in the other snapshots.

Usually you wouldn't bother rolling back though (as every single change made since the snapshot would be lost); you'd just go into /storage/data/.zfs/snapshot/01-02-2013/ and pull out the files/folders that you wanted.

If you lost your entire system and had to rebuild, you could ZFS send/recv the latest snapshot from the backup machine which would leave you with a file system exactly as it was when that snapshot was taken.
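
For reference, here's roughly what those two approaches look like on the command line. This is just a sketch using the dataset/snapshot names from the example above (some-folder is a placeholder):

Code:
(pull individual files/folders out of the snapshot, leaving the live data alone)
# cp -Rp /storage/data/.zfs/snapshot/01-02-2013/some-folder /storage/data/
(or roll the whole dataset back; everything changed since that snapshot is lost)
# zfs rollback storage/data@01-02-2013

Note that zfs rollback only goes back to the most recent snapshot by default; rolling back further requires -r, which also destroys every snapshot newer than the target.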
 
usdmatt said:
Each of those snapshots is an EXACT representation of how the file system looked on those dates. If you rolled back to the 1st Feb, your file system would look exactly like it did on that date, you wouldn't be restoring files that were present in the other snapshots.
Thank you for your clear answer, usdmatt ;).

Just another question: if on 1st January 2013 I had a big file X and on 1st February 2013 I renamed it to Y, the space occupied by the snapshots would be size(X) + size(Y) (plus all the other files of course), unless dedup was on, right?
But if I use zfs send to store backups on another machine, does dedup have to be enabled on the destination host or on the NAS itself? In the first case I have a maximum of 32GB of RAM, in the second one (for now) 64GB of RAM. I suppose this should be enough for, say, 6-12 TB of very important data.
 
@luckylinux

There's a big difference between copying data file by file and sending the data as a stream. The biggest overhead that rsync has (which it mitigates very efficiently, though) is that it takes one file, copies it over the net, takes another, over and over. With ZFS, it's just one very big stream of data. Both use ssh for the transport, both can do "incremental" backups, but ZFS can also send along the dataset's properties, like compression, quota, sharing and so on.
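
As a rough sketch of what that looks like (the pool/dataset names here are made up): zfs send -R builds a replication stream carrying the dataset's snapshots, descendants and properties, and recv -u keeps the copy from being mounted on the backup host.

Code:
# zfs snapshot -r tank/data@backup-2013-02-01
# zfs send -R tank/data@backup-2013-02-01 | ssh user@backuphost zfs recv -u backup/data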

Do not enable dedup! For the love of god, do not go there! It really isn't worth it, it really, really isn't. I'm stating myself as a cautionary example:
https://forums.freebsd.org/showpost.php?p=206859&postcount=10

/Sebulon
 
luckylinux said:
Looks interesting. AFAIK it is NOT a snapshot manager, but it keeps a remote and a local filesystem in sync.

Yes, and this is also one of its biggest strengths, since it allows you to install another tool that handles the snapshot management for you, like sysutils/zfsnap e.g. And what it allows you to do is to have, let's say, a month's worth of snapshots on your primary system, and then provision your backup system bigger so it can hold snapshots that span back years in time. Other utilities I've seen, like in FreeNAS, only do 1:1, so you have to keep as many snapshots on your primary machine as you have on your secondary, which isn't all that practical, IMHO.
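
As a hedged illustration of that asymmetric retention (dataset and snapshot names are invented): once a snapshot has been received on the backup machine, it can be destroyed on the primary while the backup keeps its copy. The one thing the primary must keep is the newest snapshot both sides have in common, since it's the base for the next incremental send.

Code:
(on the primary, after the snapshot has been received on the backup)
# zfs destroy storage/data@2012-05-01
(the backup still has its copy)
# ssh user@backuphost zfs list -r -t snapshot backup/storage/data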

/Sebulon
 
Sebulon said:
@luckylinux

There's a big difference between copying data file by file and sending the data as a stream. The biggest overhead that rsync has (which it mitigates very efficiently, though) is that it takes one file, copies it over the net, takes another, over and over. With ZFS, it's just one very big stream of data. Both use ssh for the transport, both can do "incremental" backups, but ZFS can also send along the dataset's properties, like compression, quota, sharing and so on.

Do not enable dedup! For the love of god, do not go there! It really isn't worth it, it really, really isn't. I'm stating myself as a cautionary example:
https://forums.freebsd.org/showpost.php?p=206859&postcount=10

/Sebulon
I read your post. Sorry about your data. Dedup is not only resource-hungry, but it can also lead to data loss? That's SCARY. So even 64GB of RAM isn't enough? Well... that's a whole lot. I could add more, but I'd rather use it for virtualization with VirtualBox, for instance.

I can see the use of your script then. When the backup server gets full I'll just add another, say, 6-disk RAID-Z2 vdev to it and get plenty of free space to store new snapshots, which is still a lot better than dedup, right? All this without needing to add space to the NAS itself (unless I create a bunch of big files).
 
luckylinux said:
When the backup server gets full I'll just add another, say, 6-disk RAID-Z2 vdev to it and get plenty of free space to store new snapshots, which is still a lot better than dedup, right? All this without needing to add space to the NAS itself (unless I create a bunch of big files).

Exactly :)

/Sebulon
 
luckylinux said:
Just another question: if on 1st January 2013 I had a big file X and on 1st February 2013 I renamed it to Y, the space occupied by the snapshots would be size(X) + size(Y) (plus all the other files of course), unless dedup was on, right?

If you rename a big file you will only actually change the metadata. The record on disk containing the filename will be updated and re-written somewhere else. The old copy of this metadata will be kept by the snapshot (so you see the old filename when viewing the snapshot). You will not end up with two copies of the data, regardless of dedupe. Even if you actually modified the file slightly, only the on-disk records (usually 128k each) that were modified would end up being 'duplicated'.

The only way I can see you ending up with two copies of a file is to actually copy it with cp. At that point you will have two completely independent copies of the same data. If you delete one copy, but have at least one snapshot, the deleted copy will be retained on disk. Dedupe would most likely save you space in this instance but I wouldn't bother with it personally.

You may want to read up on how snapshots work in ZFS and understand the basic concepts. Because ZFS is always copy-on-write, snapshots are fairly straightforward.
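
If you want to see how much space the snapshots are actually holding on to (the dataset name is a placeholder), zfs list can show it. The USED column of a snapshot is the space that would be freed if only that snapshot were destroyed, so blocks shared with other snapshots or with the live file system aren't counted there.

Code:
# zfs list -r -t snapshot -o name,used,refer storage/data
# zfs list -o space storage/data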

Edit: Just to add, not that I recommend dedupe in any way, especially on live data (backups maybe), but if you 'zfs send' a file system from a deduped pool, the copy on the receiving system will not be deduped unless you have dedupe enabled on the receiving pool.
 
Thank you for your reply, usdmatt. Good to know that renaming a big file (say a few GB) only affects the metadata and therefore new snapshots won't get much bigger, since only the metadata has changed (the data itself is already contained in the old snapshot).

usdmatt said:
Edit: Just to add, not that I recommend dedupe in any way, especially on live data (backups maybe), but if you 'zfs send' a file system from a deduped pool, the copy on the receiving system will not be deduped unless you have dedupe enabled on the receiving pool.

Yeah. Seeing Sebulon's post and the catastrophic results it led to, it definitely isn't worth it to save a few hundred GB out of tens of TB. I won't enable dedup on either the NAS or the backup server.

Edit: I always thought it was obvious, but snapshots are incremental (only the differences since the last snapshot are stored), right?
 
luckylinux said:
I always thought it was obvious, but snapshots are incremental (only the differences since the last snapshot are stored), right?

Correct. However, when you start using them for backups, if you send the contents of one snapshot to a backup and then send another one without specifying that you only want the delta to be sent, it will send the entire stream again. Thankfully, my tool takes care of that automagically :)

/Sebulon
 
Sebulon said:
Correct. However, when you start using them for backups, if you send the contents of one snapshot to a backup and then send another one without specifying that you only want the delta to be sent, it will send the entire stream again. Thankfully, my tool takes care of that automagically :)

/Sebulon
Good call. I thought it was incremental by default even when sending. I would've filled the disk pretty fast since this isn't the case. I'll have to take a good look at your script, but first I have to finish getting all the parts for the NAS ;)
 
If sending snapshots to a remote system manually, the correct procedure is to send a full copy first, then incrementals, e.g.:

Code:
# zfs snapshot my/data@one
# zfs send my/data@one | ssh user@backup zfs recv backup/my/data
... later on ...
# zfs snapshot my/data@two
# zfs send -i one my/data@two | ssh user@backup zfs recv backup/my/data

As far as I'm aware, if you try to send a 'full' backup again, it won't let you, as a 'full' send will only succeed if the backup file system doesn't already exist.
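
One related detail, using the same placeholder names as above: if the backup file system gets modified between receives (even just atime updates from browsing it), an incremental receive will refuse to continue. zfs recv -F rolls the destination back to its most recent snapshot before applying the stream, and setting readonly=on on the backup dataset avoids the problem in the first place.

Code:
# zfs send -i one my/data@two | ssh user@backup zfs recv -F backup/my/data
# ssh user@backup zfs set readonly=on backup/my/data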
 
luckylinux said:
I would've filled the disk pretty fast since this isn't the case.

No, no, it wouldn't. If you had a dataset containing 2TB, sent that to a backup, kept filling it up to 3TB, and then sent the whole thing over again, it would overwrite the previous data and add the new, "only" taking up 3TB on the backup. But it would send the entire 3TB data stream. If you had sent that second backup incrementally, it would only have to send the 1TB of changes on top of the 2TB that was already there.
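
If you want to see that difference without actually transferring anything, newer versions of zfs send can do a dry run and print an estimated stream size (snapshot names are placeholders):

Code:
# zfs send -nv tank/data@second
# zfs send -nv -i first tank/data@second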

/Sebulon
 
Sebulon said:
No, no, it wouldn't. If you had a dataset containing 2TB, sent that to a backup, kept filling it up to 3TB, and then sent the whole thing over again, it would overwrite the previous data and add the new, "only" taking up 3TB on the backup. But it would send the entire 3TB data stream. If you had sent that second backup incrementally, it would only have to send the 1TB of changes on top of the 2TB that was already there.

/Sebulon

So the issue with transferring to a remote server is "only" about bandwidth and not about the disk space used on the remote host.
If sent incrementally, very little bandwidth is needed and the backup is stored incrementally on the remote host.
If sent in full, a lot of bandwidth (and time) is needed, but the backup is stored incrementally on the remote host anyway, which makes the full transfer pointless.

Was that what you meant, Sebulon?
Thank you.
 