ZFS snapshot backups

hi all,

question regarding zfs send / backing up snapshots..

for the sake of an extreme example, suppose you have a 100 TB pool, take a weekly snapshot, and zfs send it to an off-site server

does the remote server need to be 100 TB? or does it only have to be as large as the weekly snapshot changes? or can it be as small as the actual snapshot files in .zfs/snapshot?

i.e. if the weekly changes average 5 TB and then one week it's 50 TB .. is a 25 TB remote server going to be able to handle the backup?

thanks
 
thanks, one more stupid question ..

A year ago .. you set up a new production pool .. you immediately make a full backup.. over the last year the pool doubles in size .. each month you take a new snapshot..

if today every drive in the production pool bursts into flames .. and you need to completely rebuild the pool on new hardware ... you will need ...

A: the original backup & ALL 12 snapshots
B: the original backup & only the last snapshot
C: just the last snapshot

guessing B, but it's not really clear if snapshots contain all previous changes .. or just the changes between the previous snapshot and the current one..

tia
 
I'd suggest doing a bit of reading and understanding of snapshots if you are actually doing this in production.

A snapshot doesn't contain anything really. It just marks a specific point in time and knows what data was on the dataset at that point.

If you send a snapshot, by default it contains the entire contents of the dataset at that point in time. You could restore just that single snapshot to a new pool. This would be a pretty bad way to back up a machine though as you'd be sending the entire dataset every time.

Normally what you would do is the following -
  1. Take a snapshot of the main pool and send it to the backup machine
    If you have 10TB of data, this will send 10TB of data
    zfs send dataset@snapshot1
  2. Take a second snapshot and send just the differences since the first snapshot.
    If only 10GB of data has changed, this will only send 10GB of data
    zfs send -i snapshot1 dataset@snapshot2
  3. Continue to do this, always sending the differences to the previous snapshot.
Now, if you're storing all these send streams somewhere as files, the snapshot1 file will contain most of the data. The snapshot2 file will contain just the changes since snapshot1, the snapshot3 file just the changes since snapshot2, and so on. In order to restore from this you would need to receive every file in turn, starting with the full one, and a single missing or damaged file breaks everything after it. This would be a lot of hassle.
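
To make the chain concrete, here is a rough sketch that only prints the send commands for a list of snapshots instead of running them. The dataset name tank/data and the send_chain helper are made up for illustration, and the incremental source is written out in full (dataset@snapshot) rather than in the short form used above.

```shell
#!/bin/sh
# Print (not run) the zfs send commands for a chain of snapshots.
# The first snapshot is sent in full; every later one is sent as an
# incremental against the snapshot just before it.
# send_chain and tank/data are hypothetical names for this example.
send_chain() {
    dataset=$1
    shift
    prev=""
    for snap in "$@"; do
        if [ -z "$prev" ]; then
            echo "zfs send ${dataset}@${snap}"
        else
            echo "zfs send -i ${dataset}@${prev} ${dataset}@${snap}"
        fi
        prev=$snap
    done
}

send_chain tank/data snapshot1 snapshot2 snapshot3
```

If each of those streams were saved to a file, a restore would have to receive the files in exactly the same order.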

What most people actually do is to send the original snapshot to another ZFS pool, then send each later snapshot incrementally to that same pool. I tend to set up ssh keys between my live and backup system, then effectively just do something like the following to back up -

zfs send -i last_snapshot pool/my/dataset@new_snapshot | ssh backup_host zfs recv backup/my/dataset
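
That one-liner could be wrapped up roughly like this. It is only a sketch: the run and incremental_backup helpers and the DRY_RUN variable are made up for this example, the dataset and host names are the placeholders from the command above, and by default it just prints what it would do.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set DRY_RUN=0 to actually run them (assumes ssh keys are already set up).
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        sh -c "$*"
    fi
}

# Take a new snapshot, then send only the changes since the last one.
# Arguments: source dataset, target dataset, backup host,
#            last snapshot already on both sides, new snapshot name.
incremental_backup() {
    src=$1; dst=$2; host=$3; last=$4; new=$5
    run "zfs snapshot ${src}@${new}" &&
    run "zfs send -i ${src}@${last} ${src}@${new} | ssh ${host} zfs recv ${dst}"
}

incremental_backup pool/my/dataset backup/my/dataset backup_host last_snapshot new_snapshot
```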

If you use ZFS on the backup machine it gives you several benefits -
  • You know the data has been sent intact because the recv would fail otherwise.
  • ZFS is now protecting the data on the backup system, and provides checksums so you know the backup data is 100% bit for bit correct.
  • On the backup system you can not only access the data immediately, but can also look in any snapshot. (Obviously it's up to you to decide how many snapshots you keep on the live/backup system. You just need to make sure you keep the last "sent" snapshot on both systems so you can do the next incremental send.)
  • You can immediately restore the entire dataset from any snapshot by just doing -
    zfs send backup/my/dataset@snapshot_i_want_to_restore | wherever_you_want_to_send_to
Personally I would always suggest against storing ZFS backups on anything other than another pool. You don't know the state of the data and restoring is hours or days of hoping you don't get a recv error.
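
Since incrementals only work while both sides still hold the last sent snapshot, a quick check before each send can help. A sketch, assuming the snapshot names of each side have been dumped into text files, oldest first for the live side (latest_common and the sample names are hypothetical):

```shell
#!/bin/sh
# Print the newest snapshot name that exists on both systems.
# $1: file with live snapshot names, oldest first
# $2: file with backup snapshot names, any order
# The lists could be produced with something like:
#   zfs list -H -d 1 -t snapshot -o name -s creation pool/my/dataset | sed 's/.*@//'
latest_common() {
    grep -Fx -f "$2" "$1" | tail -n 1
}

# Demo with made-up snapshot names.
live=$(mktemp)
backup=$(mktemp)
printf '%s\n' week1 week2 week3 week4 > "$live"
printf '%s\n' week1 week2 week3 > "$backup"
latest_common "$live" "$backup"   # prints week3
rm -f "$live" "$backup"
```

The name it prints is the one to pass to zfs send -i. If it prints nothing, there is no common snapshot left and the next send has to be a full one.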
 
Also: the zfs send command basically writes one huge data stream to standard output. You could easily pipe that through something like gzip and then store it as a file. This would usually take up (somewhat) less storage space than the original data, depending on how compressible it is.
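
A minimal sketch of that idea, with the caveat that store_stream, restore_stream and the paths in the comments are hypothetical, and that gzip -t only confirms the gzip file itself is intact, not that zfs recv will accept the stream:

```shell
#!/bin/sh
# Compress whatever arrives on stdin (e.g. a zfs send stream) into file $1.
store_stream() {
    gzip -c > "$1"
}

# Test the archive first, then write the original stream to stdout
# (e.g. to pipe into zfs recv).
restore_stream() {
    gzip -t "$1" && gzip -dc "$1"
}

# Intended use, with made-up names:
#   zfs send pool/my/dataset@snapshot1 | store_stream /backup/snapshot1.zfs.gz
#   restore_stream /backup/snapshot1.zfs.gz | zfs recv newpool/my/dataset
```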
 
thank you, that helps a lot .. I will definitely be using 2 FreeBSD 12 machines with identical hardware and zpools.

cheers
 