ZFS + snapshots: disk usage?

Hello,

Disk space handling for snapshots in ZFS is a mystery to me. I want to use ZFS snapshots for regular backups of another file system. Files are copied over once a week and "labeled" with a snapshot. The idea was that files that have not changed (since the previous backup) would not occupy additional disk space. However, this does not seem to be the case.

To narrow down the issue this was the experiment:

1. Prepare a tar file with some content (a few hundred MB)
2. Untar it on a new ZFS file system
3. Make a snapshot
4. Repeat steps 2 and 3 again and again (basically, existing files are overwritten with the same files)

After a while:

Code:
Write failed: No space left on device: No space left on device

I expected that files with the same content would not occupy additional disk space every time a snapshot is taken. Or am I wrong?


Cheers,
B.
 
sbremal said:
1. Prepare a tar file with some content (a few hundred MB)
2. Untar it on a new ZFS file system
3. Make a snapshot
4. Repeat steps 2 and 3 again and again (basically, existing files are overwritten with the same files)

If you copy the data onto a new ZFS file system every time, you don't need to wonder about the "lost" disk space.
Create the ZFS file system once and make a snapshot each time after the data is copied. Then the "copy on write" feature of ZFS will work.

You also need to make sure that the files you copy are not rotated, like the logs in /var/log/, for example. Otherwise every version of a file will differ from the previous one and you will get a "huge" data overhead.
 
Sorry, maybe it was misleading, the ZFS file system is created only once:

0. Create a new ZFS file system
1. Prepare a tar file with some content (a few hundred MB)
2. Untar it on the ZFS file system
3. Make a snapshot
4. Repeat steps 2 and 3 again and again (basically, existing files are overwritten with the same files)

The files in each snapshot are the same (checked with diff -r); however, each snapshot takes up its own disk space.
 
Snapshots work at the block level; they have no concept of files. The disk blocks in a snapshot are locked for as long as the snapshot exists and cannot be altered.

When you write a new copy, ZFS cannot overwrite the locked blocks, so new blocks are allocated for everything you write, and they in turn will be locked by the next snapshot.

So what you're seeing is perfectly normal. Filesystem usage will grow with the amount you copy each time.
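
To make the block-level behaviour concrete, here is a toy model in Python. It is only a sketch under simplifying assumptions: ToyPool and its methods are made-up names, not a ZFS interface, and real ZFS does far more bookkeeping. The point it illustrates is that every write gets fresh blocks and a snapshot pins whatever blocks were live when it was taken.

```python
class ToyPool:
    """Hypothetical toy copy-on-write pool: every write allocates a fresh block."""

    def __init__(self):
        self.next_block = 0
        self.live = {}        # (file name, block index) -> block id
        self.snapshots = []   # each snapshot is a frozen set of pinned block ids

    def write_file(self, name, num_blocks):
        # Copy-on-write: old blocks are never overwritten, even if the
        # new data happens to be identical to the old data.
        for i in range(num_blocks):
            self.live[(name, i)] = self.next_block
            self.next_block += 1

    def snapshot(self):
        # A snapshot pins the blocks that are live right now.
        self.snapshots.append(frozenset(self.live.values()))

    def used_blocks(self):
        # Space is held by every block that the live file system
        # or any snapshot still references.
        referenced = set(self.live.values())
        for snap in self.snapshots:
            referenced |= snap
        return len(referenced)


pool = ToyPool()
usage = []
for _ in range(3):
    pool.write_file("data.bin", 100)   # "untar" the same 100-block file again
    pool.snapshot()
    usage.append(pool.used_blocks())

print(usage)  # [100, 200, 300]
```

In this model the usage grows by the full size of the file on every round, which matches what the original poster observed.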
 
Hi,

The reason is that although most of the files in the tar are unchanged since the last time you created it, you ARE writing all of those files to the ZFS file system again. If you want to take advantage of ZFS snapshots in the way you are thinking, you will need to use something like rsync, which actually only updates changed blocks on the ZFS file system.

thanks Andy.
 
AndyUKG said:
Hi,
If you want to take advantage of ZFS and snapshots in the way you are thinking you will need to use something like rsync, that actually only updates changed blocks on the ZFS file system,

Technically it will update all blocks belonging to files that have changed.
But yeah, it will work a lot better than untarring full copies.
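
As a rough illustration of the file-level idea (only rewrite files that actually differ), here is a minimal Python sketch. It is not how rsync works internally: sync_changed_files is a hypothetical helper, it compares full file contents for simplicity, and it ignores deletions, permissions, and timestamps.

```python
import filecmp
import shutil
import tempfile
from pathlib import Path


def sync_changed_files(src: Path, dst: Path) -> list:
    """Copy only files that are missing or different in dst; return their relative paths."""
    copied = []
    for src_file in sorted(src.rglob("*")):
        if not src_file.is_file():
            continue
        rel = src_file.relative_to(src)
        dst_file = dst / rel
        # Leave identical files untouched so their blocks stay shared with
        # earlier snapshots (shallow=False forces a content comparison).
        if dst_file.is_file() and filecmp.cmp(src_file, dst_file, shallow=False):
            continue
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src_file, dst_file)
        copied.append(str(rel))
    return copied


# Small demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "src")
    dst = Path(tmp, "dst")
    src.mkdir()
    dst.mkdir()
    (src / "a.txt").write_text("unchanged")
    (src / "b.txt").write_text("version 1")
    first = sync_changed_files(src, dst)    # both files are new
    (src / "b.txt").write_text("version 2")
    second = sync_changed_files(src, dst)   # only b.txt differs now

print(first, second)
```

On the second run only b.txt is rewritten, so only its blocks would be newly allocated before the next snapshot.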
 
jalla said:
Technically it will update all blocks belonging to files that have changed.

Ah ok. So if the original poster has a few very large files, then this still isn't going to be a very good solution. On the other hand, if they have a large number of files, of which only a few change from backup to backup, then rsync would probably be a good fit.

ta Andy.
 
AndyUKG said:
Ah ok. So if the original poster has a few very large files, then this still isn't going to be a very good solution. On the other hand, if they have a large number of files, of which only a few change from backup to backup, then rsync would probably be a good fit.

ta Andy.

Rsync is still about the only choice you have. And even with a large file that changes, I would think it depends on where in the file the change occurs, e.g. if only the tail end changes, then the delta should be small.

I wonder if dedup would help things. Does it work across snapshots? If it does, the tarring might work just as well once you start using v28.
 
carlton_draught said:
if only the tail end changes, then the delta should be small.

That was my original thinking, but I had a look at the documentation on the rsync site and, if I understand it correctly, when it does a block update it actually creates a complete new copy of the original file which when completed is renamed and replaces the original file (block changes only are sent over the network). Hence it will update all blocks, which is bad if you are trying to take advantage of ZFS snapshots.
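
A quick back-of-the-envelope comparison of the two strategies, as a Python sketch. new_blocks_needed is a made-up helper and the model is deliberately crude: a file is just a list of block contents, and a "rename" style update is assumed to write every block of the new file.

```python
def new_blocks_needed(old_blocks, new_blocks, in_place):
    """Count blocks that must be freshly written to turn old_blocks into new_blocks."""
    if not in_place:
        # New copy plus rename: the whole file is written again.
        return len(new_blocks)
    # In-place update: only blocks that differ, or that are appended, are written.
    return sum(
        1
        for i, block in enumerate(new_blocks)
        if i >= len(old_blocks) or old_blocks[i] != block
    )


old = list(range(1000))     # a 1000-block file
new = old[:-1] + [-1]       # same file with only the last block changed

via_rename = new_blocks_needed(old, new, in_place=False)
via_inplace = new_blocks_needed(old, new, in_place=True)
print(via_rename, via_inplace)  # 1000 1
```

So under these assumptions, a one-block change to a large file costs the whole file again per snapshot with the copy-and-rename approach.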

cheers Andy.
 
AndyUKG said:
That was my original thinking, but I had a look at the documentation on the rsync site and, if I understand it correctly, when it does a block update it actually creates a complete new copy of the original file which when completed is renamed and replaces the original file (block changes only are sent over the network). Hence it will update all blocks, which is bad if you are trying to take advantage of ZFS snapshots.

Ah. I'll keep that in mind. One thing that's kind of related - about the biggest thing I'll be storing in the future is VM images, and I'm hoping dedup is going to be effective in keeping those sizes down somehow. Otherwise there is always the option to snapshot it from within the VM, but being able to do it via ZFS has its advantages in that it can be automated.
 
A little update to this, which I just saw mentioned on another forum.

Rsync has an option, "--inplace", which updates the original file directly instead of replacing it with a new copy, so it should only rewrite changed blocks on the destination. That is ideal in conjunction with ZFS, but with the drawback that an interrupted rsync update can leave the file on the receiving side in an inconsistent state. Still an option worth considering for backup purposes...
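
To show what "only update changed blocks" means in practice, here is a small Python sketch of an in-place update. It is not rsync's algorithm (rsync uses rolling checksums and works across a network); update_in_place is a hypothetical helper that compares two local files block by block. The 128 KiB default is an assumption chosen to match the default ZFS recordsize.

```python
import os
import tempfile

BLOCK_SIZE = 128 * 1024  # assumption: same as the default ZFS recordsize


def update_in_place(src_path, dst_path, block_size=BLOCK_SIZE):
    """Overwrite only the blocks of dst that differ from src; return how many were rewritten."""
    rewritten = 0
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        offset = 0
        while True:
            new = src.read(block_size)
            if not new:
                break
            dst.seek(offset)
            old = dst.read(len(new))
            if old != new:
                # Only differing (or appended) blocks are written.
                dst.seek(offset)
                dst.write(new)
                rewritten += 1
            offset += len(new)
        # Drop any leftover tail if the source became shorter.
        dst.truncate(offset)
    return rewritten


# Demonstration with a tiny block size: only the tail of the file changes.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "src.bin")
    dst_path = os.path.join(tmp, "dst.bin")
    with open(dst_path, "wb") as f:
        f.write(b"AAAABBBBCCCC")
    with open(src_path, "wb") as f:
        f.write(b"AAAABBBBCCCCDDDD")
    blocks_rewritten = update_in_place(src_path, dst_path, block_size=4)
    with open(dst_path, "rb") as f:
        result = f.read()

print(blocks_rewritten, result)
```

As with "--inplace", an interruption in the middle of such an update would leave the destination half old and half new, so it trades safety for space.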

Andy.
 
This is something I've been pondering lately too. Same situation here - I need to back up data from a number of Linux hosts to a ZFS pool, making regular snapshots as I go. Is rsync our best option for eliminating unnecessary duplication? Are ZFS users using any other backup systems?
 