ZFS Size difference of backed up directory

Hello!

I just used rsync to back up some dirs...
I have noticed a significant difference between the size of the original dir and its backup.
Specific case is:
Directory: /boot
Source system on a disk with freebsd-zfs
du -h: 70M
Destination system on a disk with freebsd-ufs
du -h: 122M

rsync command used was: "rsync -azP source_dir destination_dir"
Wow!! Almost double!
Is this normal? Do I have to take this difference into consideration? Otherwise, how can I calculate the disk space needed for the backups...?

Thanks!
 
It depends on many aspects, but it's not necessarily weird or wrong. For example: the ZFS filesystem could be compressed, which would seriously reduce the required disk space. The block size of the UFS filesystem could also be quite different from that of the ZFS filesystem, resulting in small files taking up more physical disk space.
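To illustrate the block size point, here is a minimal sketch of how rounding each file up to a whole allocation unit inflates the total. The function name allocated_bytes, the file sizes and the two unit sizes are made up for illustration; real filesystems also have metadata overhead and other details that this ignores.

```python
def allocated_bytes(file_size: int, alloc_unit: int) -> int:
    """Space a file occupies if its data is stored in whole allocation units.

    Simplified model: ignores metadata, compression and any
    filesystem-specific handling of very small files.
    """
    # Integer ceiling division, then scale back up to bytes.
    return -(-file_size // alloc_unit) * alloc_unit


# Hypothetical file sizes in bytes, not taken from the /boot in question.
sizes = [300, 1_500, 5_000, 70_000]

for unit in (512, 4096):
    total = sum(allocated_bytes(s, unit) for s in sizes)
    print(f"allocation unit {unit:5d} B: {total} B allocated for {sum(sizes)} B of data")
```

The more small files a directory holds, the bigger the gap between the two totals gets.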

This is one of the reasons why it's usually better to use a single file as backup. For example a tar archive or a file system dump.
 
So what happened is that the data *grew* from 70M to 122M during the backup from ZFS to UFS.

A: Are you running compression or dedup on ZFS? You can check with "zfs get compression,compressratio,dedup" on the source dataset. If yes, the ZFS version will be smaller, because it is stored more efficiently.

B: Are any of the source files sparse? Let me explain what a sparse file is: In Unix you can skip whole regions of a file when writing; those regions do not have space allocated, but are part of the file's address range. When reading the file, those regions will read as if a continuous stream of zeros had been written. When copying those files, the copies will often (depending on the copy program) use more space, because the unwritten areas are actually written to the target file.

To determine the real root cause, try the following: for each file in source and target, compare three things: The size of the file (you can get that from ls, or from stat), and the disk usage on source and target file systems. For simplicity, measure all three numbers in consistent block sizes; I would recommend kilobytes (divide the size by 1024, and use the "-k" option on du). If compression is on, then many files will use slightly less space than their size. If files are sparse, then they will use much less space than their size.
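The per-file comparison described above could be sketched in Python roughly like this. The function name size_report is made up for the example, and it assumes a Unix-like system where st_blocks is available and counted in 512-byte units. Run it once on the source tree and once on the target tree, then compare the numbers.

```python
import os
import stat


def size_report(root):
    """Yield (path, apparent_kib, allocated_kib) for each regular file under root.

    apparent_kib  - the file size as shown by ls or stat
    allocated_kib - the space actually allocated on disk, as du would count it
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # do not follow symlinks
            if not stat.S_ISREG(st.st_mode):
                continue
            apparent_kib = st.st_size / 1024
            # Assumption: st_blocks is in 512-byte units (the usual Unix convention).
            allocated_kib = st.st_blocks * 512 / 1024
            yield path, apparent_kib, allocated_kib


def print_report(root):
    """Print one line per file: apparent size, allocated size, path."""
    for path, apparent_kib, allocated_kib in size_report(root):
        print(f"{apparent_kib:10.1f}K {allocated_kib:10.1f}K  {path}")
```

Calling print_report("/boot") on both machines gives two listings to compare. Files whose allocated size is far below their apparent size are candidates for being sparse or well compressed.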

If you have sparse files, you can run rsync with the "--sparse" option; that uses heuristics to try to guess where the sparse (unwritten) areas are and skip them. It doesn't always work; the implementation of the sparse option is complex. It also makes rsync run more slowly.

C: Finally, do you have hard links in the source file system? That would be multiple names for the same file. To find them, make a directory listing of the source file system with the "-i" option of ls (it shows inode numbers), and see whether any inode numbers are duplicated. If yes, the source will have multiple file names for the same file on disk, but after the copy, you will have multiple copies of the file, using more space. If this is the case, you can use the "--hard-links" option (short form "-H") on rsync, which makes it run more slowly. Note that the "-a" option you used does not include it.
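If scanning an "ls -i" listing by eye is too tedious, the same check can be sketched in Python. The function name find_hard_links is made up for the example. It only reports files that have at least two names inside the tree being scanned, since those are the ones a copy without hard link preservation would duplicate.

```python
import os
import stat
from collections import defaultdict


def find_hard_links(root):
    """Return {(device, inode): [paths]} for regular files under root
    that are reachable by more than one name within root."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # do not follow symlinks
            # st_nlink > 1 means the inode has more than one directory entry.
            if stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
                groups[(st.st_dev, st.st_ino)].append(path)
    # Keep only inodes with at least two names inside the scanned tree.
    return {key: sorted(paths) for key, paths in groups.items() if len(paths) > 1}
```

An empty result means hard links are not the reason for the size difference.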
 