Setting up incremental ZFS backups to an external hard drive

Hi,

I have an external hard drive connected to my NAS, and I take nightly backups of some datasets to this drive. Daily snapshots are taken of these datasets and rotate on a weekly basis: when today's snapshot is taken, the snapshot from seven days ago is destroyed, so each dataset has a seven-day snapshot history at any given time. This is the command I use for backing up:
zfs send -RI pool0/media@yesterday pool0/media@today | zfs receive -Fu external/backup/pool0_media
My problem is that zpool list shows about 81% of the external drive allocated, whereas I feel it should be less than that based on the actual sizes of the backed-up datasets. I guess one reason for the discrepancy may be the child snapshots under each dataset. Is there a more space-saving strategy I could use for this purpose? I don't want to run out of space on the external drive. Am I supposed to use -Ri instead of -RI?
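If it helps to narrow this down, I assume something like the following would show how much of the allocation is tied up in snapshots versus live data on the backup side (paths as in my receive command above):
zfs list -r -o space external/backup
zfs get usedbysnapshots,usedbydataset,usedbychildren external/backup/pool0_media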

My confusion is partly due to the discrepancy between the zpool list and df -h outputs. The former shows the external drive pool at 81% capacity with 5.92 TB allocated, whereas the latter shows less than 2.5 TB used on that drive.
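As far as I can tell, df -h only counts the data each mounted filesystem currently references, while zpool list counts everything allocated in the pool (snapshots and all child datasets included), so I assume the fairer comparison would be something like:
zpool list external
zfs list -r -o name,used,refer external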
 
You could increase the compression level and enable dedup on your external drive. That will only apply to new writes, though, so you'd have to copy the data to a separate dataset to see how it changes space usage.
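As a rough sketch (the test dataset name is just an example; gzip-9 works on any ZFS version, zstd needs a newer OpenZFS):
zfs create -o compression=gzip-9 -o dedup=on external/space_test
zfs send -R pool0/media@today | zfs receive -u external/space_test/pool0_media
zfs list -r -o name,used,compressratio external/space_test
Comparing used and compressratio there against your existing backup dataset should show whether it's worth it.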
 
I didn't simulate increasing the compression level, but enabling dedup seems to give me a dedup * compress / copies = 1.14 ratio, so I decided against it; it wouldn't be worth it.

I am hoping that the allocation will eventually stabilize and stop increasing once a full week's worth of snapshot backups has accumulated. I just hope the external drive doesn't run out of space by then.

I wanted to take advantage of some ZFS features with this strategy, but it appears that I either need to use a higher compression level or fall back on the tried-and-proven method of rsync. I still feel like I am probably missing something, but using more than twice the actual dataset size for a backup doesn't make any sense.
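If I do try the compression route, I assume it would just be something like this on the backup side, affecting only data written afterwards (gzip-9 as a conservative example; newer OpenZFS also offers zstd levels):
zfs set compression=gzip-9 external/backup
zfs get compressratio external/backup/pool0_media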

Similar to my previous question above, there is a huge discrepancy between these two outputs for the source dataset (not the backup):
zfs list
pool0/media 6.83T

du -sh /pool0/media
2.2T

Is this because of the snapshots? Which one is more accurate?
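I suppose the property breakdown would tell me how much of that is snapshots versus the live files that du sees:
zfs get used,referenced,usedbysnapshots,usedbychildren pool0/media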

Moreover, why did the dataset grow by about 1 TB since yesterday? I am sure I didn't copy anything there that would justify such a change (5.92 to 6.83 TB) in the last 24 hours.
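If it matters, I assume something like this would show how much was actually written since yesterday's snapshot and which files changed (snapshot names as in my send command):
zfs get written@yesterday pool0/media
zfs diff pool0/media@yesterday pool0/media@today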
 