ZFS Incremental snapshot confusion

mefizto · Mar 1, 2020

Greetings all,

I have been reading Lucas'/Jude's chapter on replication and, as a result, I am confused. Aside from the fact that the terms backup and replication, which are different concepts, are used interchangeably, the bigger issue is the use of incremental snapshots. If I understand correctly, which may be a wrong assumption, the first snapshot taken at the source host is transferred to the target host. The transferred snapshot is a full-sized replica of the data-set on the source host. The next time a snapshot is taken on the source host, an incremental transfer sends only the blocks that have changed between the first and second snapshots to the target host. Let us say that a snapshot is taken daily and one transfers 31 snapshots.

The chapter mentions discarding the older snapshots. I do not understand how this is possible. Since the snapshots are interdependent based on the incremental mechanism, if some (older) snapshots are discarded, the data-set cannot be rebuilt. By way of example, let us say that the snapshot from day 15 has been corrupted. Then no restoration of days 16-31 is possible.

It also follows that the first snapshot must be kept, which implies that the size of the backup as well as the danger of corruption grows. So does one, at periodic intervals, e.g., yearly, archive the previous period and start over?

Is my understanding correct or am I missing something?

Kindest regards,

m

unitrunker · Mar 1, 2020

Chapter 20. The Z File System (ZFS)

The Z File System, or ZFS, is an advanced file system designed to overcome many of the major problems found in previous designs

www.freebsd.org

19.4.7.1. Incremental Backups

mefizto · Mar 1, 2020

Hi unitrunker,

how does it answer my questions?

Kindest regards,

M

ShelLuser · Mar 1, 2020

mefizto said:
The chapter mentions discarding the older snapshots. I do not understand how this is possible. Since the snapshots are interdependent based on the incremental mechanism,

Actually they're not. Think of snapshots as their name implies: a snapshot of the filesystem created at that time. They don't depend on previous snapshots at all. An earlier snapshot is only used when you're creating an incremental backup stream: then both snapshots are compared and the differences get dumped.
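
To make the independence claim a bit more tangible, here is a loose analogy in plain shell. It is not how ZFS is implemented: hard links merely stand in for the block references a snapshot holds, and the directory names (live, snapA, snapB) are made up for illustration. The point is only that removing the older "snapshot" leaves the newer one complete, because the newer one references every block it needs by itself.

```shell
#!/bin/sh
# Loose analogy only: hard links stand in for ZFS block references.
# Nothing here touches a real pool.
work=$(mktemp -d)
mkdir "$work/live" "$work/snapA" "$work/snapB"

echo "day 1 data" > "$work/live/f1"
ln "$work/live/f1" "$work/snapA/f1"   # "snapshot A" references block f1

echo "day 2 data" > "$work/live/f2"   # new data written after snapshot A
ln "$work/live/f1" "$work/snapB/f1"   # "snapshot B" references f1 ...
ln "$work/live/f2" "$work/snapB/f2"   # ... and f2, without going through A

rm -r "$work/snapA"                   # "destroy" the older snapshot

# Snapshot B still sees both blocks.
after_destroy=$(cat "$work/snapB/f1" "$work/snapB/f2")
printf '%s\n' "$after_destroy"

rm -r "$work"
```

The "day 1" data survives the removal of snapA because snapB still holds its own reference to it.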

mefizto said:
if some (older) snapshots are discarded, the data-set cannot be rebuilt.

ZFS doesn't use traditional ways of data 'snapshotting'.

unitrunker said:
Chapter 20. The Z File System (ZFS)

The Z File System, or ZFS, is an advanced file system designed to overcome many of the major problems found in previous designs

www.freebsd.org

19.4.7.1. Incremental Backups

That's not helping IMO; notice how the manual doesn't mention anything about destroying snapshots? And considering the OP's confusion...

unitrunker · Mar 1, 2020

ShelLuser said:
not helping IMO; notice how the manual doesn't mention anything about destroying snapshots? And considering the OP's confusion...

I thought it was clear, but apparently not.

Say you zfs send a snapshot "A" to a pool on a different host. Later you make snapshot "B" and send it over. Subsequent zfs sends need only push the deltas since the last send. The receiving pool combines them. The handbook shows how to issue a zfs send of snapshot B that excludes redundant data from earlier snapshot A.
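
A minimal sketch of that sequence, assuming the dataset names from the Handbook example (mypool as the source, backup/mypool as the receiving dataset) and the snapshot names replica1 and replica2. The helper functions full_send and incr_send are made up for illustration. The script only prints the commands; nothing is run against a real pool.

```shell
#!/bin/sh
# Sketch only: prints the commands rather than running them.
# Pipe the output to "sh" as root to execute them on real pools.
SRC=mypool          # source dataset (name borrowed from the Handbook example)
DST=backup/mypool   # receiving dataset (also from the Handbook example)

full_send() {       # $1 = snapshot name
    printf 'zfs snapshot %s@%s\n' "$SRC" "$1"
    printf 'zfs send %s@%s | zfs receive %s\n' "$SRC" "$1" "$DST"
}

incr_send() {       # $1 = baseline snapshot, $2 = new snapshot
    printf 'zfs snapshot %s@%s\n' "$SRC" "$2"
    # -i sends only what changed between the baseline and the new snapshot
    printf 'zfs send -i %s@%s %s@%s | zfs receive %s\n' \
        "$SRC" "$1" "$SRC" "$2" "$DST"
}

full_send replica1            # snapshot "A": full stream
incr_send replica1 replica2   # snapshot "B": deltas since "A"
```

Once received, replica1 and replica2 show up as two separate snapshots on the receiving side; the incremental stream itself is not kept there.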

Once the snapshot is sent - you *could* destroy it on the originating host. As a matter of policy it might be best to keep at least the latest snapshot so as to use it as a baseline for the next snapshot+zfs send.

The manual explains how to destroy a snapshot. It is two sections down.
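
The "keep the latest as a baseline" policy could be sketched like this. The function name prune_plan is hypothetical, and the snapshot names are just examples. It prints a zfs destroy line for every snapshot except the newest ones you ask it to keep, so you can review the list before running anything.

```shell
#!/bin/sh
# Sketch only: prints "zfs destroy" commands, does not run them.
# Usage: prune_plan KEEP snap1 snap2 ...   (snapshots listed oldest first)
prune_plan() {
    keep=$1
    shift
    # Drop snapshots from the front (oldest) until only KEEP remain.
    while [ "$#" -gt "$keep" ]; do
        printf 'zfs destroy %s\n' "$1"
        shift
    done
}

# Keep only the newest snapshot as the baseline for the next incremental send.
prune_plan 1 mypool@replica1 mypool@replica2 mypool@replica3
```

On a real system the oldest-first list could come from something like `zfs list -H -t snapshot -o name -s creation mypool` rather than being typed by hand.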

mefizto · Mar 2, 2020

Hi ShelLuser,

thank you for your attempt at an explanation; unfortunately, I am not smart enough to understand it. I do understand the definition of a snapshot as a capture of a filesystem at the time the snapshot is created. So, if I were to send a full snapshot every time, there would be no problem, well, apart from the waste of bandwidth and storage.

As I wrote, and you seem to agree, the inter-dependency is introduced at the time the incremental snapshot is created. So, how one can delete one of the incremental snapshots and still restore the data is what I cannot understand.

ShelLuser said:
ZFS doesn't use traditional ways of data 'snapshotting'.

Yes, indeed; the chapter actually goes quite deeply into the mechanism of snapshot creation, and I have a false impression that I understand it, at least on a conceptual level. However, this still does not explain how the inter-dependency of incremental snapshots can be circumvented.

I have been trying to find any further references, but some of the responses are contradictory (after all it is Internet) - "yes, you can delete them", "no, you cannot delete them", without giving any support for either position.

Kindest regards,

M

mefizto · Mar 2, 2020

Hi unitrunker,

unitrunker said:
Say you zfs send a snapshot "A" to a pool on a different host. Later you make snapshot "B" and send it over. Subsequent zfs sends need only push the deltas since the last send. The receiving pool combines them. The handbook shows how to issue a zfs send of snapshot B that excludes redundant data from earlier snapshot A.

Just to make sure that I understand, referring to sections 19.4.7 and 19.4.7.1: snapshot "A" is the mypool@backup1, which is now restored(?) into backup. Then a snapshot "B", mypool@replica1, is taken, and only the difference is sent to the mypool and is combined with the backup1 in the backup and also stored as mypool@replica1? It does not appear to be the case because the size of the backup has not changed, cf. the last two tables in section 19.4.7.

Kindest regards,

M

unitrunker · Mar 2, 2020

mefizto said:
It does not appear to be the case because the size of the backup has not changed, cf. the last two tables in section 19.4.7.

The backup pool looks bigger to me after the second snapshot is added. Look at the ALLOC and CAP columns.

Before second snapshot:

Code:

# zpool list
NAME    SIZE  ALLOC   FREE   CKPOINT  EXPANDSZ   FRAG   CAP  DEDUP  HEALTH  ALTROOT
backup  960M  61.7M   898M         -         -     0%    6%  1.00x  ONLINE  -
mypool  960M  50.2M   910M         -         -     0%    5%  1.00x  ONLINE  -

After second snapshot:

Code:

# zpool list
NAME    SIZE  ALLOC   FREE   CKPOINT  EXPANDSZ   FRAG   CAP  DEDUP  HEALTH  ALTROOT
backup  960M  80.8M   879M         -         -     0%    8%  1.00x  ONLINE  -
mypool  960M  50.2M   910M         -         -     0%    5%  1.00x  ONLINE  -

I read this as 6% before and 8% after. Now, if you literally mean the snapshot "replica1", no. You can clearly see they are separate:

Code:

# zfs list -t snapshot
NAME                                         USED  AVAIL  REFER  MOUNTPOINT
backup/mypool@replica1                       104K      -  50.2M  -
backup/mypool@replica2                          0      -  55.2M  -

Both replicas are present. Let me throw a question at you that - as an exercise - may help you. What do you think will happen when this command is issued?

zfs destroy backup/mypool@replica1

mefizto · Mar 2, 2020

Hi unitrunker,

thank you, I will now try to experiment based on your explanation.

Kindest regards,

M

ShelLuser · Mar 2, 2020

unitrunker said:
Say you zfs send a snapshot "A" to a pool on a different host. Later you make snapshot "B" and send it over. Subsequent zfs sends need only push the deltas since the last send. The receiving pool combines them.

And this is why you don't RTFM... why assume you're sending to a receiving pool? I always store my backups into files, which has plenty of advantages.

Anyway, not out for discussions here, only trying to help the OP.
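
For the file-based variant, a rough sketch follows. The /backups path, the .zfs file suffix and the helper names save_plan and restore_plan are all assumptions made up for illustration. One caveat, as far as I understand it: when the streams are kept as files, each incremental stream can only be received on top of the snapshot it was made against, so in that setup the chain of files really does matter, unlike snapshots that have already been received into a pool. The script only prints the commands.

```shell
#!/bin/sh
# Sketch only: prints commands, runs nothing against a real pool.
# /backups/*.zfs are hypothetical file names.

# One full stream first, then incrementals relative to the previous snapshot.
save_plan() {      # $1 = dataset, $2.. = snapshot names, oldest first
    ds=$1
    shift
    prev=""
    for snap in "$@"; do
        if [ -z "$prev" ]; then
            printf 'zfs send %s@%s > /backups/%s.zfs\n' "$ds" "$snap" "$snap"
        else
            printf 'zfs send -i %s@%s %s@%s > /backups/%s.zfs\n' \
                "$ds" "$prev" "$ds" "$snap" "$snap"
        fi
        prev=$snap
    done
}

# Restoring means receiving every stream in order, full stream first.
restore_plan() {   # $1 = target dataset, $2.. = snapshot names, oldest first
    target=$1
    shift
    for snap in "$@"; do
        printf 'zfs receive %s < /backups/%s.zfs\n' "$target" "$snap"
    done
}

save_plan mypool replica1 replica2
restore_plan backup/mypool replica1 replica2
```

If one file in the middle of such a chain is lost or damaged, the later incremental files cannot be applied, which is the situation the OP was worried about.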

unitrunker · Mar 2, 2020

ShelLuser said:
Anyway, not out for discussions here, only trying to help the OP.

Everything asked was on that page. I could paraphrase the whole thing or point to it. The OP read the page and then asked for more clarification. It looks like a win to me.