Question for rsnapshot users, rsync experts

Greetings all,

Some recent backup-related discussion made me look at my backup strategy, and I was wondering about two issues/features of rsnapshot(1).

As I understand it, rsnapshot(1) uses hard links to save space. Is there not a danger of running out of inodes?

The second question is about rsync(1), which rsnapshot(1) uses internally. For synchronizing/backing up a Windows 10 host, I use FreeFileSync. The procedure is separated into two steps: first, a comparison between the source and target sets is made, and a summary showing the differences is displayed; what gets synchronized/backed up may then be changed. I find this extremely useful because sometimes I (intentionally) delete a file from the work set, and the summary shows that it is still present on the target set. This enables me to change my mind, which has actually happened several times. Once the summary is approved/changed, the synchronization/backup is carried out.

I understand that rsnapshot(1) keeps a file until all the hard links are unlinked, which depends on the number of snapshots. I was, however, wondering if some of the options in rsync(1) could be used to either give a warning that all hard links are about to be unlinked, or, in such an event, make and mark a copy of the file. I could then periodically run a script looking for such marked copies and decide whether to keep or delete them.

Kindest regards,

M
 
1) Well, you use more inodes due to having one complete directory tree per backup, but still no more than you would with full copies, since unchanged files share a single inode across snapshots. I don't think you are likely to run out of inodes.

2) rsync, when deleting things, doesn't care about hard-link-ness. It just unlinks directory entries as required; an inode is only freed when its last link goes. What you could do is a dry run with --delete after your full run. That would tell you what would be removed.
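
Something like this (paths are placeholders):

    # Preview deletions: -n is --dry-run, -i itemizes what would change.
    rsync -an --delete -i /source/ /backup/current/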
 
Assuming a snapshot (of the current backup) is always created before updating, files are only finally removed from disk when a snapshot is deleted, not during the backup itself (i.e., the working-directory update, including deletions in the working directory).

Deleting a snapshot is the only point in such a design where the link count for a file can drop to 0: creating the snapshot increments by 1 the link count of every existing (links >= 1) file in the snapshot source, so it is >= 2 after snapshot creation; subsequently deleting the file in the working directory can only drop it back to 1.

If that's the case (I don't use this particular tool), something like find -x <path to snapshot> -links 1 will show you which files will actually be deleted by the removal of the interrogated snapshot. (It finds the files that are linked only from the specified tree/path.)
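
A minimal sketch, with a placeholder snapshot path (-type f skips directories, whose link counts are always at least 2):

    # BSD find; -x keeps the search on one filesystem. Lists regular files
    # whose only remaining link is inside this tree, i.e. files that are
    # freed for good when the tree is deleted.
    find -x /backup/daily.6 -type f -links 1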
 
Hi cracauer@,

thank you for the reply.

Re 1) thank you for the explanation.

Re 2) I may not understand (or use) the terminology correctly, but I think we are talking about the same thing.

Hi Eric A. Borisch,

I do not quite follow your explanation about the links. My understanding of the algorithm (and it may be incorrect) comes from the best source I was able to find, cf. http://www.mikerubel.org/computers/rsync_snapshots/ and is as follows:

When rsnapshot(1) is invoked, the following sequence happens:

(i) backup.3 is deleted (if it exists); this is the top-most step. This causes all files present only in backup.3 to be lost. All files that didn't change still appear in backup.2, as they were hard-linked by the step cp -al backup.1/. backup.0 when that snapshot was created.

(ii) all remaining backup.X are rotated up, i.e., backup.X becomes backup.(X+1).

(iii) a new snapshot, backup.0, is created. All the files that didn't change are hard-linked from backup.1 (see the sketch below).
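
In shell terms, my understanding of this rotate-then-link sequence is roughly the following sketch (paths are illustrative; rsnapshot itself can apparently use either a cp -al step or rsync's --link-dest, depending on configuration):

    rm -rf backup.3                       # (i)   the oldest snapshot is lost
    mv backup.2 backup.3                  # (ii)  rotate the rest up
    mv backup.1 backup.2
    mv backup.0 backup.1
    cp -al backup.1/. backup.0            # (iii) hard-link the previous snapshot
    rsync -a --delete /source/ backup.0/  # update: changed/deleted files diverge,
                                          # unchanged files stay as shared links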

Thus, suppose the initial backup.0 contains all the files, and one deletes file_0. Upon the next invocation of rsnapshot(1), backup.0 is promoted to backup.1 and a new backup.0 is taken, now without file_0; the new backup.0 thus ends up without any hard link to file_0 after the step cp -al backup.1/. backup.0.

Consequently, when the original backup.0 has been promoted to backup.3 and is removed on the next invocation of rsnapshot(1), file_0 is lost forever.
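
A hypothetical way to watch this happen, assuming the backup.* layout above:

    # First column: inode number (identical across snapshots that share the
    # file); second column: link count, which drops by one per rotation once
    # the file is gone from the source.
    ls -li backup.*/file_0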

Now, if I deleted the file unintentionally, I will likely realize the mistake and can retrieve the file. However, if I deleted it intentionally and later decide that I want it back, it is lost forever. Hence my opening post.

Kindest regards,

M
 
With rsnapshot it's usual to set aside backups on a regular basis. For instance I have an annual set of backups for all my clients going back more than a decade, and a monthly set of backups kept for several years.

This won't completely solve your issue of "losing files you have deleted" because short-lived files may not be represented in any backup.

It's a simple fact of life that file deletion on a Unix system means just that! [Some people use frequent snapshots to guard against finger trouble, but that's outside the scope of your question.]

Eric A. Borisch has a sound solution above for you to review the candidates for deletion from the backups. When rsnapshot reaches its configured retention limit (the number of backups to keep), the oldest backup tree will be deleted before the next backup commences. Just look for files in the oldest tree with one link -- because they will disappear when the tree is deleted.
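
If you want to automate that review, a minimal sketch (all paths are placeholders, and file name collisions in the flat parking directory are not handled):

    #!/bin/sh
    # Park copies of files whose only remaining link is in the oldest tree,
    # i.e. files that will be gone for good after the next rotation.
    OLDEST=/backup/daily.6   # placeholder: your oldest retained snapshot
    PARKED=/backup/parked    # placeholder: holding area for review
    mkdir -p "$PARKED"
    find -x "$OLDEST" -type f -links 1 -exec cp -p {} "$PARKED" \;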
 
Hi gpw928,

thank you for the reply.

With rsnapshot it's usual to set aside backups on a regular basis. For instance I have an annual set of backups for all my clients going back more than a decade, and a monthly set of backups kept for several years.
This is, in fact, what I have been contemplating. Let us say I set up a daily folder whose retention equals the number of days in the month (if the application lets me). Then, before the last backup "falls off the cliff", I save it into a monthly folder and start all over. Same for years.
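
As far as I can tell from the documentation, rsnapshot supports this kind of tiering natively; the retention counts below are only illustrative:

    # rsnapshot.conf excerpt -- fields must be separated by tabs.
    # Each level is fed from the oldest snapshot of the level listed above
    # it, so the last daily is promoted to monthly.0 instead of being lost.
    retain  daily   31
    retain  monthly 12
    retain  yearly  10

with cron invoking rsnapshot daily, rsnapshot monthly, and rsnapshot yearly on the matching schedules.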

This would prevent the file loss, but as there is no free lunch, it would increase the size of the backups.

The alternative I was thinking about is keeping a whole year of snapshots, but I was afraid of the inode limit.

Since you have been doing this for a while, could you advise on your strategy?

Kindest regards,

M
 
Without knowing your usage case, it's difficult to make specific recommendations -- inode turn-over depends greatly on file system activity.

Unfortunately, my office, including my rsnapshot (ZFS) server, is packed up in cardboard boxes at the moment. Observations below are how it used to be (and hopefully will be again at some time).

I have rsnapshot backups configured for three FreeBSD systems, five Debian systems, and two Raspberry Pis (Raspbian). The number of systems has not changed a lot over the years, but there have been lots of updates, and all of the systems except the Raspberry Pis have been replaced or completely rebuilt, some many times.

The backup clients are not huge. They get all the operating system and home directories backed up with rsnapshot -- maybe an average of 20GB on each of 10 clients -- and perhaps an average of one million inodes backed up from each client. I'm struggling to recall how much space all the rsnapshot backups took up on the ZFS server, but best recollection is that the entire 10 year set was something not too far above 1TB.

Few of my clients hold much application data. Nor do they have actively changing file systems (i.e. on a daily basis). I built a ZFS server to do that. The ZFS server uses rsnapshot to backup its own operating system and home directories, but the databases, virtual machine images, media files, iSCSI volumes, and any other bulk storage (including the rsnapshot backups themselves) on the ZFS server are excluded from rsnapshot backups.

I have never had to consider inode usage as an issue. The rsnapshot backups run every month or two. I keep a working set of the 20 most recent backup trees. Each tree contains a view of the root of each client. Every now and then I move one backup tree to the side to keep "for ever", and re-number the tree heads to restore the consecutive numbering. I have a total of about 30 backup trees stored. This may not suit your usage, which is more complex than mine, but the rsnapshot application gives you lots of options for different automated retention and rotation schemes.

My ZFS server has a separate 100% backup mechanism to backup everything, including the rsnapshots (ZFS send to detachable media which are rotated off-site).

I'm very pleased to be able to have all my rsnapshot backups for a decade available on-line (as well as securely off-site).
 
Hi gpw928,

thank you for your reply.

Without knowing your usage case, it's difficult to make specific recommendations -- inode turn-over depends greatly on file system activity.
You are absolutely correct, but I am more interested in the general approach that people take, particularly with regard to the data.

In addition to the above, I am trying to figure out how to deal with (i) renamed files and (ii) moved files. In order to deal with the amount of data, and looking at the access pattern, I had the "brilliant" idea of dividing my data into (i) archive, which contains data changing at a frequency approaching the speed of glacier movement, (ii) active data, the projects that I work on, and (iii) critical data, which are akin to (i) but which I would hate to lose.

This turned out not to be such a brilliant strategy. Let us say that I rekindle my interest in an old hobby, and I move it from archive to active, and optionally (in a fit of being organized) rename the folders/files. Not only have I confused the backup system, but eventually myself, because I panic when I cannot find the folder in its familiar place.

Hence I started to look at rsync(1) and rsnapshot(1), the idea being that I can give it a list of folders/files to back up on different schedules onto a flat target structure. Furthermore, if I rename something, I could go to the previous backups and do a find-and-rename (see the sketch below).
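
The rename part of that theory might look like the following sketch (the paths and names are made up; whether editing old snapshots like this is wise is exactly what I am not sure about):

    #!/bin/sh
    # Propagate a rename into the existing snapshots so that future
    # hard-link runs still match the live tree.
    for snap in /backup/daily.*; do
        [ -d "$snap/projects/old-name" ] &&
            mv "$snap/projects/old-name" "$snap/projects/new-name"
    done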

That is the theory, but I am not quite sure whether it is feasible and not overly complex.

Kindest regards,

M
 
You are absolutely correct, but I am more interested in the general approach that people take, particularly with regard to the data.

My backups are all ZFS and its snapshots by now.

rsnapshot is a fine backup mechanism, but you will chase those edge cases forever.

If you have enough RAM to do dedup, you are all set with wild renaming actions on the source side.
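
On ZFS that's a per-dataset switch, something like this (the dataset name is a placeholder); the caveat is that the dedup table has to stay resident in RAM to perform well:

    # Enable block-level dedup on the dataset holding the backups.
    zfs set dedup=on tank/backup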
 
Hi cracauer@,

thank you for the reply.

rsnapshot is a fine backup mechanism, but you will chase those edge cases forever.
Ha, ha, that is what I meant by the overall complexity.

If you have enough RAM to do dedup, you are all set with wild renaming actions on the source side.
Apart from not having enough memory vis-a-vis the amount of data, I do not understand how de-duplication helps. After all, the old(er) snapshots will have the previous names/positions. Or what concept am I missing?

Kindest regards,

M
 
Apart from not having enough memory vis-a-vis the amount of data, I do not understand how de-duplication helps. After all, the old(er) snapshots will have the previous names/positions. Or what concept am I missing?

If you feed your backup, which is on ZFS, with rsync, then renamed files that are not identified with --fuzzy will allocate separate space even if the contents are identical.
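
Something like this, per rsync(1) (paths illustrative, rsync 3.x; repeating --fuzzy extends the basis-file search into the --link-dest directory):

    # --fuzzy reduces re-transfer of renamed files by delta-encoding against
    # similarly named files; the on-disk duplication a rename still causes
    # is then reclaimed by ZFS dedup, as discussed above.
    rsync -a --delete --fuzzy --fuzzy \
        --link-dest=/backup/daily.1 /data/ /backup/daily.0/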
 
Hi cracauer@,

If you feed your backup, which is on ZFS, with rsync, then renamed files that are not identified with --fuzzy will allocate separate space even if the contents are identical.
Yes, I understand that; I was referring to your de-duplication suggestion.

Kindest regards,

M
 