[Solved] Backups network synchronization(?)

Greetings all,

Due to my disorganized past, I have two servers with two different versions of backups: backup_reference and backup_old. I would like to synchronize them with the following rules:
(i) matching files are deleted only from backup_old, and (ii) non-matching files are left on their respective servers.

I had found an excellent tool, SearchMyFiles by NirSoft, which does exactly that - it finds the matching files, lets you sort them by location, select the desired location, and delete the selected files. Unfortunately, it is Windows only and does not work over the network, and as documented elsewhere on the forum, (i) when I mount the server on the Windows machine under NFS I cannot modify the directories/files, and (ii) trying SMB, I get the dreaded "The network path name was not found", which I have been unsuccessfully fighting for the past few days.

I was initially thinking about rsync(1), but either it cannot be done or I am not smart enough to figure it out.

Any help would be appreciated.

Kindest regards,

M
 
I believe rsync(1) is the right tool for the job; if I were to summarize what you're doing, I'd say "diff two directories using rsync". Searching for those terms turns up a couple of promising links.

Make sure to do a dry-run (-n or --dry-run) before running any potentially damaging commands. The -i (--itemize-changes) switch looks interesting too.
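For example, something along these lines (the paths are placeholders) would itemize what differs between the two trees without changing anything:
rsync -rvnci /path/to/backup_old/ /path/to/backup_reference/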
 
I had a similar task and wrote a simple script for automated deletion of duplicates.
The script is not well suited to working with a remote server over the internet, but you can improve it:
replace the line that compares the files (diff -q) with something like an md5 over a remote ssh connection with passwordless keys, and compare the locally and remotely generated checksums.

Tune it if you need to, then run finddupe.sh.
If your servers are on the same local network, you can mount the remote directory and use my script as is.

/home/im/finddupe.sh
Code:
#!/bin/sh
# directory to clean (duplicates are deleted from here)
dir1='/home/im/pic2/pics'
# reference directory (never modified; run.sh sets its own copy of these paths)
dir2='/home/im/tmp/data/pic1.orig'

cd "$dir1" && find ./ -type f -exec /home/im/run.sh {} \;

/home/im/run.sh
Code:
#!/bin/sh

# same directories as in finddupe.sh; dir1 is cleaned, dir2 is the reference
dir1='/home/im/pic2/pics/'
dir2='/home/im/tmp/data/pic1.orig/'

echo ""
echo "${dir1}$1" "${dir2}$1"
# if the two files are identical, delete the copy in dir1
diff -q "${dir1}$1" "${dir2}$1" && rm "${dir1}$1" && echo deleted_1st

For the checksum comparison you can use, e.g., FreeBSD's sha256(1), which can check a file against a stored digest directly:
Code:
if sha256 -q -c "$(cat mynetlist.sha256)" mynetlist.txt
then
    # do something
fi
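
For the remote case, the diff -q line in run.sh could be replaced by something along these lines (user@remote is a placeholder; passwordless keys are assumed):
Code:
# compare checksums instead of running diff locally;
# md5 -q prints just the digest (BSD md5)
sum_local=$(md5 -q "${dir1}$1")
sum_remote=$(ssh user@remote md5 -q "\"${dir2}$1\"")
# a missing remote file yields an empty checksum, so nothing is deleted
[ "$sum_local" = "$sum_remote" ] && rm "${dir1}$1" && echo deleted_1st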
 
You can also use rsync in an unusual way:
rsync -rv -n -c root@backupold.server:/tmp/backup_old/ /tmp/backup_reference/ > savelist
This saves the list of files that are NOT identical into a file named "savelist".
After that you can write a little script that deletes from /tmp/backup_old/ every file that is not listed, as sketched below.
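A minimal sketch of such a script, assuming savelist holds one relative path per line as printed by rsync -v and has been copied to the backup_old server (/root/savelist is a placeholder):
Code:
#!/bin/sh
# delete from /tmp/backup_old/ every file that is NOT listed in savelist
cd /tmp/backup_old || exit 1
find . -type f | sed 's|^\./||' | while read -r f
do
	# swap rm for echo first to get a dry run
	grep -qxF "$f" /root/savelist || rm -- "$f"
done
rsync's verbose output also contains header and summary lines, but since those never match an actual path they are harmless here; filenames with embedded newlines, on the other hand, would not be handled.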
 

The following should adhere to your rules (i) and (ii), and keep it simple:


# host: backup_reference
# enter backup_reference_dir
cd backup_reference_dir
# create checksums for every file (written one level above the tree)
find . -type f -exec gmd5sum {} \; | tee -a ../backup_reference.md5

# copy the checksum file to backup_old
scp -p ../backup_reference.md5 backup_old:/usr/home/username

# host: backup_old

# enter backup_old_dir
cd backup_old_dir

# verify the files and keep the ones that match for later removal
gmd5sum -c /usr/home/username/backup_reference.md5 | grep ": OK" | cut -d : -f 1 | tee -a lst.delete

# remove the duplicates
cat lst.delete | xargs rm -f



Of course this can be wrapped up in a script or a loop, whatever suits you. :)
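A rough sketch of how that might look as a single script, run from the backup_reference host, assuming passwordless ssh and with the host name and paths as placeholders:

#!/bin/sh
# one-shot version of the steps above; adjust the paths and host
ref_dir=/path/to/backup_reference_dir
old_host=backup_old
old_dir=/path/to/backup_old_dir

cd "$ref_dir" || exit 1
find . -type f -exec gmd5sum {} \; > /tmp/backup_reference.md5
scp -p /tmp/backup_reference.md5 "$old_host:/tmp/"
# remotely: verify against the list, collect the matches, remove them
ssh "$old_host" "cd $old_dir && gmd5sum -c /tmp/backup_reference.md5 \
	| grep ': OK\$' | cut -d : -f 1 | xargs rm -f"

As with the version above, filenames containing colons or spaces would need more careful handling.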
 
Hi Jose, im, tanis,

thank you for the ideas.

As I understand it, you propose a two-step process: (i) generate a list of duplicates based on checksums, and (ii) delete the duplicates based on that list.

I could not figure out the rsync(1) commands because I was trying to accomplish everything in one step.

Kindest regards,

M
 
Greetings all,

Could some rsync(1) expert please check the following proposed solution?

1. Generate a list of files that are not duplicates:
rsync -navc user@server_01:/backup_old/ /backup_reference/ > /path/to/keep_files.txt

2. Move the files to be deleted, i.e., those not listed in keep_files.txt:
rsync -nvd --remove-source-files --exclude-from=/path/to/keep_files.txt /backup_old /path/to/backup_old/delete

3. Delete all files from /path/to/backup_old/delete:
rm -Idr /path/to/backup_old/delete

Explanation of the options:
--remove-source-files    sender removes synchronized files (non-dir)
--exclude-from=FILE      read exclude patterns from FILE
Remove -n after testing.

This obviously will not work as written, since rsync(1) would have to run as root on the remote side and I have remote root login disabled. I can, however, mount the share and run everything locally.
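With the share mounted locally (the mount point below is just an assumption), step 1 could presumably become:
rsync -navc /mnt/backup_old/ /backup_reference/ > /path/to/keep_files.txt
and steps 2 and 3 would then run entirely on the local machine.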

Kindest regards,

M
 