Shell Comparing two directories

I'm looking for a script to compare two directories, A and B, and if any files are found to be identical then to delete them from A. Not sure whether I should use cmp() or diff()....
 
fdupes compares files by their MD5 checksum. If the contents of two files differ, their MD5 checksums will differ too, and the files will not be identified as duplicates.

For example, you can have two mp3 files of the same song encoded at different bitrates. Even though the song is the same, the MD5 checksums of the files will differ, so they won't be detected as duplicates.
 
fdupes compares files by their MD5 checksum. If the contents of two files differ, their MD5 checksums will differ too, and the files will not be identified as duplicates.

If the files have slightly different names because of different character sets will the md5 checksum be different?
 
Isn't this what rsync is for? I don't remember all its bazillion command line options but one I'll never forget is --dry-run.
I'm looking for a script to compare two directories, A and B, and if any files are found to be identical then to delete them from A. Not sure whether I should use cmp() or diff()....
 
The file name doesn't matter. The files are compared by their size and MD5 checksum.
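To illustrate the point about names, here is a rough sketch of a checksum-only comparison between two directories. It is not how fdupes itself is implemented; `file_md5` and `dedupe_against` are made-up helper names, and the script assumes that either `md5sum` or `md5 -q` is available and that no file name contains a newline.

```sh
#!/bin/sh
# Sketch only: find files under A whose content also exists under B.
# file_md5 and dedupe_against are hypothetical helper names.

# Print the MD5 of one file, using md5sum (Linux) or md5 -q (FreeBSD/macOS).
file_md5() {
    if command -v md5sum >/dev/null 2>&1; then
        md5sum < "$1" | cut -d' ' -f1
    else
        md5 -q "$1"
    fi
}

# Usage: dedupe_against DIR_A DIR_B [delete]
# Without "delete" it only lists the candidates (a dry run).
dedupe_against() {
    dir_a=$1
    dir_b=$2
    mode=${3:-list}
    sums=$(mktemp) || return 1

    # Hash every regular file in B once.
    find "$dir_b" -type f | while IFS= read -r f; do
        file_md5 "$f"
    done | sort -u > "$sums"

    # Check each file in A against that list; names are never compared.
    find "$dir_a" -type f | while IFS= read -r f; do
        h=$(file_md5 "$f")
        if [ -n "$h" ] && grep -qxF "$h" "$sums"; then
            if [ "$mode" = delete ]; then
                rm -- "$f"
            else
                printf '%s\n' "$f"
            fi
        fi
    done

    rm -f "$sums"
}
```

Calling `dedupe_against A B` only prints the files that would go; adding `delete` as a third argument removes them from A. Since this trusts the checksum alone, a more cautious version would confirm each match with cmp before deleting anything.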

That's what I thought but I see quite a number of files which didn't get deleted....

However, when I try running cmp I get
: No such file or directory
although the files are visible when I do a directory listing... They are on an NTFS partition and maybe have invalid filenames.... Must investigate further.
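One possible cause, which is only a guess and not confirmed anywhere in this thread, is a control character such as a carriage return hidden in the file name. A small sketch for checking that follows; `find_odd_names` is a made-up helper name, and it relies on `ls -b`, which prints non-printable characters as backslash escapes.

```sh
#!/bin/sh
# Sketch only: list names under a directory that contain control characters.
# find_odd_names is a hypothetical helper name.
find_odd_names() {
    # [[:cntrl:]] matches control characters such as carriage return or tab;
    # ls -b shows them as escapes (for example \r) instead of printing them raw.
    find "$1" -name '*[[:cntrl:]]*' -exec ls -db {} +
}
```

If `find_odd_names /path/of/the/base/directory` prints nothing, the names are probably not the problem and the cause lies elsewhere.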
 
I've managed to delete a lot of files thanks to this, but in some cases I'm left with subdirectories containing a single file, Thumbs.db.

Not exactly sure what this file is or how to view it.
 
Thumbs.db is metadata which Windows Explorer generates automatically. I always delete it when it gets in my way:
find /path/of/the/base/directory -name Thumbs.db -delete

The same goes for the .DS_Store files which the Mac Finder creates automatically:
find /path/of/the/base/directory -name .DS_Store -delete

Beware: if you delete these files, the appearance of the directories and their contents will be reset to the default views.
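For the case mentioned earlier, where a subdirectory is left holding nothing but Thumbs.db, the two commands can be combined with a pass that removes the directories they leave empty. This is only a sketch: `purge_thumbs` is a made-up helper name, and it assumes a find that supports `-delete`, `-empty` and `-mindepth` (both the FreeBSD and GNU versions do).

```sh
#!/bin/sh
# Sketch only: drop Explorer/Finder metadata, then remove emptied directories.
# purge_thumbs is a hypothetical helper name.
purge_thumbs() {
    base=$1
    # Remove the metadata files themselves.
    find "$base" -type f \( -name Thumbs.db -o -name .DS_Store \) -delete
    # Remove directories that are now empty; -mindepth 1 spares the base itself.
    find "$base" -mindepth 1 -type d -empty -delete
}
```

Replacing `-delete` with `-print` in either command shows what would be removed without touching anything.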
 
For example, you can have two mp3 files of the same song encoded at different bitrates. Even though the song is the same, the MD5 checksums of the files will differ, so they won't be detected as duplicates.

I'm just finding this... I'm trying to tidy up my music collection and was surprised that sysutils/fdupes did not flag numerous mp3's in different directories which had the same size. Running cmp() showed that there was a single different character but to all intents and purposes they were duplicates. Is there any way to deal with this?
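To see how far apart two same-size files really are, `cmp -l` prints one line per differing byte, so counting those lines gives a rough measure. The sketch below wraps that in a made-up helper, `count_diff_bytes`; deciding how many differing bytes still count as "the same file" is left to the reader.

```sh
#!/bin/sh
# Sketch only: count the bytes that differ between two files.
# count_diff_bytes is a hypothetical helper name.
count_diff_bytes() {
    # cmp -l prints one line per differing byte; awk counts the lines.
    # The EOF message for files of unequal length goes to stderr and is dropped.
    cmp -l "$1" "$2" 2>/dev/null | awk 'END { print NR }'
}
```

A result of 0 means the files are identical over their common length; a very small number for two files of equal size suggests near-duplicates like the ones described above.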
 
If your goal is to remove similar music files, then look for software that can compare their acoustic fingerprints or ID3 tags.
 