Removing duplicate emails

I am having a hard time removing duplicate emails from a user's account. The user was trying to set up a mail filter in her inbox and ended up making copies of every email, up to 4 copies of each, which left her mailbox with fifty thousand copies. I have tried a couple of commands on the back end to sort out the duplicates and delete them, leaving only the unique emails, using awk and other commands: awk 'a !~ $0; {a=$0}'

I seriously need some assistance with this.
Since the user uses Thunderbird as her email client, we have also tried the duplicate-remover add-ons for Thunderbird, and they did not help either. The mail client freezes whenever the add-on is running. She is in a real bind because she cannot get control of her email at all.

I would very much appreciate help from anyone.

Thanks
 
I am not sure whether the Message-IDs are duplicated, but I am certain that the messages are: there are 4 or more copies of any one message.
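One way to settle the Message-ID question before deleting anything is to count the headers directly. The following is only a minimal sketch, assuming the mailbox is stored as a Maildir with the usual cur/ and new/ subfolders; the function name `count_message_ids` is made up for illustration, and it only reads files, it never deletes.

```python
from collections import Counter
from email.parser import BytesParser
from pathlib import Path


def count_message_ids(maildir):
    """Count how often each Message-ID occurs under cur/ and new/."""
    counts = Counter()
    parser = BytesParser()
    for sub in ("cur", "new"):
        folder = Path(maildir) / sub
        if not folder.is_dir():
            continue
        for path in sorted(folder.iterdir()):
            if not path.is_file():
                continue
            with path.open("rb") as fh:
                # headersonly=True skips parsing the message body
                msg = parser.parse(fh, headersonly=True)
            msg_id = msg.get("Message-ID")
            if msg_id is None:
                key = "(no Message-ID)"
            else:
                key = str(msg_id).strip()
            counts[key] += 1
    return counts
```

Calling `count_message_ids(path).most_common(10)` on the mailbox path lists the ten most repeated IDs with their counts. If the top counts are 4 or more, the copies share their Message-ID and could be matched on that header; if every count is 1, the copies were given new IDs and a content comparison is the better route.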
 
albsallu said:
MKY. How do you go about using the following command that you suggested?

Just change into the Maildir mailbox directory and run:

[cmd=]fdupes -d -N .[/cmd]

This compares all files in the directory (all your messages), keeps the first file, and removes every later file that is identical to it. The comparison works on file sizes and MD5 checksums of the contents, not on file names. I had the same problem you describe, and this method worked for me.

Before you start, remember to back up your mail in case something goes wrong.
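If you would rather see what would be removed before letting anything delete files, the same idea (group by size, then by MD5 of the contents) can be tried as a read-only dry run. This is just a sketch of that idea, not fdupes itself; the function name `find_duplicates` is invented here, and it looks at one directory only, so point it at the folder that actually holds the message files (cur/ in a typical Maildir).

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(directory):
    """Return groups of files with identical content.

    The first path in each group is the one a cleanup would keep.
    Nothing is deleted here.
    """
    # Step 1: files of different sizes cannot be identical
    by_size = defaultdict(list)
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        # Step 2: among same-size files, compare MD5 of the contents
        by_digest = defaultdict(list)
        for path in paths:
            digest = hashlib.md5(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
        groups.extend(g for g in by_digest.values() if len(g) > 1)
    return groups
```

Each returned group lists the keeper first and the would-be deletions after it, so the result can be checked against the backup before any real removal is run.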
 
There are a multitude of options for finding duplicate files. I was wondering if anyone had any good/bad experiences with the ones I found in ports and elsewhere (listed below). I have ~200,000 files, each ranging from 5 to 40MB in size (most are very close to 30MB), and would like to go as fast as possible, without using obscene amounts of memory, as these operations will be done on ZFS volumes already using obscene amounts of memory. ;)

Here is what I found. I've used fdupes before but was wondering if others might be faster. I was thinking about rdfind since it seems to run a bit quicker than the others.

duff - http://duff.sourceforge.net/
dupfind - http://www.freebsd.org/cgi/cvsweb.cgi/ports/sysutils/dupfind/
fslint - http://www.pixelbeat.org/fslint/
fdupes - http://www.freebsd.org/cgi/cvsweb.cgi/ports/sysutils/fdupes/
filedupe - http://www.freebsd.org/cgi/cvsweb.cgi/ports/sysutils/filedupe/
ftwin - http://www.freebsd.org/cgi/cvsweb.cgi/ports/sysutils/ftwin/
rdfind - http://rdfind.pauldreik.se/
samesame - http://www.freebsd.org/cgi/cvsweb.cgi/ports/sysutils/samesame/
weedit - http://www.freebsd.org/cgi/cvsweb.cgi/ports/sysutils/weedit/
whatpix - http://www.freebsd.org/cgi/cvsweb.cgi/ports/sysutils/whatpix/

Opinions?
 