Programming Buddy: C++

Hello,

https://github.com/eul3er/bdups


I'm currently working on a C++14 implementation of a concurrent duplicate-file finder for server environments.

Say, more than 1 million files.

I'm using Boost, Crypto++, and C++ standard library threads.
Build system: CMake (a rough sketch of such a setup is below).
Compilers: GCC, Clang.
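
For the curious, here is roughly what a CMake setup for a project like this could look like. This is a sketch, not the repository's actual CMakeLists.txt; the target name, source layout, and the cryptopp link name are assumptions:

```cmake
# Illustrative sketch only -- not the build file from the bdups repo.
cmake_minimum_required(VERSION 3.5)
project(bdups CXX)

# C++14, as stated above; works with both GCC and Clang.
set(CMAKE_CXX_STANDARD 14)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

find_package(Boost REQUIRED COMPONENTS filesystem system)
find_package(Threads REQUIRED)            # backend for std::thread

add_executable(bdups src/main.cpp)        # source layout is assumed
target_link_libraries(bdups
    Boost::filesystem
    Boost::system
    Threads::Threads
    cryptopp)                             # Crypto++; link name varies by distro
```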

I'm looking for like-minded people who enjoy C++. :)
 
The difficult part is not the coding; it's the algorithmic technique. Any idea how you want to approach this challenge?

I once wondered if I had duplicate files scattered throughout my file system. Although not in the million-plus range, I listed all the file paths in a text file and sorted the list by file size. That was enough effort for me, and I just made a simple directory-structure adjustment afterward.

Out of curiosity, what kind of problem would such a program solve? For example, duplicate files are a good thing in a software version control system.

Dominique.
 
Thank you for your question.
There is already a working executable; think of it as a beta.
Short algorithm description: gather file paths and file sizes, sort, hash, sort again, and output the duplicates (a minimal sketch of this pipeline follows below).
Features like actual deduplication are very costly in hardware terms.
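
To make that concrete, here is a minimal single-threaded sketch of such a size-then-hash pipeline. It is not the actual bdups code: it uses C++17 std::filesystem instead of Boost.Filesystem, and a plain std::hash over the file contents as a stand-in for a Crypto++ digest:

```cpp
// Minimal sketch of the "gather, sort by size, hash, output dups"
// pipeline. Illustrative only: std::filesystem (C++17) replaces
// Boost.Filesystem, and std::hash replaces the Crypto++ digest.
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <iostream>
#include <iterator>
#include <map>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Stand-in for a real cryptographic digest of the file contents.
std::size_t contentHash(const fs::path& p) {
    std::ifstream in(p, std::ios::binary);
    std::string data{std::istreambuf_iterator<char>(in),
                     std::istreambuf_iterator<char>()};
    return std::hash<std::string>{}(data);
}

int main(int argc, char** argv) {
    if (argc < 2) return 1;

    // 1. Gather paths, grouped by file size (the first "sort").
    std::map<std::uintmax_t, std::vector<fs::path>> bySize;
    for (const auto& e : fs::recursive_directory_iterator(argv[1]))
        if (e.is_regular_file())
            bySize[e.file_size()].push_back(e.path());

    // 2. Hash only files that share a size, grouped by digest
    //    (the second "sort"); 3. output the duplicate groups.
    for (const auto& sizeGroup : bySize) {
        if (sizeGroup.second.size() < 2) continue;  // unique size: no dup possible
        std::map<std::size_t, std::vector<fs::path>> byHash;
        for (const auto& p : sizeGroup.second)
            byHash[contentHash(p)].push_back(p);
        for (const auto& hashGroup : byHash)
            if (hashGroup.second.size() >= 2) {
                for (const auto& p : hashGroup.second)
                    std::cout << p << '\n';
                std::cout << '\n';
            }
    }
}
```

The hashing stage is the natural place to spread work across std::thread workers, which is presumably where the concurrency mentioned above comes in.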


Example Benchmark 1 (24,666 files):
Intel Atom C2758 (Avoton) @ 2.40 GHz, 8 cores, on RAIDZ2 (6 x WD Red)
32 GB RAM
current FreeNAS

fdupes: 1 minute 9 seconds
My program: 11 seconds


Example Benchmark 2 (173,229 files):
Intel Core i7-3740QM @ 2.70 GHz, 4 cores + HT, on SSD
8 GB RAM
current Arch Linux

fdupes: 39 seconds
My program: 9 seconds
 