Perl Using exiftool on multi-core systems to batch rename PDFs

Hello,

I have been using
Code:
exiftool '-filename<$title.%e' .
in order to rename thousands of scientific articles that were downloaded with annoyingly cryptic filenames. Some colleagues found out that I have been doing this and now I am the "PDF-renamer-person" in the office.

However, the process does take a bit of time as it is single threaded. Thus I am wondering how it might be possible to parallelize the above on multi-core systems such that each core is scanning a set of PDFs and renaming them concurrently.

I have tried piping the following script through GNU parallel:

Code:
#!/bin/sh
exiftool '-filename<$title.%e' .

Code:
cat ./command.sh | parallel -j 4 '{}'
Unfortunately this doesn't really work as expected: it still somehow defaults to a single core.

So I am inquiring here as to whether more brilliant minds might be able to shed some light on what I can improve in order to make this work more efficiently.

To be clear I am using p5-exiftool from the package collection.
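
Presumably the issue is that the pipeline hands parallel a single line, so there is only one job for it to schedule. A per-file variant that actually fans out (a sketch, assuming GNU parallel is installed) might be:

Code:
```shell
# Feed parallel one file name per job instead of one monolithic command.
# Assumes GNU parallel and exiftool are on PATH; run from the PDF directory.
find . -type f -name '*.pdf' -print0 |
    parallel -0 -j 4 exiftool '-filename<$title.%e' {}
```
The -print0/-0 pair keeps filenames containing spaces intact across the pipe.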
 
One solution may be to use find(1), for example, to find everything starting with 'a' and have that spawn a 'converter' shell script in the background. Each starting letter then spawns its own converter, so you would have 26 single-core processes running at the same time. It may well choke the I/O, but you get the idea: split the list of files into segments and convert the segments concurrently instead of working through each file in sequence.
 
Hmm. This sounds interesting. So in theory this is what you had in mind?

Code:
find . -type f -name 'a*' -exec exiftool '-filename<$title.%e' {} +

How could I script this to act on one directory for each letter of the alphabet? I can go up to 32 threads.
 
Something like that, yes. If you start it in the background your script can move to the next letter while the first is being converted.

Code:
find . -type f -name 'a*' -exec exiftool '-filename<$title.%e' {} + &
find . -type f -name 'b*' -exec exiftool '-filename<$title.%e' {} + &
Now 'a*' and 'b*' are started right after each other, and 'b*' doesn't wait until 'a*' is finished. This is a really quick and dirty way; it's probably better to write a proper script around it.
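
A proper wrapper along those lines might look like this (a sketch; the script name is made up, and it assumes lower-case filenames):

Code:
```shell
#!/bin/sh
# rename-by-letter.sh (hypothetical name): start one background
# find/exiftool pass per starting letter, then wait for all of them.
dir=${1:-.}
for letter in a b c d e f g h i j k l m n o p q r s t u v w x y z; do
    find "$dir" -type f -name "${letter}*" \
        -exec exiftool '-filename<$title.%e' {} + &
done
wait    # block until every background job has finished
```
With -exec ... {} + each find hands exiftool whole batches of files, which avoids paying exiftool's start-up cost once per file.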
 

Superb, this works quite well. Thanks!
 
I would look at a single find command piped into xargs -P [jobs] command args.

If you have spaces in file names or paths, use -print0 on find together with -0 on xargs.
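
Concretely, something along these lines (a sketch; -n 64 is an arbitrary batch size to amortise exiftool start-up):

Code:
```shell
# One find feeds xargs; -P 4 keeps four exiftool processes busy,
# each invocation handling a batch of up to 64 files.
find . -type f -name '*.pdf' -print0 |
    xargs -0 -n 64 -P 4 exiftool '-filename<$title.%e'
```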
 