Shell find & xargs

balanga

Son of Beastie

Reaction score: 217
Messages: 4,024

I want to print out the first line from similarly named files in different subdirectories using this command:

#find $DIR -name $FILE | xargs head -1

and whilst I do get the line I want, each output is preceded by ==> $FILE <==

Where does this come from, and how do I suppress it?
 

balanga

Son of Beastie

Reaction score: 217
Messages: 4,024

Thanks, I would never have guessed that from looking at xargs(1):

-n number, --max-args=number
Set the maximum number of arguments taken from standard input for
each invocation of utility. An invocation of utility will use
less than number standard input arguments if the number of bytes
accumulated (see the -s option) exceeds the specified size or
there are fewer than number arguments remaining for the last
invocation of utility. The current default value for number is
5000.
 

ralphbsz

Son of Beastie

Reaction score: 2,186
Messages: 3,136

Xargs is surprisingly powerful, which means it has a large number of flags. By combining -n, -L and -P, you can get interesting effects. Note that xargs distinguishes lines of input from arguments (there may be multiple arguments on one line).
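
To make the line-versus-argument distinction concrete, here is a small sketch. The input text is made up for illustration, and echo stands in for whatever utility you would really run.

```shell
#!/bin/sh
# Two input lines with two words each (made-up input for illustration).
input='a b
c d'

# -n 1: at most one argument per invocation, so echo runs four times.
by_arg=$(printf '%s\n' "$input" | xargs -n 1 echo)

# -L 1: one input line per invocation, so echo runs twice.
by_line=$(printf '%s\n' "$input" | xargs -L 1 echo)

printf 'per argument:\n%s\n' "$by_arg"
printf 'per line:\n%s\n' "$by_line"
```

With -n the words are counted individually; with -L both words on a line travel together into the same invocation.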

And if you have spaces in your file names (or unprintable characters), you should be using "find ... -print0 | xargs -0" (that's a zero, not an Oh).
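
A rough sketch of why this matters, assuming a scratch directory created with mktemp; the directory and file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical scratch tree containing a directory name with a space.
tmp=$(mktemp -d)
mkdir "$tmp/dir one"
printf 'first line\nsecond line\n' > "$tmp/dir one/notes.txt"

# Default splitting: xargs breaks the path at the space, so head is
# handed two names that do not exist and prints nothing on stdout.
plain=$(find "$tmp" -name notes.txt | xargs head -n 1 2>/dev/null)

# NUL-separated: the whole path arrives as a single argument.
safe=$(find "$tmp" -name notes.txt -print0 | xargs -0 head -n 1)

printf 'plain: [%s]\nsafe:  [%s]\n' "$plain" "$safe"
rm -rf "$tmp"
```

Only the second pipeline finds the file; the first one fails, and here the error messages are even hidden by the redirect.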
 

Eric A. Borisch

Aspiring Daemon

Reaction score: 355
Messages: 581

The header appears because head(1) is being called with multiple arguments; when given more than one file, it labels each file's output. As suggested, xargs -n 1 will call head with only one argument (file) at a time. You also (as mentioned above) likely want -print0 on the find and the -0 option to xargs(1) to handle spaces.

Alternatively, you can also do this directly from find(1): find $DIR -name $FILE -exec head -n 1 '{}' ';'
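
For illustration, a minimal sketch that shows both behaviours side by side. The scratch directory and file names are hypothetical, and sort is only there to make the order of the output predictable.

```shell
#!/bin/sh
# Hypothetical scratch files for the demonstration.
tmp=$(mktemp -d)
printf 'alpha\nbeta\n' > "$tmp/one.txt"
printf 'gamma\ndelta\n' > "$tmp/two.txt"

# Two file arguments in a single call: head labels each file.
with_headers=$(head -n 1 "$tmp/one.txt" "$tmp/two.txt")

# One call per file via -exec ... ';': no labels are printed.
without_headers=$(find "$tmp" -name '*.txt' -exec head -n 1 '{}' ';' | sort)

printf '%s\n---\n%s\n' "$with_headers" "$without_headers"
rm -rf "$tmp"
```

The first result contains the ==> name <== lines; the second contains only the first line of each file.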
 

Alain De Vos

Daemon

Reaction score: 546
Messages: 1,890

Never thought you could use xargs to parallelise.
 

Eric A. Borisch

Aspiring Daemon

Reaction score: 355
Messages: 581

Never thought you could use xargs to parallelise.
It’s wonderful for certain tasks; this is not one of them if you are interested in getting an accurate representation of all the first lines from the files, as there’s no synchronization between the output lines of the parallel legs of execution.
 

covacat

Well-Known Member

Reaction score: 171
Messages: 365

Code:
find . -name \*c|xargs -J% head -n 1 % /dev/null |awk '(NR % 3 == 2) {print}'
this runs head with multiple args and uses awk to print every 3rd line beginning with the 2nd (skip title and footer empty line)

we append /dev/null to the found files so we have more than one otherwise (single file found) head will output just one line not 3
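
A small sketch of the same filter, for anyone who wants to see the three-line pattern. It calls head directly instead of going through xargs -J, and the file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical scratch files; each has at least one complete line.
tmp=$(mktemp -d)
printf 'one\nx\n' > "$tmp/a.c"
printf 'two\ny\n' > "$tmp/b.c"

# With several arguments head prints, per file: a header line, the first
# line, then a blank separator. /dev/null adds one last header and nothing
# else, so line 2 of every group of 3 is the line we want.
first_lines=$(head -n 1 "$tmp/a.c" "$tmp/b.c" /dev/null | awk 'NR % 3 == 2')

printf '%s\n' "$first_lines"
rm -rf "$tmp"
```

Note that the counting relies on every file contributing exactly three lines; an empty file contributes only two and shifts everything after it.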
 

ralphbsz

Son of Beastie

Reaction score: 2,186
Messages: 3,136

Never thought you could use xargs to parallelise.
I use it for that all the time. Say I have a variable set of programs to run, all of which are CPU intensive. I have, through trial and error, determined the correct number of tasks to run in parallel; in many cases, it's exactly one per core (for example if the programs are written in Python, which has the GIL). So I write a shell script which makes all the programs into a list (in a temporary file), pipe that into xargs, and my machine is kept nearly optimally busy. It's like having a whole batch scheduling system, in 10 lines of shell.

The obvious problem is obvious: Doing IO (like reading files and writing output) in parallel is hard. You end up "wrapping" the programs such that they have no stdin and stdout, and redirect everything from/to files, which you later aggregate or segregate.
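
A stripped-down sketch of that pattern, under stated assumptions: the job names are made up, echo stands in for the real CPU-bound program, and -P 2 is an arbitrary limit rather than a tuned value.

```shell
#!/bin/sh
# Hypothetical job list, one argument per line, kept in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' job1 job2 job3 job4 > "$tmp/jobs.txt"

# Run up to 2 jobs at a time. Each job reads nothing from stdin and
# writes to its own file, so parallel output cannot interleave.
# Inside sh -c: $1 is the scratch directory, $2 is the job name.
xargs -P 2 -n 1 sh -c 'echo "result for $2" > "$1/$2.out" < /dev/null' sh "$tmp" < "$tmp/jobs.txt"

# Aggregate afterwards, in a fixed order given by the glob.
combined=$(cat "$tmp"/job*.out)

printf '%s\n' "$combined"
rm -rf "$tmp"
```

xargs returns only after all of its children have finished, so the aggregation step sees complete files.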
 

Eric A. Borisch

Aspiring Daemon

Reaction score: 355
Messages: 581

Other options:

find . <findargs> -execdir awk '{print;nextfile;}' '{}' '+'

or a little faster perhaps, as the find won't have to wait for the awk:

find . <findargs> -print0 | xargs -0 awk '{print;nextfile;}'

Either one spawns awk as needed (once there are enough arguments to fill up a command line), and prints the first line of each argument passed to it. You may want xargs to limit the number of arguments (-n count) to get awk to launch more frequently; otherwise you may be waiting for a while for the ~ARG_MAX argument size to be reached before each awk is spawned.
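
A sketch of the xargs variant with a small -n, using hypothetical file names. It swaps in the pattern FNR == 1 as a fallback for an awk without nextfile; that version prints the same lines but reads each file to the end, so it is slower on large files. The value 2 for -n is only there to force more than one awk invocation on three files.

```shell
#!/bin/sh
# Hypothetical scratch files for the demonstration.
tmp=$(mktemp -d)
printf 'h1\nbody\n' > "$tmp/a.txt"
printf 'h2\nbody\n' > "$tmp/b.txt"
printf 'h3\nbody\n' > "$tmp/c.txt"

# At most 2 files per awk invocation; FNR restarts at 1 for every file,
# so the pattern matches exactly the first line of each one.
lines=$(find "$tmp" -name '*.txt' -print0 | xargs -0 -n 2 awk 'FNR == 1' | sort)

printf '%s\n' "$lines"
rm -rf "$tmp"
```

The sort is only there because find does not promise any particular order.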
 