Solved find & xargs

I want to print out the first line from similarly named files in different subdirectories using this command:

#find $DIR -name $FILE | xargs head -1

and whilst I do get the line I want, each output is preceded by ==> $FILE <==

Where does this come from, and how do I suppress it?
 
Thanks, I would never have guessed that from looking at the xargs(1) man page:

-n number, --max-args=number
Set the maximum number of arguments taken from standard input for
each invocation of utility. An invocation of utility will use
less than number standard input arguments if the number of bytes
accumulated (see the -s option) exceeds the specified size or
there are fewer than number arguments remaining for the last invocation
of utility. The current default value for number is
5000.
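A minimal sketch of the -n 1 fix, assuming a POSIX sh with find, xargs and head. The function name first_lines and the demo files are made up for illustration. It shows head printing headers when a single invocation receives two files, and no headers once xargs -n 1 hands each head exactly one file.

```shell
#!/bin/sh
# Print the first line of every file under DIR ($1) whose name matches
# PATTERN ($2). xargs -n 1 runs head once per file, and head only prints
# "==> name <==" headers when it is given more than one file.
first_lines() {
    find "$1" -name "$2" | xargs -n 1 head -n 1
}

# Demo with two hypothetical files that share a name.
demo=$(mktemp -d "${TMPDIR:-/tmp}/xargs-demo.XXXXXX")
mkdir "$demo/a" "$demo/b"
printf 'first in a\nsecond in a\n' > "$demo/a/notes.txt"
printf 'first in b\nsecond in b\n' > "$demo/b/notes.txt"

find "$demo" -name notes.txt | xargs head -n 1   # one head call: headers appear
first_lines "$demo" notes.txt                    # one call per file: no headers
rm -rf "$demo"
```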
 
Xargs is surprisingly powerful, which is why it has so many flags. By combining -n, -L and -P, you can get interesting effects. Note that xargs distinguishes lines of input from arguments (one input line may contain several arguments).
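To make the lines-versus-arguments distinction concrete, here is a small sketch with made-up input: with -n 1 every word becomes its own echo call, while -L 1 passes one whole input line per call.

```shell
#!/bin/sh
input='a b
c'
# -n 1: one *argument* per command, so the line "a b" is split in two.
by_args=$(printf '%s\n' "$input" | xargs -n 1 echo)
# -L 1: one *input line* per command, so "a b" stays together.
by_lines=$(printf '%s\n' "$input" | xargs -L 1 echo)
printf '%s\n---\n%s\n' "$by_args" "$by_lines"
```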

And if your file names contain spaces (or other whitespace or unprintable characters), you should be using "find ... -print0 | xargs -0" (that's a zero, not the letter O).
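A small sketch of why -print0/-0 matters, assuming find and xargs support those options (the GNU and BSD versions both do). The directory name "my dir" is hypothetical and exists only to contain a space.

```shell
#!/bin/sh
# Demo with a hypothetical directory name containing a space.
demo=$(mktemp -d "${TMPDIR:-/tmp}/print0-demo.XXXXXX")
mkdir "$demo/my dir"
printf 'hello\nworld\n' > "$demo/my dir/notes.txt"

# Default xargs splits on blanks, so ".../my" and "dir/notes.txt" arrive
# as two separate (nonexistent) paths; head fails and prints nothing.
broken=$(find "$demo" -name notes.txt | xargs -n 1 head -n 1 2>/dev/null || true)

# -print0 / -0 use NUL as the separator, so the path arrives intact.
safe=$(find "$demo" -name notes.txt -print0 | xargs -0 -n 1 head -n 1)

printf 'broken: [%s]\nsafe: [%s]\n' "$broken" "$safe"
rm -rf "$demo"
```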
 
The header comes from head(1) being called with multiple arguments. As suggested, xargs -n 1 calls head with only one argument (file) at a time. As mentioned above, you also likely want -print0 on the find and the -0 option to xargs(1) to handle spaces.

Alternatively, you can do this directly from find(1), which then runs head once per file: find "$DIR" -name "$FILE" -exec head -n 1 '{}' ';'
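If your head comes from GNU coreutils, it also has -q (--quiet), which suppresses the headers outright, so you can keep the batching and let find build the argument lists with -exec ... {} +. A sketch under that assumption (BSD/macOS head may not have -q; the function name is hypothetical):

```shell
#!/bin/sh
# Assumes GNU coreutils head: -q (--quiet) never prints "==> name <=="
# headers, even when head is given several files at once.
first_lines_quiet() {
    # "{} +" lets find pass many paths to each head invocation, like xargs.
    find "$1" -name "$2" -exec head -q -n 1 {} +
}
```

For example, first_lines_quiet "$DIR" "$FILE" would stand in for the original pipeline.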
 
Never thought you could use xargs to parallelise.
 
Running xargs in parallel is wonderful for certain tasks, but this is not one of them if you need an accurate record of every file's first line: nothing synchronizes the output of the parallel invocations, so the lines come out in whatever order the processes finish, with nothing tying each line to its file.
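If you do want parallel execution here, one workaround is to make each output line self-describing and sort afterwards. A sketch, assuming an xargs with -P (GNU and BSD xargs both have it); the function name is hypothetical, and it relies on each short printf line reaching the pipe in one piece, which is usual in practice but not something xargs itself guarantees.

```shell
#!/bin/sh
# Run up to 4 heads at once (-P 4). Completion order is not deterministic,
# so each first line is tagged with its file name and the result is sorted.
first_lines_parallel() {
    find "$1" -name "$2" -print0 |
        xargs -0 -n 1 -P 4 sh -c 'printf "%s: %s\n" "$1" "$(head -n 1 "$1")"' sh |
        sort
}
```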
 
Code:
find . -name '*c' | xargs -J % head -n 1 % /dev/null | awk 'NR % 3 == 2'
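This runs head with multiple arguments and uses awk to print every third line, starting with the second, which skips each "==> name <==" header and the blank line that separates one file's output from the next. (-J is a BSD xargs option, available on macOS and FreeBSD; GNU xargs does not have it.)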

We append /dev/null to the found files so that head always gets more than one argument; otherwise, when find matches a single file, head prints just that one line with no header (not 3 lines), and the filter prints nothing.
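A portable sketch of the same trick, assuming only a POSIX sh with find -print0 and xargs -0: instead of BSD xargs -J, a small sh -c wrapper appends /dev/null. Running awk inside the wrapper also restarts the line count for each batch xargs starts; with a single awk after xargs, a second batch would begin right after the previous "==> /dev/null <==" line and shift the three-line rhythm. Like the original, it assumes every matched file has at least one line (an empty file also shifts the pattern). The function name is hypothetical.

```shell
#!/bin/sh
# Portable variant of the -J% trick (no BSD-only xargs flags).
# The wrapper appends /dev/null so head always sees 2+ files and emits
# header / first line / blank for each; awk keeps line 2 of every 3.
# awk runs once per batch, so NR restarts with every head invocation.
first_lines_devnull() {
    find "$1" -name "$2" -print0 |
        xargs -0 sh -c 'head -n 1 "$@" /dev/null | awk "NR % 3 == 2"' sh
}
```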
 
I use xargs to parallelise all the time. Say I have a variable set of CPU-intensive programs to run. Through trial and error, I have determined the right number of tasks to run in parallel; in many cases it's exactly one per core (for example, when the programs are written in Python, whose GIL keeps a single process to roughly one core's worth of work). So I write a shell script that puts the list of programs into a temporary file and pipes it into xargs, and my machine is kept nearly optimally busy. It's like having a whole batch scheduling system in 10 lines of shell.
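A minimal sketch of that setup, assuming an xargs with -P and a job file holding one shell command per line. The file contents and the echo commands are placeholders for real programs.

```shell
#!/bin/sh
# Hypothetical job list: one shell command per line. The echo commands
# stand in for real CPU-heavy programs.
jobs_file=$(mktemp "${TMPDIR:-/tmp}/jobs.XXXXXX")
cat > "$jobs_file" <<'EOF'
echo job1 done
echo job2 done
echo job3 done
echo job4 done
EOF

# Number of online CPUs (works on Linux and macOS); fall back to 2.
ncpu=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 2)

# tr makes each line one NUL-terminated argument; xargs runs "sh -c <line>"
# for each, keeping at most $ncpu jobs running at a time.
results=$(tr '\n' '\0' < "$jobs_file" | xargs -0 -n 1 -P "$ncpu" sh -c)
printf '%s\n' "$results"
rm -f "$jobs_file"
```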

The obvious problem: doing I/O (like reading files and writing output) in parallel is hard. You end up wrapping the programs so that they have no stdin or stdout, and redirecting everything to and from files, which you later aggregate or split apart.
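One way to do that wrapping, sketched under the assumption of a job file with one shell command per line and a fixed limit of 4 parallel jobs. The function run_jobs_logged and the N.out log naming are made up for illustration.

```shell
#!/bin/sh
# Run each command from a job file in parallel, with stdin from /dev/null
# and stdout/stderr captured in its own numbered log file, then aggregate
# the logs in job order once every job has finished.
# $1: job file (one command per line); $2: existing directory for the logs.
run_jobs_logged() {
    i=0
    while IFS= read -r cmd; do
        i=$((i + 1))
        printf '%s\0%s\0' "$i" "$cmd"     # job number, then the command
    done < "$1" |
        LOGDIR=$2 xargs -0 -n 2 -P 4 sh -c \
            'sh -c "$2" < /dev/null > "$LOGDIR/$1.out" 2>&1' sh
    n=1
    while [ -f "$2/$n.out" ]; do          # aggregate in job order
        cat "$2/$n.out"
        n=$((n + 1))
    done
}
```

Because the logs are read back by job number, the combined output does not depend on which job happened to finish first.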
 
Other options, using awk alone:

find . <findargs> -execdir awk '{print;nextfile;}' '{}' '+'

or, perhaps a little faster, since find won't have to wait for awk:

find . <findargs> -print0 | xargs -0 awk '{print;nextfile;}'

Either form spawns awk as needed (once there are enough arguments to fill a command line) and prints the first line of each file passed to it. In the xargs form, you may want to limit the number of arguments per awk with -n count so that awk launches more often; otherwise you may wait a while for roughly ARG_MAX bytes of arguments to accumulate before each awk is spawned.
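A sketch of the xargs -n suggestion; the batch size of 100 is only an example to tune, and the function name is hypothetical. It puts an FNR == 1 guard in front of the post's print/nextfile, so even an awk without nextfile should still print only first lines (it would just read each file to the end).

```shell
#!/bin/sh
# Hand at most 100 files to each awk (-n 100), so output starts after the
# first 100 matches instead of waiting for a full command line.
# FNR == 1 guards the print; nextfile then skips the rest of the file.
# Empty files have no line 1, so they produce no output.
first_lines_awk() {
    find "$1" -name "$2" -print0 |
        xargs -0 -n 100 awk 'FNR == 1 { print; nextfile }'
}
```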
 