Solved grep -IL vs ggrep -IL to find binaries

Hello.
I'm seeing strange behavior from these commands.
GNU grep is able to find binaries with the '-IL' options, but BSD grep is not.
Bash:
> ggrep -LI . ./sheepdog-1.0.1-1_amd64.deb
./sheepdog-1.0.1-1_amd64.deb
> grep -LI . ./sheepdog-1.0.1-1_amd64.deb
>
Why?
 
ggrep(1):
Code:
       -I     Process a	binary file as if it did not  contain  matching	 data;
	      this is equivalent to the	--binary-files=without-match option.

grep(1):
Code:
       -I      Ignore  binary  files.	This  option  is  equivalent  to   the
	       "--binary-files=without-match" option.

The options behave slightly differently.
 
OK
Then
ggrep(1):
Code:
       -L, --files-without-match
              Suppress normal output; instead print the name of each input
              file from which no output would normally have been printed.
grep(1):
Code:
     -L, --files-without-match
             Only the names of files not containing selected lines are written
             to standard output.  Pathnames are listed once per file searched.
             If the standard input is searched, the string “(standard input)”
             is written unless a --label is specified.

Is it possible to make BSD grep search for binary files, like GNU one does?
 

Not directly. For what you seem to want to do I would think using strings(1) is a better alternative.

Code:
for file in * ; do
    strings "$file" | grep -q something && echo "$file"
done
 
I just want to detect and find binary files in a directory tree.
 
"Binary files" isn't a well-defined thing to start with. It comes down to applying heuristics about what could be text content. E.g. text won't contain NUL bytes, but that doesn't mean everything without them is text. There's of course a lot more you could do, like trying to guess an encoding (if that succeeds, it's text) or looking for an excessive number of control characters (then it might not be text), but in any case there can't be a clear, definitive answer.
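
To make the NUL-byte heuristic concrete, here is a minimal sketch in plain sh. The function names `has_nul` and `list_binary_candidates` and the 8 KiB window are assumptions chosen for illustration. This is not how either grep decides what is binary, and it will misclassify things such as UTF-16 text.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the first 8 KiB of a file contain a NUL byte.
# Only one heuristic among many; the 8 KiB window is an arbitrary choice.
has_nul() {
    # Byte count of the window as read, and again with NUL bytes removed.
    total=$(( $(head -c 8192 "$1" | wc -c) ))
    stripped=$(( $(head -c 8192 "$1" | tr -d '\000' | wc -c) ))
    [ "$total" -ne "$stripped" ]
}

# Print every regular file under a directory (default: .) that has a NUL
# byte in its first 8 KiB. File names containing newlines are not handled.
list_binary_candidates() {
    find "${1:-.}" -type f | while IFS= read -r f; do
        if has_nul "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

Calling `list_binary_candidates /path/to/tree` would then print one candidate per line. Empty files are never reported, since there is nothing to inspect.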

Combining these two flags with grep (both versions) certainly wasn't meant to "find binary files"; grep just has some heuristics to avoid searching non-text files if possible.

I could suggest a clumsy "solution" using file(1) (which is all about trying to determine file types) like this:
Code:
find <directory> -maxdepth 1 -type f \
        | file -Nie soft -f - \
        | sed -n -e '/: application\/octet-stream;.*/{s///' -ep -e'}'
(remove -maxdepth 1 to recurse)

I'd still ask: Why do you need this? Maybe there's a better approach about the actual problem to solve?
 
It's just a technicality in a software certification process. I need to find the binaries so I can explain why they are in my source tree.
 

Surely they must give an actionable set of criteria that defines what a binary file is for the purpose of that certification.

Just joking. Of course they don't.
 
In that case, how about this suggestion: Use the file command (in a loop, probably driven by a find command) to determine the file types of all files in your source tree. Then use sort/uniq to count how many there are of each type. For example, you might find "1234: C++ source", "45: Makefile", and "3: ELF 64-bit executable". If you get really unlucky, there will be an entry for "data", meaning file was not able to classify the file. Then go over each file type that you recognize as binary (there should be very few if this is a traditional source tree), and see what's up.
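
The find/file/sort/uniq idea above could be sketched roughly like this. The function name `tally_file_types` is hypothetical, and the descriptions you get depend on the file(1) version and its magic database, so treat the output as a starting point for manual review rather than a verdict.

```shell
#!/bin/sh
# Hypothetical helper: count how many regular files under a directory
# (default: .) fall into each file(1) description, most common first.
tally_file_types() {
    # -b (brief) drops the file name, so identical descriptions collapse
    # into a single line when passed through uniq -c.
    find "${1:-.}" -type f -exec file -b {} + | sort | uniq -c | sort -rn
}
```

Running `tally_file_types /path/to/tree` prints one line per type with a count in front. To see which files belong to a suspicious type, rerun file without `-b` and grep its output for that description.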

Obnoxious remark: Today, there is nothing wrong with putting binary files under source control. Modern SCM systems handle that just fine.
 