Command to see which files are human readable?

I know about ls, and I know I can use ls -l to see whether a file is executable, but that can be a little misleading. I also know that ls --color=always gives a colour-coded view of the directory. What I am looking for, however, is a clear way to tell in advance whether a file is 'binary' (in the sense that its contents cannot be read with something like a pager; I know that text files are binary too).

I'll give you an example in case that is unclear.

In /boot we have the efi files, which are marked as executable, so these are easy to avoid: I know I cannot read them with a pager. But I also cannot read boot0, boot1, boot2, etc., because according to less these 'may' be binary files. I'm finding it quite hard to navigate the file system because some files are clearly marked and others are not. (For example, a 4th or lua file will obviously be human readable, but other names tell me nothing and the files have no executable flag.)

Is there an easy way to see which files in a directory are human readable?
 
You could check the file magic and look for a UTF-8 byte order mark. The problem with this approach is that most plain text files don't have any file magic.
The real question is: Why do you want to read random files?
Just interested in how it all works together. I've been studying bits and pieces like the loader, and it would be nice to tell which files are readable and which are not without having to keep running ls followed by less.
 
All files are treated the same, so you can't just look at a file's metadata to see what it contains. Scripts are human readable but can be set executable too. Libraries are binaries but don't have an executable bit (you're not supposed to execute them directly).

File extensions like .txt are a relic left over from the old CP/M days. MS-DOS used them too. Unix never relied on them. You can use them, but they are just part of the filename and don't have any special meaning.
 
One thing you can use is the command file. This will show what sort of file something is.

For example
Code:
file /usr/local/bin/dwm

/usr/local/bin/dwm: ELF 64-bit LSB executable, x86-64, version 1 (FreeBSD), dynamically linked, interpreter /libexec/ld-elf.so.1, for FreeBSD 13.0 (1300139), FreeBSD-style, stripped
Code:
file /var/log/messages
/var/log/messages: ASCII text

To examine binary files like the first one, you can use the strings command, which prints the runs of printable characters it finds in a file. From your description, though, I don't think that's the type of information you're looking for.
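
Building on that, here is a minimal sketch of how file could be run over a whole directory so that only the text files are listed. The function name list_text_files is made up for illustration, and the sketch assumes your file(1) supports -b and --mime-type. Treat it as a starting point rather than a definitive test: depending on the version of file, some readable formats (JSON, for example) may be reported under an application/ type, so the text/* match can miss a few.

```shell
# list_text_files DIR
# Hypothetical helper: print the regular files in DIR (default: current
# directory) whose MIME type, as guessed by file(1), starts with "text/".
list_text_files() {
    dir=${1:-.}
    for f in "$dir"/*; do
        [ -f "$f" ] || continue        # skip directories, devices, etc.
        # -b omits the filename from the output,
        # --mime-type prints something like "text/plain"
        case $(file -b --mime-type "$f") in
            text/*) printf '%s\n' "$f" ;;
        esac
    done
}
```

Something like list_text_files /boot would then print only the candidates worth opening in a pager. Note that the * glob skips dotfiles.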
 
I don't think there is an easy solution to this problem.
If no one has any ideas, I might see how less goes about working it out and then write a script that skips or reads files based on the less output, if that is at all possible. I just wanted to know if I was reinventing the wheel, haha. Thanks anyway.
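
For what it's worth, a rough version of that kind of check is short. The sketch below is not taken from the less source; it assumes the common heuristic that a file with a NUL byte near its start is binary. The name is_probably_text and the 1024-byte window are arbitrary choices for illustration.

```shell
# is_probably_text FILE
# Hypothetical helper: succeed (exit status 0) if the first 1024 bytes of
# FILE contain no NUL byte. This is a heuristic, not a guarantee.
is_probably_text() {
    # dd reads one 1024-byte block; tr -dc keeps only the NUL bytes;
    # wc -c counts them; the last tr strips the padding some wc versions add.
    nuls=$(dd if="$1" bs=1024 count=1 2>/dev/null | tr -dc '\000' | wc -c | tr -d ' ')
    [ "$nuls" -eq 0 ]
}
```

A loop such as for f in *; do is_probably_text "$f" && echo "$f"; done would then list the likely candidates. UTF-16 text contains NUL bytes and would be classed as binary by this check, and an empty file counts as text.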
 
Thanks, I can see right away that this is a better solution than what I am already doing. I'm not sure yet whether there will be any edge cases, but it is fairly easy to guess from the output whether a file is binary or not. I just ran file mbr, which says it is a boot sector, so it is easy to guess that it does not contain much readable text. I confirmed this with strings -a, which extracts any readable text from the binary.

Edit: strings -a | more is working quite nicely. It's pretty easy to tell from the output when a file is not meant to be human readable. I must remember not to use it on human readable files, though, as it strips whitespace that could be significant in some contexts.
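
A small sketch of that workflow, in case it is useful to anyone else. The wrapper name peek_strings is hypothetical; -a (scan the whole file) and -n (minimum string length) are standard strings options, and raising the minimum above the default of 4 cuts down on short runs of noise.

```shell
# peek_strings FILE [MINLEN]
# Hypothetical wrapper: print the printable runs in FILE that are at least
# MINLEN characters long (default 4, the same as strings itself).
peek_strings() {
    strings -a -n "${2:-4}" "$1"
}
```

For example, peek_strings somefile 8 | more pages through only the longer runs of text.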
 