After a few Google searches, this is what I came up with:
find my_folder -type f -exec grep -l "needle text" {} \; -exec file {} \; | grep text
find . -type f -print0 | xargs -0 file | grep -P text | cut -d: -f1 | xargs grep -Pil "search"
Unfortunately, this is not space safe. Putting it into a bash script makes it a bit easier.
This is space safe:
#!/bin/bash
if [ ! "$1" ] ; then
    echo "Usage: $0 <search>";
    exit 1
fi
find . -type f -print0 \
| xargs -0 file \
| grep -P text \
| cut -d: -f1 \
| xargs -I% grep -Pil "$1" "%"
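The script above still parses the output of file with cut -d: -f1, so a filename containing a colon would be cut short. A minimal sketch of one way around that, assuming GNU or BSD file (for the -b option), is to ask file about each name directly instead of parsing a combined listing. The function name search_text_files is made up for illustration, and it uses grep's default regex syntax rather than -P:

```shell
#!/bin/sh
# search_text_files PATTERN [DIR]   (hypothetical helper name)
# Prints the names of text files under DIR whose content matches PATTERN.
search_text_files() {
    # find hands filenames to the inner sh as separate arguments,
    # so spaces, colons and newlines in names are never re-split.
    find "${2:-.}" -type f -exec sh -c '
        pattern=$1; shift
        for f do
            # file -b prints only the description, without the filename
            case $(file -b "$f") in
                *text*) grep -il -- "$pattern" "$f" ;;
            esac
        done
    ' sh "$1" {} +
}
```

Usage would look like search_text_files "needle text" my_folder. It runs file once per file, so it is slower than the pipeline on large trees.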
Here's a simplified version with extended explanation for beginners like me who are trying to learn how to put more than one command in one line.
If you were to write out the problem in steps, it would look like this:
// For every file in this directory
// Check the filetype
// If it's an ASCII file, then print out the filename
To achieve this, we can use three UNIX commands: find, file, and grep.
find will check every file in the directory.
file will give us the filetype. In our case, we're looking for a return of 'ASCII text'.
grep will look for the keyword 'ASCII' in the output from file.
So how can we string these together in a single line? There are multiple ways to do it, but I find that doing it in order of our pseudo-code makes the most sense (especially to a beginner like me).
find ./ -exec file {} ";" | grep 'ASCII'
Looks complicated, but not bad when we break it down:
find ./ = look through every file in this directory. The find command prints out the filename of any file that matches the 'expression', which is whatever comes after the path; in our case the path is the current directory, or ./
The most important thing to understand is that everything after that first bit is evaluated as either True or False for each file. If no action is given, the file name gets printed out when the result is True. If not, the command moves on to the next file.
-exec = this is an option within the find command that runs some other command on each file and uses that command's success or failure as the True/False result. It's like calling a function within a function.
file {} = the command being called inside of find. The file command returns a string that tells you the filetype of a file. Normally, it would look like this: file mytextfile.txt. In our case, we want it to use whatever file is being looked at by the find command, so we put in the curly braces {} to act as an empty variable, or parameter. In other words, we're just asking the system to output a string for every file in the directory.
";" = this is required by find and is the punctuation mark at the end of our -exec command. It is quoted so that the shell passes it on to find instead of treating it as a command separator. See the manual for 'find' for more explanation if you need it by running man find.
| grep 'ASCII' = | is a pipe. A pipe takes the output of whatever is on the left and uses it as input to whatever is on the right. Here it takes the output of the find command (one line per file, each holding the filename and the filetype) and grep prints only the lines that contain the string 'ASCII'.
NOW, note that grep runs after find rather than inside it: find prints a line for every file, and grep throws away the ones that are not ASCII. Voila.
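To see what each part contributes, the breakdown above can be sketched as a small function that runs the stages one after another. The name show_stages is made up for illustration; the commands inside are the ones from the one-liner:

```shell
#!/bin/sh
# show_stages [DIR]   (hypothetical helper name)
show_stages() {
    dir=${1:-.}
    echo "== find alone: one path per line"
    find "$dir"
    echo "== find -exec file: one 'name: type' line per path"
    find "$dir" -exec file {} ";"
    echo "== piped through grep: only the lines mentioning ASCII"
    find "$dir" -exec file {} ";" | grep 'ASCII'
}
```

Running show_stages ./ in a directory with a mix of text and binary files shows the list shrinking only at the last stage, which is where grep does the filtering.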
If you are interested in finding any file type by its magic bytes using the awesome file utility combined with the power of find, this can come in handy:
$ # Let's make some test files
$ mkdir ASCII-finder
$ cd ASCII-finder
$ dd if=/dev/urandom of=binary.file bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.009023 s, 116 MB/s
$ file binary.file
binary.file: data
$ echo 123 > text.txt
$ # Let the magic begin
$ find -type f -print0 | \
xargs -0 -I @@ bash -c 'file "$@" | grep ASCII &>/dev/null && echo "file is ASCII: $@"' -- @@
Output:
file is ASCII: ./text.txt
Legend: $ is the interactive shell prompt where we enter our commands.
You can modify the part after && to call some other script or do some other stuff inline as well, e.g. if that file contains a given string, cat the entire file or look for a secondary string in it.
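As one possible variation on the part after &&, here is a rough sketch that looks for a secondary string in each ASCII file instead of echoing a message. The function name grep_ascii_files is made up for illustration, it uses sh -c where the one-liner uses bash -c (either works here), and it adds file -b so that a filename containing "ASCII" is not mistaken for a match:

```shell
#!/bin/sh
# grep_ascii_files STRING [DIR]   (hypothetical helper name)
# Lists ASCII files under DIR that contain STRING.
grep_ascii_files() {
    find "${2:-.}" -type f -print0 |
        xargs -0 -I @@ sh -c 'file -b "$1" | grep -q ASCII && grep -l -- "$2" "$1"' sh @@ "$1"
    # Inside the sh -c script: $0 is "sh", $1 is the filename, $2 is STRING.
}
```

For example, grep_ascii_files "needle text" . prints only the ASCII files that contain the string. xargs reports a non-zero exit status whenever some file did not match, so the function's return code is not meaningful.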
Explanation:
find finds the items that are files.
xargs feeds each item into a one-liner bash command/script.
file checks the type of the file by its magic bytes, and grep checks whether ASCII appears in the result; if so, your next command after && executes.
find prints the results null-separated, which safely handles filenames with spaces and meta-characters in them.
xargs, using the -0 option, reads them null-separated; -I @@ takes each record and passes it as a positional parameter/argument to the bash script.
-- for bash fills the $0 slot of the -c script, so the filename that follows becomes $1 and shows up in "$@", even if it starts with - like -c.
If you need to find types other than ASCII, simply replace grep ASCII with another type, like grep "PDF document, version 1.4"
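Matching on the description text ties you to its exact wording. A possible alternative, sketched here under the assumption that your file supports --mime-type (GNU and BSD file do), is to compare MIME types instead. The function name find_by_mime is made up for illustration:

```shell
#!/bin/sh
# find_by_mime MIME_TYPE [DIR]   (hypothetical helper name)
# Prints every file under DIR whose detected MIME type equals MIME_TYPE.
find_by_mime() {
    find "${2:-.}" -type f -exec sh -c '
        want=$1; shift
        for f do
            # -b drops the filename, --mime-type prints e.g. text/plain
            if [ "$(file -b --mime-type "$f")" = "$want" ]; then
                printf "%s\n" "$f"
            fi
        done
    ' sh "$1" {} +
}
```

For instance, find_by_mime application/pdf . would list PDF files regardless of their version, and find_by_mime text/plain . covers plain text.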
I do it this way: 1) since there are too many files (~30k) to search through, I generate the list of text files daily via crontab using the command below:
find /to/src/folder -type f -exec file {} \; | grep text | cut -d: -f1 > ~/.src_list &
2) create a function in .bashrc:
findex() {
cat ~/.src_list | xargs grep "$*" 2>/dev/null
}
Then I can use the command below to do the search:
findex "needle text"
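The cached list above is newline-separated and goes through cut, so names with spaces or colons can trip it up. A rough sketch of the same two-step idea with a NUL-separated list follows; the function names and the list path used in the usage note are made up for illustration, and xargs -0 and grep -H are assumed to be available (they are in the GNU and BSD tools):

```shell
#!/bin/sh
# build_text_list SRC_DIR LIST_FILE   (hypothetical helper name)
# Writes the names of all text files under SRC_DIR, NUL-separated.
build_text_list() {
    find "$1" -type f -exec sh -c '
        for f do
            case $(file -b "$f") in
                *text*) printf "%s\0" "$f" ;;
            esac
        done
    ' sh {} + > "$2"
}

# search_text_list LIST_FILE TEXT   (hypothetical helper name)
# Greps every file named in LIST_FILE; -H always prefixes the filename.
search_text_list() {
    xargs -0 grep -H -- "$2" < "$1" 2>/dev/null
}
```

The crontab job would then call something like build_text_list /to/src/folder ~/.src_list0, and the search becomes search_text_list ~/.src_list0 "needle text".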
HTH:)
Based on this SO question:
grep -rIl "needle text" my_folder
I prefer xargs
find . -type f | xargs grep -I "needle text"
If your filenames are weird (spaces, quotes, newlines), use the -print0 and -0 options:
find . -type f -print0 | xargs -0 grep -I "needle text"
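To make the effect of -I concrete, here is a small sketch that wraps the null-safe command in a function. The name grep_text_only is made up for illustration, and -l is added so that only filenames are printed, which the command above does not do:

```shell
#!/bin/sh
# grep_text_only TEXT [DIR]   (hypothetical helper name)
# Lists files under DIR that contain TEXT, skipping files grep sees as binary.
grep_text_only() {
    # -I makes grep treat binary files as if they contained no match
    # -l prints only the names of matching files
    find "${2:-.}" -type f -print0 | xargs -0 grep -Il -- "$1"
}
```

With a text file and a binary file that both contain the needle, grep_text_only "needle text" . lists only the text file.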