Linux command: How to 'find' only text files?

孤街浪徒 2020-12-02 04:43

After a few Google searches, this is what I came up with:

find my_folder -type f -exec grep -l "needle text" {} \; -exec file {} \; | grep text


        
16 answers
  • 2020-12-02 05:06
    find . -type f | xargs file | grep "ASCII text" | awk -F: '{print $1}'
    

    Use find to list all files, file to check which of them are text (rather than, say, tar archives or keys), and finally awk to strip the description and print only the file names.
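
    The pipeline above matches only the literal "ASCII text" label and splits file names on whitespace, so UTF-8 files and names containing spaces can slip through. Here is a minimal sketch of a more tolerant variant, assuming a file command that supports -b and --mime-type (common in current Linux and BSD builds). The function name list_text_files is made up for illustration:

```shell
# Hypothetical helper: print every regular file under DIR (default: .)
# whose MIME type, as reported by file(1), starts with "text/".
list_text_files() {
    find "${1:-.}" -type f -exec sh -c '
        for f in "$@"; do
            # -b omits the file name, so only the MIME type is compared
            case $(file -b --mime-type -- "$f") in
                text/*) printf "%s\n" "$f" ;;
            esac
        done
    ' sh {} +
}

# Example use:
# list_text_files my_folder
```

    Some formats that are text in practice may be reported under another MIME family (for example application/json in recent versions of file), so the case pattern may need extending.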

  • 2020-12-02 05:07

    I know this is an old thread, but I stumbled across it and thought I'd share my method which I have found to be a very fast way to use find to find only non-binary files:

    find . -type f -exec grep -Iq . {} \; -print
    

    The -I option tells grep to skip binary files immediately, and the . pattern together with -q makes it stop at the first match in a text file, so it runs very fast. You can change the -print to -print0 for piping into xargs -0 or similar if you are concerned about spaces in file names (thanks for the tip, @lucas.werkmeister!).

    Also the first dot is only necessary for certain BSD versions of find such as on OS X, but it doesn't hurt anything just having it there all the time if you want to put this in an alias or something.

    EDIT: As @ruslan correctly pointed out, the -and can be omitted since it is implied.
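
    To make the -print0 suggestion concrete, here is a small sketch, assuming a grep that supports -I and an xargs that supports -0 (GNU and BSD versions both do). The helper name find_text_files0 is hypothetical:

```shell
# Hypothetical helper: emit NUL-terminated names of non-binary files
# under DIR (default: .), using the grep -Iq test from this answer.
find_text_files0() {
    find "${1:-.}" -type f -exec grep -Iq . {} \; -print0
}

# Example use: per-file line counts, safe for names containing spaces.
# find_text_files0 my_folder | xargs -0 wc -l
```

    Because the pattern . needs at least one character to match, empty files are not listed by this test.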

  • 2020-12-02 05:09

    Why is it unhandy? If you need to use it often and don't want to type it every time, just define a bash function for it:

    function findTextInAsciiFiles {
        # usage: findTextInAsciiFiles DIRECTORY NEEDLE_TEXT
        find "$1" -type f -exec grep -l "$2" {} \; -exec file {} \; | grep text
    }
    

    Put it in your .bashrc and then just run:

    findTextInAsciiFiles your_folder "needle text"
    

    whenever you want.


    EDIT to reflect OP's edit:

    If you want to cut out the MIME information, you can add one more stage to the pipeline that keeps only what comes before the first colon, cut -d ':' -f1:

    function findTextInAsciiFiles {
        # usage: findTextInAsciiFiles DIRECTORY NEEDLE_TEXT
        find "$1" -type f -exec grep -l "$2" {} \; -exec file {} \; | grep text | cut -d ':' -f1
    }
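
    One caveat with grepping the combined output for "text" is that a path which itself contains "text" (say, context.bin) passes the filter regardless of its type. A possible variation, sketched under the assumption that file supports -b, uses both commands as silent tests and lets find print the name. The function name findTextInTextFiles is invented for this example:

```shell
findTextInTextFiles() {
    # usage: findTextInTextFiles DIRECTORY NEEDLE_TEXT
    # 1st -exec: keep files that contain the needle (grep -q prints nothing)
    # 2nd -exec: keep files whose description mentions "text";
    #            file -b omits the name, so the path cannot match by accident
    find "$1" -type f \
        -exec grep -q -- "$2" {} \; \
        -exec sh -c 'file -b -- "$1" | grep -q text' sh {} \; \
        -print
}

# Example use:
# findTextInTextFiles your_folder "needle text"
```

    Each -exec ... \; acts as a test here: find only moves on to the next action when the command exits successfully.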
    
  • 2020-12-02 05:09

    How about this:

    $ grep -rl "needle text" my_folder | tr '\n' '\0' | xargs -r -0 file | grep -e ':[^:]*text[^:]*$' | grep -v -e 'executable'
    

    If you want the filenames without the file types, just add a final sed filter.

    $ grep -rl "needle text" my_folder | tr '\n' '\0' | xargs -r -0 file | grep -e ':[^:]*text[^:]*$' | grep -v -e 'executable' | sed 's|:[^:]*$||'
    

    You can filter out unneeded file types by adding more -e 'type' options to the last grep command.

    EDIT:

    If your xargs version supports the -d option, the commands above become simpler:

    $ grep -rl "needle text" my_folder | xargs -d '\n' -r file | grep -e ':[^:]*text[^:]*$' | grep -v -e 'executable' | sed 's|:[^:]*$||'
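
    To see what each stage of the filter chain does, it can be run on a few hand-written lines in the "name: description" shape that file prints. The sample names and descriptions below are illustrative assumptions, not captured output, and sample_file_output is a made-up helper:

```shell
# Hand-written sample lines shaped like file(1) output (illustrative only).
sample_file_output() {
    printf '%s\n' \
        'notes.txt: ASCII text' \
        'my text.dat: data' \
        'run.sh: POSIX shell script, ASCII text executable' \
        'logo.png: PNG image data, 16 x 16, 8-bit/color RGBA, non-interlaced'
}

# Same filters as in the answer: keep lines with "text" after the last
# colon, drop executables, then strip the description.
sample_file_output |
    grep -e ':[^:]*text[^:]*$' |
    grep -v -e 'executable' |
    sed 's|:[^:]*$||'
```

    Only notes.txt is printed: "my text.dat" is rejected because the regex looks for "text" after the last colon, not in the name, and run.sh is removed by the grep -v stage.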
    
  • 2020-12-02 05:11

    I have two issues with histumness' answer:

    • It only lists text files. It does not actually search them as requested. To actually search, use

      find . -type f -exec grep -Iq . {} \; -and -print0 | xargs -0 grep "needle text"
      
    • It spawns a grep process for every file, which is very slow. A better solution is then

      find . -type f -print0 | xargs -0 grep -IZl . | xargs -0 grep "needle text"
      

      or simply

      find . -type f -print0 | xargs -0 grep -I "needle text"
      

      This takes only 0.2s compared to 4s for the solution above (2.5GB of data / 7700 files), i.e. 20x faster.
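
    Another way to avoid one grep process per file, without an xargs stage, is the {} + form of -exec, which is specified by POSIX and passes many file names to each grep invocation. A minimal sketch, with the hypothetical function name grep_text_files:

```shell
# Hypothetical helper: list non-binary files under DIR that contain NEEDLE.
# "{} +" batches file names, so grep is started only a few times.
grep_text_files() {
    # usage: grep_text_files DIRECTORY NEEDLE_TEXT
    # -I skips binary files, -l prints only the names of matching files
    find "$1" -type f -exec grep -Il -- "$2" {} +
}

# Example use:
# grep_text_files . "needle text"
```

    No timing is claimed for this form; it is simply the same batching idea expressed inside find.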

    Also, nobody mentioned ag (the Silver Searcher) or ack-grep as alternatives. If one of these is available, it is a much better option:

    ag -t "needle text"    # Much faster than ack
    ack -t "needle text"   # or ack-grep
    

    As a last note, beware of false positives (binary files taken for text files). I have already had false positives with each of grep, ag and ack, so it is better to list the matched files first before editing them.
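
    One way to follow that advice is to save the match list, review it, and only then edit from the saved list. The sketch below assumes a grep that supports --null and a sed that accepts -i.bak (GNU and BSD versions both do). The function names are invented, and OLD is interpolated into a sed expression, so it must not contain "/" or unintended regex characters:

```shell
# Step 1 (hypothetical helper): save NUL-terminated names of non-binary
# files under DIRECTORY that contain NEEDLE. Keep LISTFILE outside DIRECTORY.
list_matches() {
    # usage: list_matches DIRECTORY NEEDLE LISTFILE
    find "$1" -type f -exec grep -Il --null -- "$2" {} + > "$3"
}

# Step 2: print the saved list for review, one name per line.
show_matches() {
    # usage: show_matches LISTFILE
    tr '\0' '\n' < "$1"
}

# Step 3: after review, replace OLD with NEW in the listed files,
# leaving a .bak copy next to each edited file.
apply_edit() {
    # usage: apply_edit LISTFILE OLD NEW
    xargs -0 sed -i.bak "s/$2/$3/g" < "$1"
}
```

    Running apply_edit on an empty list is not handled here; check the output of show_matches first.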

  • 2020-12-02 05:13

    Another way of doing this:

    # find . | xargs file | grep "ASCII text"
    

    If you want empty files too:

    # find . | xargs file | grep -E "ASCII text|empty"
    