How to tell binary from text files in Linux

Submitted by 删除回忆录丶 on 2019-12-04 17:36:31

Question


The Linux file command does a very good job of recognising file types and gives very fine-grained results. The diff tool is also able to tell binary files from text files, and produces different output for each.

Is there a way to tell binary files from text files? All I want is a yes/no answer as to whether a given file is binary. Because it's difficult to define binary, let's say I want to know whether diff will attempt a text-based comparison.

To clarify the question: I do not care if it's ASCII text or XML as long as it's text. Also, I do not want to differentiate between MP3 and JPEG files, as they're all binary.


Answer 1:


The diff manual specifies that

diff determines whether a file is text or binary by checking the first few bytes in the file; the exact number of bytes is system dependent, but it is typically several thousand. If every byte in that part of the file is non-null, diff considers the file to be text; otherwise it considers the file to be binary.




Answer 2:


file is still the command you want. Any file that is text (according to its heuristics) will include the word "text" in the output of file; anything that is binary will not include the word "text".
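
A minimal sketch of that check as a shell function, assuming the file utility is installed. The name is_text_by_file is made up for illustration. The -b option leaves the file name out of the output, so a file that merely has "text" in its name cannot cause a false match, and -L follows symbolic links.

```shell
# Hypothetical helper (name is illustrative, not part of any tool):
# succeeds if file(1) describes the argument as some kind of text.
is_text_by_file() {
  # -b: omit the file name from the output; -L: follow symbolic links.
  file -bL -- "$1" | grep -q 'text'
}

# Example use:
#   is_text_by_file notes.txt && echo text || echo binary
```

Whatever file does not label as text, including empty files, is reported as not text by this sketch.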

If you don't agree with the heuristics that file uses to determine text vs. not-text, then the question needs to be better specified, since text vs. non-text is an inherently vague question. For example, file does not identify a PGP public key block in ASCII as "text", but you might (since it is composed only of printable characters, even though it is not human-readable).




Answer 3:


A quick-and-dirty way is to look for a NUL character (a zero byte) in the first kilobyte or two of the file. As long as you're not worried about UTF-16 or UTF-32, no text file should ever contain a NUL.

Update: According to the diff manual, this is exactly what diff does.
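
The NUL check could be sketched in shell roughly as follows. The function name looks_like_text and the 8000-byte window are assumptions chosen for illustration; the window diff actually uses is system dependent. The sketch counts the bytes in the window before and after deleting NULs and compares the two counts.

```shell
# Hypothetical helper: succeeds if the first 8000 bytes of a regular file
# contain no NUL byte. The window size 8000 is an assumption.
looks_like_text() {
  [ -f "$1" ] || return 1
  # Byte count of the inspected window.
  total=$(head -c 8000 -- "$1" | wc -c)
  # Byte count of the same window with every NUL byte removed.
  without_nul=$(head -c 8000 -- "$1" | tr -d '\000' | wc -c)
  # Arithmetic expansion strips any padding that wc may add.
  [ "$((total))" -eq "$((without_nul))" ]
}

# Example use:
#   looks_like_text somefile && echo text || echo binary
```

An empty file counts as text here, because it contains no NUL byte. UTF-16 and UTF-32 files will usually be reported as binary, as noted above.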




Answer 4:


You could try running

strings yourfile

and comparing the size of the output with the file size. I'm not totally sure, but if they are the same, the file is probably a text file.
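
A sketch of the same size-comparison idea, with one substitution stated plainly: it uses tr instead of strings, because strings by default skips printable runs shorter than four characters, so its output size can differ from the file size even for plain text. The function name is_printable_ascii is hypothetical. The check only recognises ASCII, so UTF-8 or other non-ASCII text is reported as not text.

```shell
# Hypothetical helper: succeeds if every byte of a regular file is
# printable ASCII or ASCII whitespace.
# Swapped-in technique: tr is used here in place of strings.
is_printable_ascii() {
  [ -f "$1" ] || return 1
  # Total size of the file in bytes.
  total=$(wc -c < "$1")
  # Size after deleting every byte that is neither printable nor whitespace.
  printable=$(LC_ALL=C tr -cd '[:print:][:space:]' < "$1" | wc -c)
  # Arithmetic expansion strips any padding that wc may add.
  [ "$((total))" -eq "$((printable))" ]
}

# Example use:
#   is_printable_ascii yourfile && echo text || echo binary
```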




Answer 5:


These days the term "text file" is ambiguous, because a text file can be encoded in ASCII, ISO-8859-*, UTF-8, UTF-16, UTF-32 and so on.

See here for how Subversion does it.




Answer 6:


This approach uses the same criteria as grep in determining whether a file is binary or text:

# Succeeds if the file has at least one character and grep does not treat it as binary.
is_text_file() {
  grep -qI '.' "$1"
}

grep options used:

  • -q Quiet; Exit immediately with zero status if any match is found
  • -I Process a binary file as if it did not contain matching data

grep pattern used:

  • '.' match any single character. All files (except an empty file) will match this pattern.

Notes

  • An empty file is not considered a text file according to this test.
  • Symbolic links are followed.
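
A short usage sketch for the function above. The wrapper name classify is made up for illustration, and is_text_file is repeated so that the snippet stands on its own.

```shell
# Same function as in the answer, repeated so the snippet is self-contained.
is_text_file() {
  grep -qI '.' -- "$1"
}

# Hypothetical wrapper: print "text" or "binary" for each argument.
classify() {
  for f in "$@"; do
    if is_text_file "$f"; then
      printf '%s: text\n' "$f"
    else
      printf '%s: binary\n' "$f"
    fi
  done
}

# Example use:
#   classify ./*
```

Because of the first note above, empty files are labelled binary by this wrapper, and so is anything grep cannot read.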



Answer 7:


A fast way to do this on Ubuntu is to use Nautilus in the "list" view. The Type column will show you whether a file is text or binary.




Answer 8:


Commands like less and grep detect it quite easily (and fast). You can have a look at their source.



Source: https://stackoverflow.com/questions/2644938/how-to-tell-binary-from-text-files-in-linux
