Question
The Linux file command does a very good job of recognising file types and gives very fine-grained results. The diff tool is able to tell binary files from text files, producing a different output.
Is there a way to tell binary files from text files? All I want is a yes/no answer as to whether a given file is binary. Because it's difficult to define binary, let's say I want to know if diff will attempt a text-based comparison.
To clarify the question: I do not care if it's ASCII text or XML as long as it's text. Also, I do not want to differentiate between MP3 and JPEG files, as they're all binary.
Answer 1:
The diff manual specifies that
diff determines whether a file is text or binary by checking the first few bytes in the file; the exact number of bytes is system dependent, but it is typically several thousand. If every byte in that part of the file is non-null, diff considers the file to be text; otherwise it considers the file to be binary.
Answer 2:
file is still the command you want. Any file that is text (according to its heuristics) will include the word "text" in the output of file; anything that is binary will not include the word "text".
If you don't agree with the heuristics that file uses to determine text vs. not-text, then the question needs to be better specified, since text vs. non-text is an inherently vague question. For example, file does not identify a PGP public key block in ASCII as "text", but you might (since it is composed only of printable characters, even though it is not human-readable).
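A minimal sketch of this check as a shell function, assuming the file utility is installed. The name is_text_by_file is made up for illustration. The -b option omits the filename from the output, so a path that happens to contain "text" cannot cause a false match.

```shell
# Hypothetical helper: succeeds if file(1) describes "$1" as some kind of text.
is_text_by_file() {
    # -b: brief mode, do not prepend the filename to the description
    # -L: follow symbolic links and describe the target
    file -bL -- "$1" | grep -q 'text'
}

# Example use (yourfile is a placeholder path):
#   if is_text_by_file yourfile; then echo text; else echo binary; fi
```

An empty file is reported by file as "empty", so this sketch treats it as not text.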
Answer 3:
A quick-and-dirty way is to look for a NUL character (a zero byte) in the first kilobyte or two of the file. As long as you're not worried about UTF-16 or UTF-32, no text file should ever contain a NUL.
Update: According to the diff manual, this is exactly what diff does.
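The NUL-byte heuristic can be sketched in shell roughly as follows. This is an approximation, not diff's actual code: the name has_no_nul is hypothetical, and 8000 bytes is an arbitrary stand-in for diff's system-dependent buffer size. It assumes a head that supports -c, as GNU and BSD head do.

```shell
# Hypothetical helper: succeeds if the first 8000 bytes of "$1" contain no NUL.
has_no_nul() {
    # Number of bytes actually read from the start of the file
    total=$(head -c 8000 -- "$1" | wc -c)
    # Number of bytes left after deleting every NUL byte
    kept=$(head -c 8000 -- "$1" | tr -d '\000' | wc -c)
    # Equal counts mean no NUL was present in the sampled prefix
    [ "$total" -eq "$kept" ]
}

# Example use (yourfile is a placeholder path):
#   if has_no_nul yourfile; then echo text; else echo binary; fi
```

An empty file passes this test, since it contains no NUL bytes at all. UTF-16 and UTF-32 text will usually fail it, as noted above.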
Answer 4:
You could try running strings yourfile and comparing the size of the output with the file size. I'm not totally sure, but if they are the same, the file is really a text file.
Answer 5:
These days the term "text file" is ambiguous, because a text file can be encoded in ASCII, ISO-8859-*, UTF-8, UTF-16, UTF-32 and so on.
See here for how Subversion does it.
Answer 6:
This approach uses the same criteria as grep in determining whether a file is binary or text:
is_text_file() {
    grep -qI '.' "$1"
}
grep options used:
-q: Quiet; exit immediately with zero status if any match is found
-I: Process a binary file as if it did not contain matching data
grep pattern used:
'.': Matches any single character. Any file with at least one non-empty line will match this pattern; an empty file will not.
Notes
- An empty file is not considered a text file according to this test.
- Symbolic links are followed.
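For illustration, the function could be used to label a list of paths. This is only a sketch: classify is a hypothetical name, and -I is assumed to be available (GNU and BSD grep have it, but POSIX does not specify it). The function is repeated here so the snippet runs on its own, with -- added so that a path starting with a dash is not read as an option.

```shell
# Same test as in the answer: succeed if grep finds a match without
# treating the file as binary.
is_text_file() {
    grep -qI '.' -- "$1"
}

# Hypothetical helper: print "text" or "binary" next to each path given.
classify() {
    for f in "$@"; do
        if is_text_file "$f"; then
            printf '%s: text\n' "$f"
        else
            printf '%s: binary\n' "$f"
        fi
    done
}

# Example use (placeholder paths):
#   classify notes.txt photo.jpg
```

Because the test relies on a match for '.', files that are empty or contain only blank lines are labelled "binary" by this sketch.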
Answer 7:
A fast way to do this in Ubuntu is to use Nautilus in the "list" view. The Type column will show you whether a file is text or binary.
Answer 8:
Commands like less and grep detect it quite easily (and fast). You can have a look at their source.
Source: https://stackoverflow.com/questions/2644938/how-to-tell-binary-from-text-files-in-linux