Question
I came across a .cpp file in our codebase that grep sees as binary, so I can't grep it like a text file. That's annoying and obviously not how things ought to be, so I want to know why grep thinks the file is binary and fix the issue.
I tried to find any characters out of the ordinary using the command
grep -Pna --color -r "[\x00-\x08]|[\x10-\x19]|[\x80-\xFF]" test.cpp
but it doesn't yield any matches.
How can I figure out the cause of this problem?
I should mention that I'm using Git Bash on Windows.
Output of locale:
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_ALL=
Answer 1:
Since you’re using MS Windows, it’s possible that test.cpp is encoded as UTF-16 (common in recent versions of Windows) or Windows-1252 (CP-1252) rather than UTF-8. In the CP-1252 case, the culprit could be as small as a typographic quote in one of the comments.
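To tell these two cases apart, a rough sketch of two checks follows. It assumes the GNU grep, head and od that ship with Git Bash, and a UTF-8 locale like the one in the question's locale output; the function names show_leading_bytes and show_invalid_lines are hypothetical.

```shell
# Hypothetical helpers; assume GNU grep, head and od (all in Git Bash)
# and that the current locale is UTF-8, as in the question.

# Print the first four bytes of a file in hex. A UTF-16 file usually
# starts with a byte order mark: "ff fe" (little-endian) or "fe ff"
# (big-endian). Mostly-ASCII UTF-16 text also has every other byte 00.
show_leading_bytes() {
  head -c 4 "$1" | od -An -tx1
}

# List lines, with line numbers, that are not valid in the current
# locale's encoding. In a UTF-8 locale "." cannot match an invalid
# byte such as a CP-1252 curly quote (0x93), so "-x '.*'" fails on
# that line and "-v" selects it; "-a" makes grep print the line
# instead of only reporting that a binary file matches.
show_invalid_lines() {
  grep -naxv '.*' "$1"
}
```

Running `show_leading_bytes test.cpp` and `show_invalid_lines test.cpp` should point either to a UTF-16 byte order mark or to the specific lines holding non-UTF-8 bytes.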
When your locale is set to UTF-8 and grep finds byte sequences that are not valid UTF-8, it assumes that the file is binary. A quick way around this issue is to make grep use the C locale, in which every byte is a valid character, by temporarily setting the LC_ALL environment variable for the grep command:
LC_ALL=C grep pattern test.cpp
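A small reproduction makes the difference visible. This is a sketch assuming GNU grep and a UTF-8 default locale such as the en_US.UTF-8 shown in the question; the byte 0x93 (octal 223) stands in for a CP-1252 curly quote, and the exact wording of grep's binary-file message varies between grep versions.

```shell
# Create a throwaway file whose only line contains CP-1252 curly
# quotes (bytes 0x93 and 0x94), which are not valid UTF-8.
demo=$(mktemp)
printf 'int x; // \223hello\224\n' > "$demo"

# Default UTF-8 locale: the matching line has an encoding error, so
# grep treats the file as binary and reports a binary match instead
# of printing the line.
grep hello "$demo"

# C locale: every byte counts as a character, so the line is printed
# (the quote bytes may show up as odd glyphs in the terminal).
LC_ALL=C grep hello "$demo"

rm -f "$demo"
```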
A better long-term solution would be to convert your text files to UTF-8, using iconv or your favourite text editor.
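A minimal sketch of the iconv route follows, assuming iconv is on the PATH (Git for Windows normally bundles it). The helper name to_utf8 and the .bak and .utf8 suffixes are hypothetical, and the source encoding has to be supplied by you, for example from the checks above.

```shell
# Hypothetical helper: convert a file to UTF-8, keeping the original
# as FILE.bak.
#   $1 = file to convert
#   $2 = its current encoding as iconv names it, e.g. CP1252 or UTF-16
#        ("iconv -l" lists the names your build accepts)
to_utf8() {
  # Write to a sibling file first so a failed conversion never
  # clobbers the original.
  if iconv -f "$2" -t UTF-8 "$1" > "$1.utf8"; then
    mv -- "$1" "$1.bak" && mv -- "$1.utf8" "$1"
  else
    rm -f -- "$1.utf8"
    return 1
  fi
}
```

For example, `to_utf8 test.cpp CP1252` for a Windows-1252 file, or `to_utf8 test.cpp UTF-16` for a UTF-16 file that starts with a byte order mark. If a UTF-16 file has no byte order mark, naming UTF-16LE explicitly is safer, since a bare UTF-16 may be read with the wrong byte order.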
Source: https://stackoverflow.com/questions/35335128/grep-thinks-text-file-is-binary-but-it-isnt