Grep thinks text file is binary, but it isn't

Submitted by 寵の児 on 2019-12-01 11:56:24

Question


I came across a .cpp file in our codebase that grep treats as binary, so I can't search it like a normal text file. That is annoying and clearly not how things ought to be. I want to know why grep thinks the file is binary and fix the underlying issue.

I tried to find any characters out of the ordinary using the command

grep -Pna --color -r "[\x00-\x08]|[\x10-\x19]|[\x80-\xFF]" test.cpp

but it doesn't yield any matches.

How can I figure out the cause of this problem?

I should mention that I'm using Git Bash on Windows.

Output of locale:

LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_ALL=

Answer 1:


Since you're using MS Windows, it's possible that test.cpp is encoded as either UTF-16 (common in recent versions of Windows) or Windows-1252 (CP-1252) rather than UTF-8. A single non-ASCII character, such as a typographic quote in one of the comments, would be enough.

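When your locale is set to UTF-8 and grep finds byte sequences that are not valid in that encoding, it assumes that the file is binary. A quick way around this is to make grep use the C locale by setting the LC_ALL environment variable for that one command. This helps with stray 8-bit bytes such as CP-1252 characters. A UTF-16 file also contains NUL bytes, which GNU grep treats as binary data in any locale, so that case needs the -a option or a conversion instead.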

LC_ALL=C grep pattern test.cpp
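The same C-locale trick can also help answer the "why" part of the question. The following is only a sketch under assumptions: the sample file is a hypothetical stand-in for test.cpp, and it contains one CP-1252 apostrophe (byte 0x92) as the offending byte. In the C locale each byte is its own character, so a negated bracket expression should flag any line holding bytes outside printable ASCII and whitespace.

```shell
# Hypothetical sample standing in for test.cpp: line 2 holds byte 0x92 (octal 222),
# the CP-1252 typographic apostrophe, which is not valid UTF-8 on its own.
sample=$(mktemp)
printf 'int x = 0;\n// don\222t touch this\nint y = 1;\n' > "$sample"

# -a forces text mode, -n prints line numbers. In the C locale the bracket
# expression matches any byte that is neither printable ASCII nor whitespace.
suspect_lines=$(LC_ALL=C grep -n -a '[^[:print:][:space:]]' "$sample" | cut -d: -f1)
echo "lines with suspicious bytes: $suspect_lines"

# Dump the raw bytes of the first suspicious line to see what is really there.
first=$(printf '%s\n' "$suspect_lines" | head -n 1)
sed -n "${first}p" "$sample" | od -c
```

For a real file, replace "$sample" with test.cpp and skip the printf line. The od output shows the offending bytes in octal, which usually makes it easy to guess the original encoding.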

A better long-term solution would be to convert the affected text files to UTF-8, using iconv or your favourite text editor.
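A conversion with iconv might look roughly like this. It is a sketch, not a drop-in command: the source encoding CP1252 is an assumption that has to match the actual file (for a UTF-16 file you would pass -f UTF-16 instead), and the sample file is a hypothetical stand-in for test.cpp.

```shell
# Hypothetical stand-in for test.cpp, written as CP-1252:
# byte 0x92 (octal 222) is a typographic apostrophe.
src=$(mktemp)
printf '// don\222t touch this\n' > "$src"

# Convert from the assumed source encoding to UTF-8 into a separate file,
# and replace the original only if iconv reports success.
converted=$(mktemp)
if iconv -f CP1252 -t UTF-8 "$src" > "$converted"; then
    mv "$converted" "$src"
fi

# Inspect the result as hex: the apostrophe should now be the
# three-byte UTF-8 sequence e2 80 99 (U+2019).
hex=$(od -An -tx1 "$src" | tr -d ' \n')
echo "$hex"
```

Writing to a temporary file first avoids truncating the original when the assumed encoding is wrong and iconv fails partway. Note that mv gives the file the temporary file's permissions, so a real script may want to restore them.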



Source: https://stackoverflow.com/questions/35335128/grep-thinks-text-file-is-binary-but-it-isnt
