Question
I used TextEdit on macOS to create two files with the same contents but different encodings, then ran:
grep xxx filename_UTF-16
(no output)
grep xxx filename_UTF-8
xxxxxxx xxxxxxyyyyyy
Does grep not support UTF-16?
Answer 1:
iconv -f UTF-16 -t UTF-8 yourfile | grep xxx
Answer 2:
You could convert the file to UTF-8 first:
iconv -f utf-16 -t utf-8 filename | grep xxxxx
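As a minimal sketch of the conversion approach, the following creates a small UTF-16 file and searches it through iconv. The file name sample_utf16.txt and its contents are made up for illustration, and iconv -t UTF-16 is assumed to write a byte order mark, as the common glibc and libiconv implementations do.

```shell
# Build a small UTF-16 sample from UTF-8 text (hypothetical file name and contents).
printf 'xxxxxxx xxxxxxyyyyyy\nother line\n' | iconv -f UTF-8 -t UTF-16 > sample_utf16.txt

# Decode to UTF-8 first so that grep sees ordinary text instead of
# 16-bit code units with interleaved NUL bytes.
result=$(iconv -f UTF-16 -t UTF-8 sample_utf16.txt | grep xxx)
printf '%s\n' "$result"
```

The original file is left untouched; only the stream handed to grep is converted.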
Answer 3:
Use the ripgrep utility instead of grep; it can search UTF-16 files.
Install it with: brew install ripgrep
Then run:
rg xxx filename_UTF-16
ripgrep supports searching files in text encodings other than UTF-8, such as UTF-16, latin-1, GBK, EUC-JP, Shift_JIS and more. (Some support for automatically detecting UTF-16 is provided. Other text encodings must be specifically specified with the -E/--encoding flag.)
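ripgrep's automatic UTF-16 detection relies on a byte order mark, so a UTF-16 file without one is a case where the encoding has to be named explicitly. A small sketch, assuming ripgrep is installed as rg and using a made-up sample file; the iconv fallback is only there so the snippet still runs on a machine without ripgrep.

```shell
# Hypothetical sample: UTF-16LE without a BOM, so BOM sniffing cannot detect it.
printf 'xxx needle\nno match here\n' | iconv -f UTF-8 -t UTF-16LE > sample_utf16le_nobom.txt

if command -v rg >/dev/null 2>&1; then
  # -E/--encoding tells ripgrep how to decode the file before searching.
  rg_out=$(rg -E utf-16le xxx sample_utf16le_nobom.txt)
else
  # Fallback when ripgrep is not installed: decode with iconv, then grep.
  rg_out=$(iconv -f UTF-16LE -t UTF-8 sample_utf16le_nobom.txt | grep xxx)
fi
printf '%s\n' "$rg_out"
```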
Answer 4:
Define the following shell function, which calls Ruby:
grep16() { ruby -e "puts File.open('$2', mode:'rb:BOM|UTF-16LE').readlines.grep(Regexp.new '$1'.encode(Encoding::UTF_16LE))"; }
Then use it as:
grep16 xxx filename_UTF-16
See: How to use Ruby's readlines.grep for UTF-16 files?
For more suggestions, check: grepping binary files and UTF16
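The same wrapper idea can be sketched without Ruby by building on iconv. This is an illustration only, not part of the original answer: the function name grep_utf16 and the sample file names are hypothetical, and the files are assumed to carry a byte order mark so that iconv -f UTF-16 can pick the byte order.

```shell
# grep_utf16 PATTERN FILE...  (hypothetical helper, not from the original answer)
# Decodes each UTF-16 file with iconv and prints matches as "file:line".
grep_utf16() {
  pattern=$1
  shift
  for file in "$@"; do
    iconv -f UTF-16 -t UTF-8 "$file" | grep -- "$pattern" |
      while IFS= read -r line; do
        printf '%s:%s\n' "$file" "$line"
      done
  done
}

# Example run against two made-up sample files.
printf 'xxx first\nskip me\n' | iconv -f UTF-8 -t UTF-16 > a_utf16.txt
printf 'nothing here\nsecond xxx\n' | iconv -f UTF-8 -t UTF-16 > b_utf16.txt
matches=$(grep_utf16 xxx a_utf16.txt b_utf16.txt)
printf '%s\n' "$matches"
```

Because the pattern and file names are passed as arguments rather than pasted into program text, quotes inside them do not break the command.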
Answer 5:
You could also use ugrep, which is a drop-in replacement for grep and backwards compatible with GNU/BSD grep, meaning it takes the same options as grep but offers many more features, such as:
ugrep searches UTF-encoded input when a UTF BOM (byte order mark) is present, and ASCII and UTF-8 when no UTF BOM is present. Option --encoding permits many other file formats to be searched, such as ISO-8859-1, EBCDIC, and code pages 437, 850, 858, 1250 to 1258.
ugrep matches Unicode patterns by default (disabled with option -U). The regular expression syntax is POSIX ERE compliant, extended with Unicode character classes, lazy quantifiers, and negative patterns to skip unwanted pattern matches to produce more precise results.
ugrep searches text files and binary files and produces hexdumps for binary matches.
Source: https://stackoverflow.com/questions/6882070/grep-unicode-16-support