Searching for non-ascii characters

难免孤独 2021-01-24 10:24

I have a file, a.out, which contains a number of lines. Each line is one character only, either the Unicode character U+2013 or a lowercase letter a-z.

3 Answers

刺人心 (OP)

2021-01-24 11:02

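gawk can help with this problem.

Here is the awk one-liner. It builds a lookup table of the ASCII characters 1-127, splits each line into single characters (an empty FS does this in gawk), and prints every character that is not in the table: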

 awk -v FS="" 'BEGIN{for(i=1;i<128;i++)ord[sprintf("%c",i)]=i}
               {for(i=1;i<=NF;i++)if(!($i in ord))print $i}' file

Below is a test with gawk:

kent$  cat f
abcd
+ß
s+äö
ö--我
中文

kent$  awk -v FS="" 'BEGIN{for(i=1;i<128;i++)ord[sprintf("%c",i)]=i}{for(i=1;i<=NF;i++)if(!($i in ord))print $i}' f
ß
ä
ö
ö
我
中
文
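
If gawk is not available, the same check can be sketched in Python. This is only an illustration of the idea behind the one-liner, not part of the original answer: the function name non_ascii_chars is made up for the example, and the sample string simply repeats the contents of the test file f shown above. Note that splitting on an empty FS is a gawk extension, and splitting multibyte characters correctly presumably also depends on running gawk in a UTF-8 locale.

```python
def non_ascii_chars(text):
    """Return every character in text whose code point is above 127, in order."""
    # ord() gives the Unicode code point; ASCII covers code points 0-127.
    # Newlines are ASCII, so they are skipped just like any other ASCII character.
    return [ch for ch in text if ord(ch) > 127]


# Same content as the test file f used with the awk one-liner.
sample = "abcd\n+ß\ns+äö\nö--我\n中文\n"

for ch in non_ascii_chars(sample):
    print(ch)
```

For the file described in the question, the same function would return only the U+2013 characters, since a-z are all ASCII.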
