Question
The test file is on Google Drive (sample: test file).
I want to use awk to list every non-ASCII byte in the test file, that is, every byte outside the range \x00-\x7f.
The file contains 12 such bytes.
Here is my attempt:
awk 'BEGIN{FS=""}{for(i=1;i<=NF;++i)if($i~/[^\x00-\x7f]/)print i,$i}' test
146 “
148 ”
181 “
184 ”
awk 'BEGIN{FS=""}{for(i=1;i<=NF;++i)if($i~/[^\x00-\x7f]/)printf("%d %x \n", i,$i)}' test
146 0
148 0
181 0
184 0
Both attempts fail. How can I list all 12 bytes in the file in the format below?
146 e2
147 80
148 9c
150 e2
151 80
152 9d
185 e2
186 80
187 9c
190 e2
191 80
192 9d
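Setting the C locale gives the right positions, but %c prints the raw bytes rather than their hex values:
export LC_ALL=C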
awk 'BEGIN{FS=""}{for(i=1;i<=NF;++i)if($i~/[^\x00-\x7f]/)printf("%d %c\n",i,$i)}' test
146
147 �
148 �
150
151 �
152 �
185
186 �
187 �
190
191 �
192 �
How can I fix my code?
Answer 1:
I'm in a UTF-8 shell:
$ locale
LANG=en_US.UTF-8
...
so first switch to the C locale, which makes awk treat each byte as a single character:
$ export LC_ALL=C
Then:
$ awk -F '' ' # split record in fields
BEGIN { for(n=0;n<256;n++) # iterate all values
ord[sprintf("%c",n)]=n } # make a hash ord[char]=n
{ for(i=1;i<=NF;i++) # iterate all fields
if(ord[$i]>127) # beyond 7f
print ord[$i] } # print n (value)
' test
Outputs:
226
128
156
226
128
157
226
128
156
226
128
157
which in hex would be:
e2
80
9c
...
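To get the position-and-hex format the question asks for, the same lookup table can be combined with printf. The following is a minimal sketch, assuming gawk (or another awk that splits a record into single characters when FS is empty). The function name list_non_ascii and the sample input are made up for illustration and are not part of the original answer.

```shell
# list_non_ascii is a hypothetical helper name used only for this sketch.
# It reads the named files, or standard input when no file is given.
list_non_ascii() {
    # LC_ALL=C makes awk treat every byte as one character.
    LC_ALL=C awk -F '' '
    BEGIN { for (n = 0; n < 256; n++)          # lookup table: byte -> value
                ord[sprintf("%c", n)] = n }
    { for (i = 1; i <= NF; i++)                # each field is one byte
          if (ord[$i] > 127)                   # outside \x00-\x7f
              printf "%d %x\n", i, ord[$i] }   # position in the line, hex value
    ' "$@"
}

# Made-up sample: the text "say", a space, then "hi" inside UTF-8 curly quotes.
printf 'say \342\200\234hi\342\200\235\n' | list_non_ascii
```

With gawk in the C locale this should print:

5 e2
6 80
7 9c
10 e2
11 80
12 9d

The position is the field index within the current line, as in the question's own attempts, so it restarts at 1 on every line. For the real file, call it as list_non_ascii test.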
Source: https://stackoverflow.com/questions/43337015/how-to-list-all-non-ascii-bytes-with-awk