How to list all non-ASCII bytes with awk?

Submitted by ☆樱花仙子☆ on 2021-01-28 06:31:05

Question


Here is the sample test file, hosted on Google Drive.

I want to use awk to list every non-ASCII byte in the test file, that is, every byte outside the range \x00-\x7f.
There are 12 such bytes.

Here is my attempt:

awk 'BEGIN{FS=""}{for(i=1;i<=NF;++i)if($i~/[^\x00-\x7f]/)print i,$i}'  test
146 “
148 ”
181 “
184 ”
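Only four positions come back because, in a UTF-8 locale, gawk splits the record into characters rather than bytes, so each curly quote (three bytes in UTF-8) counts as a single field. A minimal check, assuming a POSIX shell and any awk: in the C locale, length() counts bytes, so the opening quote “ (bytes e2 80 9c, written below as octal escapes) has length 3.

```shell
#!/bin/sh
# The opening curly quote U+201C is encoded in UTF-8 as e2 80 9c
# (octal \342 \200 \234). In the C locale awk sees three separate bytes.
quote_len_c=$(printf '\342\200\234\n' | LC_ALL=C awk '{ print length($0) }')
echo "bytes seen in the C locale: $quote_len_c"
```

Running the same awk command under a UTF-8 locale such as en_US.UTF-8, gawk reports 1 instead.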

awk 'BEGIN{FS=""}{for(i=1;i<=NF;++i)if($i~/[^\x00-\x7f]/)printf("%d %x \n", i,$i)}'  test
146 0 
148 0 
181 0 
184 0
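The zeros appear because %x expects a number, while $i holds a string such as “. awk converts a string that does not begin with a number to 0, and POSIX awk has no built-in function that returns a character's numeric code, so the code has to be obtained some other way before %x can format it. A small illustration, assuming any POSIX awk:

```shell
#!/bin/sh
# A string with no leading digits converts to the number 0 ...
from_string=$(awk 'BEGIN { printf("%x\n", "x") }')
# ... whereas an actual numeric code is formatted as hex.
from_number=$(awk 'BEGIN { printf("%x\n", 226) }')
echo "$from_string $from_number"
```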

Both attempts fail. How can I list all 12 bytes in the file in the following format?

146  e2
147  80
148  9c
150  e2
151  80
152  9d
185  e2
186  80
187  9c
190  e2
191  80
192  9d

Trying again in the C locale:

export LC_ALL=C
awk 'BEGIN{FS=""}{for(i=1;i<=NF;++i)if($i~/[^\x00-\x7f]/)printf("%d %c\n",i,$i)}'  test
146 
147 �
148 �
150 
151 �
152 �
185 
186 �
187 �
190 
191 �
192 �
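In the C locale the positions are correct, but %c writes each raw byte back out, and a lone byte from a multibyte UTF-8 sequence is not valid by itself, so a UTF-8 terminal renders it as � or not at all. To confirm which byte values awk should be reporting, independently of awk, the POSIX od utility can dump them in hex. A minimal check, assuming od supports the standard -A and -t options:

```shell
#!/bin/sh
# Dump the two curly quotes (U+201C, U+201D) byte by byte in hex.
# -An suppresses the offset column, -tx1 prints one hex value per byte.
hex_bytes=$(printf '\342\200\234\342\200\235' | od -An -tx1)
echo "$hex_bytes"
```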

How can I fix my code?


Answer 1:


My shell runs in a UTF-8 locale:

$ locale
LANG=en_US.UTF-8
...

so first switch to the C locale, where every byte counts as one character:

$ export LC_ALL=C

Then:

$ awk -F '' '                         # split record in fields
BEGIN { for(n=0;n<256;n++)            # iterate all values
            ord[sprintf("%c",n)]=n }  # make a hash ord[char]=n
      { for(i=1;i<=NF;i++)            # iterate all fields
            if(ord[$i]>127)           # beyond 7f
                print ord[$i] }       # print n (value)
' test

Outputs:

226
128
156
226
128
157
226
128
156
226
128
157

which in hex would be:

e2
80
9c
...
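To get exactly the format asked for in the question (position, then hex value), the same lookup-table idea can print the byte position next to the value using %x. This is a sketch, not the answerer's code: it wraps the awk program in a hypothetical shell function named list_non_ascii, fills the table only for bytes 128-255 so that a plain `in` test picks out non-ASCII bytes, and walks each line with length() and substr() instead of an empty FS (empty FS is an extension; POSIX leaves its behavior unspecified).

```shell
#!/bin/sh
# list_non_ascii (hypothetical helper name): for every byte above 0x7f,
# print its 1-based position within the line and its value in hex.
# LC_ALL=C makes length() and substr() work on bytes, not characters.
list_non_ascii() {
    LC_ALL=C awk '
    BEGIN {
        # Map each high byte (128-255) to its numeric code.
        for (n = 128; n < 256; n++)
            ord[sprintf("%c", n)] = n
    }
    {
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c in ord)               # only bytes 0x80-0xff are keys
                printf("%d  %x\n", i, ord[c])
        }
    }' "$@"
}

# Demo input: "say " followed by the curly quotes U+201C/U+201D around "hi".
demo_output=$(printf 'say \342\200\234hi\342\200\235\n' | list_non_ascii)
echo "$demo_output"
```

The demo prints positions 5, 6, 7 (e2 80 9c) and 10, 11, 12 (e2 80 9d). Run as `list_non_ascii test`, it should produce the twelve lines from 146 to 192 shown in the question, assuming the file is a single line: like the question's own attempts, positions restart at 1 on each line.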


Source: https://stackoverflow.com/questions/43337015/how-to-list-all-non-ascii-bytes-with-awk
