Searching for non-ASCII characters

难免孤独 2021-01-24 10:24

I have a file, a.out, which contains a number of lines. Each line is one character only, either the Unicode character U+2013 or a lowercase letter a-z.

3 answers
  •  刺人心 (original poster)
    2021-01-24 11:02

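    gawk can help with this problem. With an empty field separator (FS=""), gawk treats every character of a line as its own field; this is a gawk extension, so other awk implementations may behave differently.

    Here is the awk one-liner: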

     awk -v FS="" 'BEGIN{for(i=1;i<128;i++)ord[sprintf("%c",i)]=i}
                   {for(i=1;i<=NF;i++)if(!($i in ord))print $i}' file
    

    Below is a test with gawk:

    kent$  cat f
    abcd
    +ß
    s+äö
    ö--我
    中文
    
    kent$  awk -v FS="" 'BEGIN{for(i=1;i<128;i++)ord[sprintf("%c",i)]=i}{for(i=1;i<=NF;i++)if(!($i in ord))print $i}' f
    ß
    ä
    ö
    ö
    我
    中
    文
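
For readers without gawk, the same check can be sketched in Python. This is not part of the original answer: the function name `non_ascii_chars` and the in-memory `sample` list are illustrative assumptions, and the sketch treats any character with a code point above 127 as non-ASCII, which mirrors the lookup table the one-liner builds for code points 1 to 127.

```python
def non_ascii_chars(lines):
    """Yield each character with a code point above 127, in input order.

    `lines` is any iterable of strings; a trailing newline on a line is ignored.
    (Hypothetical helper, written only to illustrate the check.)
    """
    for line in lines:
        for ch in line.rstrip("\n"):
            if ord(ch) > 127:
                yield ch


# Same sample data as the test file `f` in the answer above.
sample = ["abcd", "+ß", "s+äö", "ö--我", "中文"]
found = list(non_ascii_chars(sample))
print("\n".join(found))
```

To run it against a real file such as the a.out from the question, pass an open file object instead of the list, for example `open("a.out", encoding="utf-8")`, assuming the file is UTF-8 encoded.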
    
