Searching for non-ASCII characters

难免孤独 2021-01-24 10:24

I have a file, a.out, which contains a number of lines. Each line is a single character: either the Unicode character U+2013 (EN DASH) or a lowercase letter a-z.

3 answers
  •  野性不改
    2021-01-24 10:52

    I recommend avoiding dodgy grep -P implementations and using the real thing, Perl itself. This works:

    perl -CSD -nle 'print "$.: $_" if /\P{ASCII}/' utfile1 utfile2 utfile3 ...
    

    Where:

    • The -CSD option says that the stdio trio (stdin, stdout, stderr) and disk files should all be treated as UTF-8 encoded.

    • The $. represents the current record (line) number.

    • The $_ represents the current line.

    • The \P{ASCII} matches any code point that is not ASCII.
