Using sed, how can a regular expression match Chinese characters?

没有蜡笔的小新 2021-01-03 06:35

I decided to post a question after spending quite some time on this without figuring out the problem. I also read a number of seemingly related posts, but none really fit my simple

2 answers
  •  没有蜡笔的小新
    2021-01-03 07:06

    Perl has pretty good support for dealing with Unicode. That might be a better bet for your task than sed. This one-liner works like your first sed example:

    perl -CIOED -p -e 's/\p{Script_Extensions=Han}/$& /g' filename
    

    -CIOED tells perl to treat standard input, standard output, standard error, and any files it opens as UTF-8. -p runs the given code once for each line of the input file, then prints the result. -e specifies a line of Perl code to run. See the perlrun documentation on command-line arguments for more.

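    The regular expression uses a Unicode property, \p{Script_Extensions=Han}, rather than explicit code point ranges, to identify the characters to match. In the replacement, $& stands for the matched text, so each Han character is replaced by itself followed by a space.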

    You might also want to read the Perl Unicode documentation.
