Using sed, how can a regular expression match Chinese characters?

没有蜡笔的小新 2021-01-03 06:35

I decided to post a question after spending quite some time without figuring out the problem. I also read a bunch of seemingly related posts, but none really fit my simple

2 answers

没有蜡笔的小新 (original poster)

2021-01-03 07:06
Perl has pretty good support for dealing with Unicode. That might be a better bet for your task than sed. This one-liner works like your first sed example:
```
perl -CIOED -p -e 's/\p{Script_Extensions=Han}/$& /g' filename
```
The -CIOED switch tells perl to treat standard input, standard output, standard error, and any files it opens as UTF-8. -p runs the given code once for each line of the input file, then prints the result. -e specifies a line of Perl code to run. See the perlrun documentation on command-line switches for more.

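The regular expression uses a Unicode property, \p{Script_Extensions=Han}, to identify the characters to match. In the replacement, $& stands for the matched text, so a space is added after each Han character.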

You might also want to read the Perl Unicode documentation.