Python regex matching Unicode properties

前端未结

关注

 6  1569

我寻月下人不归 2020-11-22 14:49

Perl and some other current regex engines support Unicode properties, such as the category, in a regex. E.g. in Perl you can use \\p{Ll} to match an arbitrary l

6条回答

春和景丽 (楼主)

2020-11-22 14:49

You're right that Unicode property classes are not supported by the Python regex parser.

If you wanted to do a nice hack, that would be generally useful, you could create a preprocessor that scans a string for such class tokens (\p{M} or whatever) and replaces them with the corresponding character sets, so that, for example, \p{M} would become [\u0300–\u036F\u1DC0–\u1DFF\u20D0–\u20FF\uFE20–\uFE2F], and \P{M} would become [^\u0300–\u036F\u1DC0–\u1DFF\u20D0–\u20FF\uFE20–\uFE2F].

People would thank you. :)

0 讨论(0)

查看其它6个回答
发布评论:

提交评论
- 加载中...