Python regex matching Unicode properties

我寻月下人不归 2020-11-22 14:49

Perl and some other current regex engines support Unicode properties, such as the general category, in a regex. For example, in Perl you can use \p{Ll} to match an arbitrary l

6 Answers
  •  春和景丽
    2020-11-22 14:49

    You're right that Unicode property classes such as \p{Ll} are not supported by the regex parser in Python's standard re module.

    If you wanted to do a nice, generally useful hack, you could write a preprocessor that scans a pattern string for such class tokens (\p{M} or whatever) and replaces them with the corresponding character sets. For example, \p{M} would become [\u0300-\u036F\u1DC0-\u1DFF\u20D0-\u20FF\uFE20-\uFE2F], and \P{M} would become [^\u0300-\u036F\u1DC0-\u1DFF\u20D0-\u20FF\uFE20-\uFE2F]. (Those four ranges are the dedicated combining-mark blocks; a complete class for category M would need additional ranges.)

    People would thank you. :)
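
A minimal sketch of such a preprocessor, assuming Python 3 and only the standard library. The names expand_properties and _category_class_body are made up for illustration and are not part of any library. It handles general categories only (M, Ll, Lu, ...), not scripts or blocks, and it does not handle a \p{...} token that already sits inside a [...] class. Instead of hard-coding ranges, it derives them from unicodedata, so the result depends on the Unicode version bundled with your Python.

```python
import re
import sys
import unicodedata
from functools import lru_cache

# \p{Name} or \P{Name}, preceded by an even number of backslashes so that an
# escaped backslash followed by a literal "p{...}" is left alone.
_PROP_TOKEN = re.compile(r'(?<!\\)((?:\\\\)*)\\([pP])\{(\w+)\}')


def _escape(cp):
    # \UXXXXXXXX escapes are understood by re for str patterns (Python 3.3+).
    return '\\U%08X' % cp


@lru_cache(maxsize=None)
def _category_class_body(prefix):
    """Build the inside of a [...] class for a general category prefix.

    "M" covers Mn, Mc and Me; "Ll" covers only lowercase letters.
    Scans every code point once per prefix; the result is cached.
    """
    ranges = []
    start = None
    for cp in range(sys.maxunicode + 1):
        if unicodedata.category(chr(cp)).startswith(prefix):
            if start is None:
                start = cp
        elif start is not None:
            ranges.append((start, cp - 1))
            start = None
    if start is not None:
        ranges.append((start, sys.maxunicode))
    if not ranges:
        raise ValueError('unknown general category: %r' % prefix)
    return ''.join(
        _escape(lo) if lo == hi else _escape(lo) + '-' + _escape(hi)
        for lo, hi in ranges
    )


def expand_properties(pattern):
    """Replace \\p{X} / \\P{X} tokens with explicit character classes."""
    def repl(match):
        backslashes, kind, name = match.groups()
        negate = '^' if kind == 'P' else ''
        return '%s[%s%s]' % (backslashes, negate, _category_class_body(name))
    return _PROP_TOKEN.sub(repl, pattern)


# Usage: an uppercase letter followed by one or more lowercase letters.
word = re.compile(expand_properties(r'\p{Lu}\p{Ll}+'))
print(bool(word.fullmatch('Élan')))
```

The expanded patterns can get long (hundreds of ranges for a category like Lu), which is the price of staying within re. If a third-party dependency is acceptable, the regex module on PyPI understands \p{...} directly.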
