Escape all metacharacters in Python

Posted by 谁都会走 on 2019-12-07 23:41:42

Question


I need to search for patterns which may have many metacharacters. Currently I use a long regex.

import re
prodObjMatcher = re.compile(r"""^(?P<nodeName>[\w\/\:\[\]\<\>\@\$]+)""", re.S | re.M | re.I | re.X)

(My actual pattern is very long, so I have pasted only the relevant portion that I need help with.)

This is especially painful when I need to write combinations of such patterns in a single re compilation.

Is there a Pythonic way to shorten the pattern?


Answer 1:


Look, your pattern can be reduced to

r"""^(?P<nodeName>[]\w/:[<>@$]+).*?"""
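As a quick sanity check, the sketch below compiles the pattern from the question and the reduced pattern with the same flags and compares what the nodeName group captures. The sample strings and the node_name helper are invented for illustration and are not part of the original question.

```python
import re

FLAGS = re.S | re.M | re.I | re.X

# Pattern from the question: every punctuation character is escaped.
verbose = re.compile(r"""^(?P<nodeName>[\w\/\:\[\]\<\>\@\$]+)""", FLAGS)
# Reduced pattern from this answer: only \w keeps its backslash.
reduced = re.compile(r"""^(?P<nodeName>[]\w/:[<>@$]+).*?""", FLAGS)

# Hypothetical inputs, made up for this comparison.
samples = ["top/core:reg[3]<0>@clk$x = 1", "a.b", "  leading space"]


def node_name(pattern, text):
    """Return the nodeName capture, or None when the pattern does not match."""
    m = pattern.match(text)
    return m.group("nodeName") if m else None


results = [(node_name(verbose, s), node_name(reduced, s)) for s in samples]
for s, (old, new) in zip(samples, results):
    print(repr(s), old, new, old == new)
```

Both patterns should capture the same text for every sample, since the two character classes contain exactly the same characters.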

Note that inside a character class the only characters that ever need a backslash are ^, -, ], and \ itself (shorthand classes such as \w keep theirs, because the backslash is part of the shorthand). All of these except \ can be left unescaped if you position them carefully:

  • ] at the start of the character class
  • - at the start or end of the character class
  • ^ anywhere but the start of the character class; it needs escaping only when it is the first character and is meant literally
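The placement rules can be tried out with a small sketch like the one below. It builds one character class that relies on position alone and one that escapes everything, then runs both against a made-up test string (the string and variable names are illustrative assumptions).

```python
import re

# ']' comes first, '^' is not first, '-' comes last: all three are literals.
by_position = re.compile(r"[]^-]+")
# The same set of characters with every special one escaped.
by_escaping = re.compile(r"[\]\^\-]+")

text = "ab]^-^]cd"  # hypothetical input

found_by_position = by_position.findall(text)
found_by_escaping = by_escaping.findall(text)
print(found_by_position, found_by_escaping)

# '-' is also a literal when it is the first character of the class.
dash_first = re.findall(r"[-a]", "a-b")
print(dash_first)
```

Relying on position keeps the class short, but it is easy to break when someone later reorders the characters, so escaping may still be the safer choice in a pattern that many people edit.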

Outside a character class, you must escape \, [, (, ), +, $, ^, *, ?, ., and |, as well as { wherever it could be read as the start of a quantifier such as {2,3}.
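When a whole piece of text has to be matched literally, the standard library's re.escape can add the backslashes for you instead of escaping each character by hand. This goes slightly beyond what the answer itself says, and the sample strings below are invented for illustration; treat it as a minimal sketch.

```python
import re

# Hypothetical text that should be matched verbatim.
literal = "a+b(c)[0]*$1.50?"

# re.escape backslash-escapes the characters that are special in a regex.
pattern = re.compile(re.escape(literal))
hit = pattern.search("price: a+b(c)[0]*$1.50? ok")
print(hit.group(0) if hit else None)

# Without escaping, '.' matches any character, so "1.50" also matches "1x50".
loose = re.search("1.50", "1x50")
strict = re.search(re.escape("1.50"), "1x50")
print(loose is not None, strict is not None)
```

The exact string re.escape returns differs between Python versions, so it is safer to rely on what the compiled pattern matches than on how the escaped text looks.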

Note that / is not a special regex metacharacter in Python regex patterns, and does not have to be escaped.

Use raw string literals when defining your regex patterns to avoid issues such as confusing the word boundary r'\b' with the backspace character '\b'.
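The difference is easy to see in a short sketch. The word "cat" and the test sentence are arbitrary examples chosen for illustration.

```python
import re

raw = r"\bcat\b"    # backslash + 'b': the regex engine sees a word boundary
plain = "\bcat\b"   # Python turns each \b into a backspace character (0x08)

print(len(raw), len(plain))

sentence = "a cat sat"  # hypothetical input
raw_hit = re.search(raw, sentence)
plain_hit = re.search(plain, sentence)
print(raw_hit is not None, plain_hit is not None)

# The non-raw pattern only matches text that really contains backspaces.
backspace_hit = re.search(plain, "\x08cat\x08")
print(backspace_hit is not None)
```

The raw pattern finds the word, while the non-raw one silently looks for backspace characters and finds nothing in ordinary text.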



Source: https://stackoverflow.com/questions/38897045/escape-all-metacharacters-in-python
