Regular expression quoting in Python

Submitted by 落爺英雄遲暮 on 2021-02-05 07:11:05

Question


How should I declare a regular expression?

mergedData = re.sub(r'\$(.*?)\$', readFile, allData)

I'm kind of wondering why the following also works. I thought that I needed to use r'' to pass a regular expression.

mergedData = re.sub("\$(.*?)\$", readFile, allData)

What does "\$" result in in this case? Why? I would have thought "$".


Answer 1:


I thought that I need to use the r'' to pass a regular expression.

r before a string literal indicates a raw string, which means the usual escape sequences such as \n or \r are no longer treated as a newline or carriage return character, but simply as \ followed by n or r. To specify a \, you only need \ in a raw string literal, while you need to double it up as \\ in a normal string literal. This is why raw strings are usually used to specify regular expressions (see footnote 1): they reduce confusion when reading the code. With a normal string literal you would have to escape twice: once for the string literal itself and a second time for the regex engine.
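A minimal sketch of the "escape twice" point, assuming a hypothetical input string `text`: to make the regex engine see `\\` (which matches one literal backslash), a raw literal needs two backslashes while a normal literal needs four.

```python
import re

# In a raw literal the backslash stays; in a normal literal "\n" is one newline character.
assert len(r"\n") == 2 and r"\n" == "\\n"
assert len("\n") == 1

# Both literals below produce the same two-character pattern \\ for the regex engine.
raw_pattern = r"\\"
normal_pattern = "\\\\"
print(raw_pattern == normal_pattern)     # True

text = "C:\\temp"                        # the string C:\temp (hypothetical input)
print(re.findall(raw_pattern, text))     # ['\\']
print(re.findall(normal_pattern, text))  # ['\\']
```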

What does "\$" result in in this case? Why? I would have thought "$".

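In a normal Python string literal, if \ is not followed by a recognized escape sequence, the \ is preserved. Therefore "\$" results in \ followed by $, a two-character string. (Since Python 3.6 such unrecognized escapes emit a DeprecationWarning, and since Python 3.12 a SyntaxWarning, so relying on this behavior is discouraged.)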
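A sketch that checks this without writing an invalid escape into the file itself: the source text of the normal literal "\$(.*?)\$" is parsed at runtime with `ast.literal_eval`, and any warning is recorded. The names `source`, `pattern` and `data` are illustrative only; the warning category depends on the Python version, so no output is shown for it.

```python
import ast
import re
import warnings

# Source text of a *normal* (non-raw) literal containing \$ escapes.
source = r'"\$(.*?)\$"'

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    pattern = ast.literal_eval(source)

# The backslashes were kept, so the result equals the raw-string version.
print(pattern == r"\$(.*?)\$")   # True
# DeprecationWarning on 3.6-3.11, SyntaxWarning on 3.12+
print([w.category.__name__ for w in caught])

data = "Hello $name$!"
print(re.sub(pattern, "world", data))        # Hello world!
print(re.sub(r"\$(.*?)\$", "world", data))   # Hello world!
```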

This behavior is slightly different from the way C/C++ or JavaScript handle a similar situation: there, the \ is treated as an escape for the next character, and only the next character remains. So "\$" in those languages is typically interpreted as $ (C and C++ compilers usually also warn about the unknown escape sequence).

Footnote

1: There is a small quirk in the design of raw strings in Python, though: a raw string literal cannot end with a single backslash (see the Stack Overflow question "Why can't Python's raw string literals end with a single backslash?").
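A short sketch of the quirk and two common workarounds. The trailing \' in r'abc\' still escapes the closing quote (the backslash is kept, but the string is never terminated), so the literal is a syntax error. The variable `path` is purely illustrative.

```python
# Compile the source text r'abc\' at runtime to show that it is rejected.
try:
    compile("r'abc\\'", "<demo>", "eval")
except SyntaxError as exc:
    print("SyntaxError:", exc.msg)

# Workaround 1: concatenate a normal literal holding the trailing backslash.
path = r"C:\temp" + "\\"
print(path)            # C:\temp\

# Workaround 2 does NOT give one backslash: r'abc\\' keeps both backslashes.
print(len(r"abc\\"))   # 5
```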




Answer 2:


The r'...' prefix protects escape sequences like '\1' (a reference to the first group in a regular expression), which in a normal string literal would be an octal escape, the same as '\x01'.

Generally speaking, in r'...' the backslash won't behave as an escape character.

Try

 re.split('(.).\1', '1x2x3')  # ['1x2x3']

vs.

 re.split(r'(.).\1', '1x2x3') # ['1', 'x', '3']

As '\$' is not an escape sequence in Python, it is literally the same as '\\$'.
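The same issue shows up in re.sub replacement templates, where \1 and \2 refer to captured groups. A minimal sketch (the input "left-right" is made up for illustration):

```python
import re

# Raw template: the regex module sees backslash-2 and backslash-1 as group references.
swapped = re.sub(r"(\w+)-(\w+)", r"\2-\1", "left-right")
print(swapped)          # right-left

# Normal literal: "\2" and "\1" are octal escapes for chr(2) and chr(1),
# so re.sub inserts those control characters literally.
assert "\1" == "\x01"
garbled = re.sub(r"(\w+)-(\w+)", "\2-\1", "left-right")
print(repr(garbled))    # '\x02-\x01'
```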




Answer 3:


Just ask the snake:

>>> r'\$(.*?)\$'=='\$(.*?)\$'
True
>>> r'\vert'=='\vert'
False
>>> r'\123'=='\123'
False
>>> r'\#23'=='\#23'
True

Basically, if a backslash followed by some character would create an escaped character (as it would in C), using the r prefix is the same as writing a doubled backslash:

>>> r'\123'=='\\123'
True
>>> r'\tab'=='\\tab'
True
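The comparisons above can be automated with a small helper. This is a sketch with a hypothetical function name, `raw_equals_normal`; it assumes the body contains no quote characters and does not end with a backslash, and it silences the warnings Python emits for unrecognized escapes such as \$.

```python
import ast
import warnings


def raw_equals_normal(body: str) -> bool:
    """Return True if r'<body>' and '<body>' evaluate to the same string."""
    with warnings.catch_warnings():
        # Unrecognized escapes (e.g. \$, \#) warn on modern Pythons; ignore here.
        warnings.simplefilter("ignore")
        raw_value = ast.literal_eval("r'" + body + "'")
        normal_value = ast.literal_eval("'" + body + "'")
    return raw_value == normal_value


for body in [r"\$(.*?)\$", r"\vert", r"\123", r"\#23", r"\tab"]:
    print(body, raw_equals_normal(body))
```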


Source: https://stackoverflow.com/questions/15122698/regular-expression-quoting-in-python
