Retrieving python 3.6 handling of re.sub() with zero length matches in python 3.7

Submitted by 痞子三分冷 on 2019-12-01 17:54:09

Your solution may be the third-party regex module (available on PyPI):

Regex Egg Introduction

This regex implementation is backwards-compatible with the standard ‘re’ module, but offers additional functionality. The re module’s behaviour with zero-width matches changed in Python 3.7, and this module will follow that behaviour when compiled for Python 3.7.
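The change the quote refers to lives in the standard re module itself. Below is a minimal check, assuming Python 3.7 or later. The abxd input is the example used in the 3.7 documentation, and the comments give the 3.7+ result next to the pre-3.7 one:

```python
import re

# On 3.7+, an empty match that directly follows a non-empty match is also replaced.
print(re.sub('x*', '-', 'abxd'))   # 3.7+: -a-b--d-    (3.6 and earlier: -a-b-d-)
print(re.sub('.*?', '|', 'test'))  # 3.7+: |||||||||   (3.6 and earlier: |t|e|s|t|)
```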


Installation:

pip install regex

Usage:

With regex, you can choose which version (V0 or V1) a pattern is compiled with, e.g.:

# Python 3.7 and later
>>> import regex
>>> regex.sub('.*', 'x', 'test')
'xx'
>>> regex.sub('.*?', '|', 'test')
'|||||||||'

# Python 3.6 and earlier
>>> import regex
>>> regex.sub('(?V0).*', 'x', 'test')
'x'
>>> regex.sub('(?V1).*', 'x', 'test')
'xx'
>>> regex.sub('(?V0).*?', '|', 'test')
'|t|e|s|t|'
>>> regex.sub('(?V1).*?', '|', 'test')
'|||||||||'

Note:

The version can be selected with the VERSION0 or V0 flag (VERSION1 or V1 for version 1), or inline with (?V0) or (?V1) in the pattern.


Sources:

Regex thread - issue2636
regex 2018.11.22

According to the 3.7 What's New,

The previous behavior can be restored by changing the pattern to r'.+'.
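Here is a short check of that advice with re on Python 3.7+. Note that r'.+' is not an exact drop-in replacement. It removes the extra trailing empty match, but it also never matches an empty input string, where r'.*' still produces one replacement:

```python
import re

print(re.sub(r'.*', 'x', 'test'))  # xx  (3.7+ also replaces the empty match at the end)
print(re.sub(r'.+', 'x', 'test'))  # x   (what .* gave before 3.7)

# Caveat: the two patterns differ when the input is empty.
print(repr(re.sub(r'.*', 'x', '')))  # 'x'
print(repr(re.sub(r'.+', 'x', '')))  # ''
```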

See https://docs.python.org/3/whatsnew/3.7.html under "Changes in the Python API". The fix is therefore to rewrite such a regex; re does not appear to offer a flag that requests the old behavior.
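If rewriting every pattern is impractical, the old rule can be emulated around re.search(). This is a minimal sketch and is not part of re or any library. The helper name sub_pre37 is hypothetical. The sketch assumes the pre-3.7 rule was "skip an empty match that begins where the previous match ended". It accepts only template strings for repl, so callables, count and flags are not supported. The expected outputs match the Python 3.6 results quoted in this thread:

```python
import re

def sub_pre37(pattern, repl, string):
    """Hypothetical helper: re.sub() with the assumed pre-3.7 empty-match rule.

    Assumption: an empty match starting exactly where the previous match
    ended is skipped. repl must be a template string (no callables);
    count and flags are not supported.
    """
    regex = re.compile(pattern)
    pieces = []
    last_end = 0   # end of the most recent replaced match
    pos = 0        # index where the next search begins
    replaced = 0   # number of replacements made so far
    while pos <= len(string):
        m = regex.search(string, pos)
        if m is None:
            break
        start, end = m.span()
        if start == end == last_end and replaced > 0:
            # Empty match adjacent to the previous match: skip it.
            pos = start + 1
            continue
        pieces.append(string[last_end:start])  # unmatched text before the match
        pieces.append(m.expand(repl))          # expands \1, \g<name>, etc.
        last_end = end
        replaced += 1
        # Step past an empty match so the loop always advances.
        pos = end + 1 if start == end else end
    pieces.append(string[last_end:])
    return ''.join(pieces)


print(sub_pre37('.*', 'x', 'test'))     # x
print(sub_pre37('.*?', '|', 'test'))    # |t|e|s|t|
print(sub_pre37('a*', 'x', 'bbaacc'))   # xbxbxcxcx
```

Searching with search(string, pos) instead of slicing the string keeps ^ anchored at the real start of the input, as re's documentation describes for the pos argument.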

A pattern that satisfies the original examples in PCRE (and also in Python 3.7+ re) would be:

^a*|a+|(?<!a)$

https://regex101.com/r/zTpV1t/3

However, bbaacc would be substituted to xbbxccx, whereas a* under Python 3.6 and earlier produced xbxbxcxcx; this may still be good enough for some people.
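Here is a quick check of that pattern with the standard re module, assuming Python 3.7 or later. The second call shows what plain a* gives on 3.7+, for comparison with the 3.6 result xbxbxcxcx mentioned above:

```python
import re

PATTERN = r'^a*|a+|(?<!a)$'  # the pattern from this answer

print(re.sub(PATTERN, 'x', 'bbaacc'))  # xbbxccx
# Plain a* on 3.7+ also replaces the empty match right after "aa":
print(re.sub(r'a*', 'x', 'bbaacc'))    # xbxbxxcxcx
```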
