Regex in Python

眉间皱痕 提交于 2019-12-11 05:47:31

问题


Goal: Given a number (it may be very long and it is greater than 0), I'd like to get the five least meaningful digits dropping any 0 at the end of that number.

I tried to solve this with regex, Helped by RegexBuddy I came to this one:

[\d]+([\d]{0,4}+[1-9])0*

But python can't compile that.

>>> import re
>>> re.compile(r"[\d]+([\d]{0,4}+[1-9])0*")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/re.py", line 188, in compile
    return _compile(pattern, flags)
  File "/usr/lib/python2.5/re.py", line 241, in _compile
    raise error, v # invalid expression
sre_constants.error: multiple repeat

The problem is the "+" after "{0,4}", it seems it doesn't work in python (even in 2.6)

How can I write a working regex?

PS: I know you can start dividing by 10 and then using the remainder n%100000... but this is a problem about regex.


回答1:


That regular expression is very superfluous. Try this:

>>> import re
>>> re.compile(r"(\d{0,4}[1-9])0*$")

The above regular expression assumes that the number is valid (it will also match "abc0123450", for example.) If you really need the validation that there are no non-number characters, you may use this:

>>> import re
>>> re.compile(r"^\d*?(\d{0,4}[1-9])0*$")

Anyways, the \d does not need to be in a character class, and the quantifier {0,4} does not need to be forced to be greedy (as the additional + specifies, although apparently Python does not recognize that.)

Also, in the second regular expression, the \d is non-greedy, as I believe this will improve the performance and accuracy. I also made it "zero or more" as I assume that is what you want.

I also added anchors as this ensures that your regular expression won't match anything in the middle of a string. If this is what you desired though (maybe you're scanning a long text?), remove the anchors.




回答2:


\d{0,4}+ is a possessive quantifier supported by certain regular expression flavors such as .NET and Java. Python does not support possessive quantifiers.

In RegexBuddy, select Python in the toolbar at the top, and RegexBuddy will tell you that Python doesn't support possessive quantifiers. The + will be highlighted in red in the regular expression, and the Create tab will indicate the error.

If you select Python on the Use tab in RegexBuddy, RegexBuddy will generate a Python source code snippet with a regular expression without the possessive quantifier, and a comment indicating that the removal of the possessive quantifier may yield different results. Here's the Python code that RegexBuddy generates using the regex from the question:

# Your regular expression could not be converted to the flavor required by this language:
# Python does not support possessive quantifiers

# Because of this, the code snippet below will not work as you intended, if at all.

reobj = re.compile(r"[\d]+([\d]{0,4}[1-9])0*")

What you probably did is select a flavor such as Java in the main toolbar, and then click Copy Regex as Python String. That will give you a Java regular expression formatted as a Pythong string. The items in the Copy menu do not convert your regular expression. They merely format it as a string. This allows you to do things like format a JavaScript regular expression as a Python string so your server-side Python script can feed a regex into client-side JavaScript code.




回答3:


Small tip. I recommend you test with reTest instead of RegExBuddy. There are different regular expression engines for different programming languages. ReTest is valuable in that it allows you to quickly test regular expression strings within Python itself. That way you can insure that you tested your syntax with the Python's regular expression engine.




回答4:


The error seems to be that you have two quantifiers in a row, {0,4} and +. Unless + is meant to be a literal here (which I doubt, since you're talking about numbers), then I don't think you need it at all. Unless it means something different in this situation (possibly the greediness of the {} quantifier)? I would try

[\d]+([\d]{0,4}[1-9])0*

If you actually intended to have both quantifiers to be applied, then this might work

[\d]+(([\d]{0,4})+[1-9])0*

But given your specification of the problem, I doubt that's what you want.




回答5:


This is my solution.

re.search(r'[1-9]\d{0,3}[1-9](?=0*(?:\b|\s|[A-Za-z]))', '02324560001230045980a').group(1)

'4598'

  • [1-9] - the number must start with 1 - 9
  • \d{0,3} - 0 or 3 digits
  • [1-9] - the number must finish with 1 or 9
  • (?=0*(:?\b|\s\|[A-Za-z])) - the final part of string must be formed from 0 and or \b, \s, [A-Za-z]


来源:https://stackoverflow.com/questions/996536/regex-in-python

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!