Clean Python Regular Expressions

渐次进展 2020-12-10 11:51

Is there a cleaner way to write long regex patterns in Python? I saw this approach somewhere, but regex in Python doesn't allow lists.

patterns = [
    re.co         


        
3 Answers
  • 2020-12-10 12:35

    Though @Ayman's suggestion about re.VERBOSE is a better idea, if all you want is what you're showing, just do:

    import re

    patterns = re.compile(
        r'<!--([^->]|(-+[^->])|(-?>))*-{2,}>'
        r'\n+|\s{2}'
    )
    

    and Python's automatic concatenation of adjacent string literals (much like C's, btw) will do the rest;-).
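
To make the concatenation trick a bit more concrete, here is a minimal sketch on a different pattern. The `DATE` name and the date format are made up for illustration and are not from the question. Each fragment gets its own comment, and Python joins the literals into a single string before `re.compile()` ever sees it:

```python
import re

# Adjacent raw string literals are joined at compile time, so
# re.compile() receives one ordinary pattern string.
DATE = re.compile(
    r'(?P<year>\d{4})-'    # four-digit year and a dash
    r'(?P<month>\d{2})-'   # two-digit month and a dash
    r'(?P<day>\d{2})'      # two-digit day
)

match = DATE.fullmatch('2020-12-10')
print(DATE.pattern)        # the joined pattern, with no separators added
print(match.groupdict())
```

One thing to watch for: the fragments are glued together with nothing in between, so if they are meant to be alternatives you have to write the `|` yourself.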

  • 2020-12-10 12:38

    You can use comments in regexes, which make them much more readable. Taking an example from http://gnosis.cx/publish/programming/regular_expressions.html:

    /               # identify URLs within a text file
              [^="] # do not match URLs in IMG tags like:
                    # <img src="http://mysite.com/mypic.png">
    http|ftp|gopher # make sure we find a resource type
              :\/\/ # ...needs to be followed by colon-slash-slash
          [^ \n\r]+ # stuff other than space, newline, carriage return is in URL
        (?=[\s\.,]) # assert: followed by whitespace/period/comma 
    /
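
The example above uses `/.../` delimiters, so it will not paste into Python as-is. A rough Python rendering with `re.VERBOSE` might look like the sketch below. Two things here are my own assumptions rather than part of the quoted example: the `(?:...)` group around the alternation (without it, `|` would split the whole pattern into three alternatives) and the named group `url`. The sample text is made up as well.

```python
import re

URL = re.compile(r"""
    [^="]                   # previous character is not = or ", which skips
                            # URLs in attributes such as <img src="http://...">
    (?P<url>                # named group, added here for illustration
        (?:http|ftp|gopher) # resource type (grouped so the | stays local)
        ://                 # colon-slash-slash
        [^ \n\r]+           # anything but space, newline, carriage return
    )
    (?=[\s.,])              # assert: followed by whitespace, period or comma
""", re.VERBOSE)

text = 'Mirror ftp://example.org/pub is up, logo: <img src="http://mysite.com/mypic.png">'
urls = [m.group("url") for m in URL.finditer(text)]
print(urls)  # ['ftp://example.org/pub']
```

Like the original, this is only a rough heuristic: the greedy `[^ \n\r]+` will happily swallow trailing punctuation that sits right before a space.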
    
  • 2020-12-10 12:48

    You can use verbose mode to write more readable regular expressions. In this mode:

    • Whitespace within the pattern is ignored, except when in a character class or preceded by an unescaped backslash.
    • When a line contains a '#' that is neither in a character class nor preceded by an unescaped backslash, all characters from the leftmost such '#' through the end of the line are ignored.

    The following two statements are equivalent:

    import re

    a = re.compile(r"""\d +  # the integral part
                       \.    # the decimal point
                       \d *  # some fractional digits""", re.X)
    
    b = re.compile(r"\d+\.\d*")
    

    (Taken from the documentation of verbose mode)
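
The two exceptions in the rules above (character classes and backslash escapes) are easy to trip over, so here is a small sketch of how to keep a literal space or '#' in a verbose pattern. The pattern and the names `verbose` and `compact` are invented for illustration:

```python
import re

verbose = re.compile(r"""
    \d+     # one or more digits
    [ ]     # a literal space: whitespace inside a character class is kept
    \#      # a literal '#': escaped, so it does not start a comment
    \ ?     # an optional literal space, written as backslash-space
    \w+     # a word
""", re.X)

# The same pattern written without verbose mode.
compact = re.compile(r"\d+ # ?\w+")

for sample in ("42 #answer", "42 # answer", "42#answer"):
    print(sample, bool(verbose.fullmatch(sample)), bool(compact.fullmatch(sample)))
```

Both patterns should agree on every sample: the first two match, and the last one fails because the space after the digits is required.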
