Regular expression to return all characters between two special characters

刺人心 2020-12-02 17:03

How would I go about using a regex to return all characters between two brackets? Here is an example:

foobar['infoNeededHere']ffffd
needs to return infoNeededHere


        
3 Answers
  • 2020-12-02 17:42

    ^.*\['(.*)'\].*$ will match a line and capture what you want in a group.

    You have to escape the [ and ] with a backslash (\), since an unescaped [ starts a character class.

    The documentation at the rubular.com proof link will explain how the expression is formed.
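A minimal Python sketch of this pattern, assuming the input has the same shape as the example in the question. The helper name `between_quoted_brackets` is made up for illustration and is not part of the answer.

```python
import re

# Pattern from the answer above: [ and ] are escaped, and the group
# captures whatever sits between [' and '].
PATTERN = re.compile(r"^.*\['(.*)'\].*$")

def between_quoted_brackets(line):
    """Return the text between [' and '] in line, or None if absent."""
    match = PATTERN.match(line)
    return match.group(1) if match else None

print(between_quoted_brackets("foobar['infoNeededHere']ffffd"))  # infoNeededHere
```

Because the quotes are part of the pattern but outside the group, the captured text comes back without them.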

  • 2020-12-02 17:46

    If there's only one of these [.....] tokens per line, then you don't need to use regular expressions at all:

    In [7]: mystring = "Bacon, [eggs], and spam"
    
    In [8]: mystring[ mystring.find("[")+1 : mystring.find("]") ]
    Out[8]: 'eggs'
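One caveat with the slicing one-liner: `str.find` returns -1 when the character is missing, so a string without brackets silently yields a wrong slice rather than an error. A hedged sketch of a guarded version follows; the function name `between_brackets` is hypothetical.

```python
def between_brackets(text):
    """Return the text inside the first [...] pair, or None if there is none."""
    start = text.find("[")
    if start == -1:
        return None
    # Look for the closing bracket only after the opening one.
    end = text.find("]", start + 1)
    if end == -1:
        return None
    return text[start + 1:end]

print(between_brackets("Bacon, [eggs], and spam"))  # eggs
print(between_brackets("no brackets here"))         # None
```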
    

    If there's more than one of these per line, then you'll need to modify Jarrod's regex ^.*\['(.*)'\].*$ to match multiple times per line and to be non-greedy: drop the ^ and $ anchors and use the .*? quantifier instead of the .* quantifier.

    In [14]: import re
    
    In [15]: mystring = "[Bacon], [eggs], and [spam]."
    
    In [16]: re.findall(r"\[(.*?)\]",mystring)
    Out[16]: ['Bacon', 'eggs', 'spam']
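To see why the non-greedy quantifier matters here, this short comparison runs both forms on the same string. The variable names are only for illustration.

```python
import re

s = "[Bacon], [eggs], and [spam]."

# Greedy: .* runs to the last ] on the line, so there is a single long match.
greedy = re.findall(r"\[(.*)\]", s)

# Non-greedy: .*? stops at the first ] it can, giving one match per pair.
lazy = re.findall(r"\[(.*?)\]", s)

print(greedy)  # ['Bacon], [eggs], and [spam']
print(lazy)    # ['Bacon', 'eggs', 'spam']
```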
    
  • 2020-12-02 18:03

    If you're new to REG(ular) EX(pressions), you can learn about them in the Python docs. Or, if you want a gentler introduction, you can check out the HOWTO. They use Perl-style syntax.

    Regex

    The expression that you need is .*?\[(.*)\].*. The group that you want will be \1.
    - .*?: . matches any character except a newline. * is a metacharacter meaning "repeat this 0 or more times". ? makes the * non-greedy, i.e., . will match as few characters as possible before hitting a '['.
    - \[: \ escapes special metacharacters, which in this case is [. Without the escape, [ would start a character class instead of matching a literal bracket.
    - (.*): Parentheses 'group' whatever is inside them, and you can later retrieve the groups by their numeric IDs or names (if they're given one). This .* is greedy, so the group runs up to the last ] on the line.
    - \].*: A literal ], followed by whatever characters remain on the line.
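The same breakdown can be written into the pattern itself with `re.VERBOSE`, which ignores whitespace in the pattern and allows comments. This is only an illustrative restatement of the expression above, not a different technique.

```python
import re

pat = re.compile(r"""
    .*?     # as few characters as possible before the first bracket
    \[      # a literal opening bracket
    (.*)    # group 1: everything up to the last closing bracket (greedy)
    \]      # a literal closing bracket
    .*      # the rest of the line
""", re.VERBOSE)

m = pat.search("foobar['infoNeededHere']ffffd")
print(m.group(1))  # 'infoNeededHere'  (quotes included)
```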

    Implementation

    First, import the re module (it is part of the standard library, but it is not loaded automatically) wherever you want to use the expression.

    Then, use re.search(regex_pattern, string_to_be_tested) to search for the pattern in the string to be tested. This returns a match object (or None if nothing matches), which you can store in a temporary variable. You should then call its group() method and pass 1 as an argument (to see the 'Group 1' we captured using parentheses earlier). It should now look like:

    >>> import re
    >>> pat = r'.*?\[(.*)\].*'            # See Note at the bottom of the answer
    >>> s = "foobar['infoNeededHere']ffffd"
    >>> match = re.search(pat, s)
    >>> match.group(1)
    "'infoNeededHere'"
    
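Two things are worth guarding against in practice: `re.search` returns None when nothing matches, and the captured group still carries the single quotes. A small sketch under those assumptions follows; the function name `extract` is hypothetical, and stripping the quotes is a guess at what the question wants.

```python
import re

pat = r".*?\[(.*)\].*"

def extract(s):
    """Return the bracketed text without surrounding single quotes, or None."""
    match = re.search(pat, s)
    if match is None:
        return None
    # Assumption: the surrounding ' characters are not wanted in the result.
    return match.group(1).strip("'")

print(extract("foobar['infoNeededHere']ffffd"))  # infoNeededHere
print(extract("foobar"))                         # None
```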

    An Alternative

    You can also use findall() to find all the non-overlapping matches by modifying the regex to (?<=\[).+?(?=\]).
    - (?<=\[): (?<=) is called a look-behind assertion and checks for an expression preceding the actual match.
    - .+?: + is just like * except that it matches one or more repetitions. It is made non-greedy by ?.
    - (?=\]): (?=) is a look-ahead assertion and checks for an expression following the match without including it in the match.
    Your code should now look like:

    >>> import re
    >>> pat = r'(?<=\[).+?(?=\])'  #See Note at the bottom of the answer
    >>> s = "foobar['infoNeededHere']ffffd[andHere] [andOverHereToo[]"
    >>> re.findall(pat, s)
    ["'infoNeededHere'", 'andHere', 'andOverHereToo['] 
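For comparison, a capture group gives the same result on this input, because `re.findall` returns the group's contents when the pattern contains exactly one group. This is a sketch to show the relationship between the two forms, not a claim that they agree on every possible string.

```python
import re

s = "foobar['infoNeededHere']ffffd[andHere] [andOverHereToo[]"

# Look-around version from above: the brackets are checked but not matched.
with_lookarounds = re.findall(r"(?<=\[).+?(?=\])", s)

# Capture-group version: the brackets are matched, but only the group is returned.
with_group = re.findall(r"\[(.+?)\]", s)

print(with_lookarounds == with_group)  # True
```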
    

    Note: Always use raw Python strings for patterns by adding an 'r' before the string (e.g., r'blah blah blah'), so that backslashes reach the regex engine unchanged.
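A short illustration of why the raw prefix matters, using `\b` (a word boundary in a regex, but a backspace character in an ordinary Python string). The example strings are made up for this sketch.

```python
import re

plain = "\bcat\b"   # Python turns each \b into a backspace character
raw = r"\bcat\b"    # the backslashes survive, so re sees word boundaries

print(len(plain), len(raw))            # 5 7
print(re.findall(plain, "a cat sat"))  # []
print(re.findall(raw, "a cat sat"))    # ['cat']
```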

    Thanks for reading! I wrote this answer when there were no accepted ones yet, but by the time I finished it, 2 more came up and one got accepted. :(
