How to get an expression between balanced parentheses

耗尽温柔 提交于 2019-11-27 04:48:37

问题


Suppose I am given the following kind of string:

"(this is (haha) a string(()and it's sneaky)) ipsom (lorem) bla"

and I want to extract substrings contained within a topmost layer of parentheses. I.e. I want to obtain the strings:"this is (haha) a string(()and it's sneaky)" and "lorem".

Is there a nice pythonic method to do this? Regular expressions are not obviously up to this task, but maybe there is a way to get an xml parser to do the job? For my application I can assume the parentheses are well formed, i.e. not something like (()(().


回答1:


This is a standard use case for a stack: You read the string character-wise and whenever you encounter an opening parenthesis, you push the symbol to the stack; if you encounter a closing parenthesis, you pop the symbol from the stack.

Since you only have a single type of parentheses, you don’t actually need a stack; instead, it’s enough to just remember how many open parentheses there are.

In addition, in order to extract the texts, we also remember where a part starts when a parenthesis on the first level opens and collect the resulting string when we encounter the matching closing parenthesis.

This could look like this:

string = "(this is (haha) a string(()and it's sneaky)) ipsom (lorem) bla"

stack = 0
startIndex = None
results = []

for i, c in enumerate(string):
    if c == '(':
        if stack == 0:
            startIndex = i + 1 # string to extract starts one index later

        # push to stack
        stack += 1
    elif c == ')':
        # pop stack
        stack -= 1

        if stack == 0:
            results.append(string[startIndex:i])

print(results)
# ["this is (haha) a string(()and it's sneaky)", 'lorem']



回答2:


Are you sure regex isn't good enough?

>>> x=re.compile(r'\((?:(?:\(.*?\))|(?:[^\(\)]*?))\)')
>>> x.findall("(this is (haha) a string(()and it's sneaky)) ipsom (lorem) bla")
["(this is (haha) a string(()and it's sneaky)", '(lorem)']
>>> x.findall("((((this is (haha) a string((a(s)d)and ((it's sneaky))))))) ipsom (lorem) bla")
["((((this is (haha) a string((a(s)d)and ((it's sneaky))", '(lorem)']



回答3:


this isnt very "pythonic"...but

def find_strings_inside(what_open,what_close,s):
    stack = []
    msg = []
    for c in s:
        s1=""
        if c == what_open:
           stack.append(c)
           if len(stack) == 1:
               continue
        elif c == what_close and stack:
           stack.pop()
           if not stack:
              yield "".join(msg)
              msg[:] = []
        if stack:
            msg.append(c)

x= list(find_strings_inside("(",")","(this is (haha) a string(()and it's sneaky)) ipsom (lorem) bla"))

print x



回答4:


This more or less repeats what's already been said, but might be a bit easier to read:

def extract(string):
    flag = 0
    result, accum = [], []
    for c in string:
        if c == ')':
            flag -= 1
        if flag:
            accum.append(c)
        if c == '(':
            flag += 1
        if not flag and accum:
            result.append(''.join(accum))
            accum = []
    return result

>> print extract(test)
["this is (haha) a string(()and it's sneaky)", 'lorem']


来源:https://stackoverflow.com/questions/38211773/how-to-get-an-expression-between-balanced-parentheses

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!