Apparently this problem comes up fairly often. After reading "Regular expression to detect semi-colon terminated C++ for & while loops" and thinking about …
You don't make it clear exactly what the specification of your function is, but this behaviour seems wrong to me:
>>> ParseNestedParen('(a)(b)(c)', 0)
['a)(b)(c']
>>> nested_paren.ParseNestedParen('(a)(b)(c)', 1)
['b']
>>> nested_paren.ParseNestedParen('(a)(b)(c)', 2)
['']
Other comments on your code:
fail
Calling re.findall and then throwing away the result is wasteful.
The recursion also fails on long unbalanced input:
>>> ParseNestedParen(')' * 1000, 1)
RuntimeError: maximum recursion depth exceeded while calling a Python object
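As a rough illustration of the re.findall point, assuming the matches were only ever used for their count: str.count returns the number of occurrences directly, without building a list of match strings first. The sample string is made up for the example.

```python
import re

text = '((a)(b)'

# re.findall builds a list of every match, and len() then keeps only its size.
opens_via_findall = len(re.findall(r'\(', text))

# str.count returns the number of non-overlapping occurrences directly.
opens = text.count('(')
closes = text.count(')')

print(opens_via_findall, opens, closes)  # prints: 3 3 2
```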
As Thomi said in the question you linked to, "regular expressions really are the wrong tool for the job!"
The usual way to parse nested expressions is to use a stack, along these lines:
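def parenthetic_contents(string):
    """Generate parenthesized contents in string as pairs (level, contents)."""
    stack = []  # indices of the '(' characters not yet matched
    for i, c in enumerate(string):
        if c == '(':
            stack.append(i)
        elif c == ')' and stack:
            # Match this ')' with the most recent unmatched '('; a ')' with
            # nothing to match is skipped.
            start = stack.pop()
            # The pairs still open around this one give its nesting level.
            yield (len(stack), string[start + 1: i])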
>>> list(parenthetic_contents('(a(b(c)(d)e)(f)g)'))
[(2, 'c'), (2, 'd'), (1, 'b(c)(d)e'), (1, 'f'), (0, 'a(b(c)(d)e)(f)g')]
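If what you want is the "contents at a given level" interface of ParseNestedParen, one possible sketch is to filter the generator's output. contents_at_level is a hypothetical name, and the specification assumed here (return every group nested exactly level deep, in the order the groups close) is a guess, since the original function's intended behaviour isn't stated.

```python
from typing import Iterator, List, Tuple


def parenthetic_contents(string: str) -> Iterator[Tuple[int, str]]:
    """Generate parenthesized contents in string as pairs (level, contents)."""
    stack: List[int] = []  # indices of the '(' characters not yet matched
    for i, c in enumerate(string):
        if c == '(':
            stack.append(i)
        elif c == ')' and stack:
            start = stack.pop()
            # After popping, len(stack) is the nesting depth of this pair.
            yield (len(stack), string[start + 1:i])


def contents_at_level(string: str, level: int) -> List[str]:
    """Hypothetical helper: contents of every group nested exactly `level` deep."""
    return [text for depth, text in parenthetic_contents(string) if depth == level]


print(contents_at_level('(a)(b)(c)', 0))          # ['a', 'b', 'c']
print(contents_at_level('(a(b(c)(d)e)(f)g)', 1))  # ['b(c)(d)e', 'f']
```

Because the scan is a single loop with an explicit stack rather than recursion, an input such as ')' * 1000 simply yields nothing instead of hitting the recursion limit.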