How do I find the largest valid sequence of parentheses and brackets in a string?

Question

I have a script to write, and one of the main problems boils down to finding the longest valid (contiguous) sequence of balanced brackets within a string. So I have something like

"()(({}[](][{[()]}]{})))("

as an input and I would need to return

"[{[()]}]{}"

as an output.

I have tried using a stack-like structure, as you would if it were just parentheses, but haven't been able to figure out something that works. I'd prefer a solution in Python, but any guidance will help regardless of language. The efficiency should ideally be better than O(n^2), since I can already think of an O(n^2) solution: use the approach from "How to find validity of a string of parentheses, curly brackets and square brackets?" and try it on different substrings.
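For reference, the quadratic baseline mentioned above might look like the following sketch. The function name longest_valid_bruteforce is made up for illustration, and the code assumes that any non-bracket character simply ends a candidate substring.

```python
def longest_valid_bruteforce(s):
    """O(n^2) baseline: run a stack-based validity scan from every start index."""
    closers = {')': '(', '}': '{', ']': '['}
    openers = set(closers.values())
    best = ""
    for start in range(len(s)):
        stack = []
        for end in range(start, len(s)):
            ch = s[end]
            if ch in openers:
                stack.append(ch)
            elif ch in closers and stack and stack[-1] == closers[ch]:
                stack.pop()
            else:
                break  # mismatch, stray closer, or non-bracket character
            # An empty stack means s[start:end + 1] is balanced.
            if not stack and end + 1 - start > len(best):
                best = s[start:end + 1]
    return best


print(longest_valid_bruteforce("()(({}[](][{[()]}]{})))("))  # [{[()]}]{}
```

It is too slow for long inputs, but it is handy as a reference when testing a faster solution.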

Answer 1:

This can be solved using dynamic programming. Go through the string, recording the length of the longest valid match ending at each index. If you've got the longest match ending at index i, then it's easy to find the longest match ending at index i+1: skip backwards over the longest match ending at index i, and then see if the characters surrounding it are matching open/close brackets. If they are, also add the longest match immediately to the left of the opening bracket, if any.

Here's some Python code that computes this:

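def longest_valid(s):
    # match[i] = length of the longest valid run ending at index i.
    # The extra trailing slot makes match[-1] read as 0 when start == 0.
    match = [0] * (len(s) + 1)
    for i in range(1, len(s)):
        if s[i] not in ')}]':
            continue  # only a closing bracket can end a valid run
        opener = '({['[')}]'.index(s[i])]
        start = i - 1 - match[i - 1]  # skip back over the run ending at i - 1
        if start < 0 or s[start] != opener:
            continue
        match[i] = i - start + 1 + match[start - 1]  # add the run to the left
    best = max(match)
    end = match.index(best)
    return s[end + 1 - best:end + 1]

print(longest_valid("()(({}[](][{[()]}]{})))("))
print(longest_valid("()(({}[]([{[()]}]{})))("))
print(longest_valid("{}[()()()()()()()]"))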

It's O(n) in time and space.
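To see the recurrence at work, here is a small sketch that returns the whole table instead of the best substring. The helper name match_lengths is hypothetical, and it uses explicit bounds checks rather than the extra trailing slot.

```python
def match_lengths(s):
    """Return a list where entry i is the length of the longest valid run ending at i."""
    pairs = {')': '(', '}': '{', ']': '['}
    match = [0] * len(s)
    for i, ch in enumerate(s):
        if ch not in pairs:
            continue  # openers (and other characters) never end a valid run
        # Skip back over the valid run that ends just before i.
        start = i - 1 - (match[i - 1] if i > 0 else 0)
        if start < 0 or s[start] != pairs[ch]:
            continue
        # Bracket pair plus its contents, plus any valid run touching it on the left.
        left = match[start - 1] if start > 0 else 0
        match[i] = i - start + 1 + left
    return match


print(match_lengths("([])()"))  # [0, 0, 2, 4, 0, 6]
```

The final 6 comes from the pair at indexes 4 and 5 (length 2) plus the run of length 4 ending at index 3.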

Answer 2:

This answer uses the following input sequence as an example. The expected output is all of the string except the last (.

Input:  ()(({}[]([{[()]}]{})))(
Output: ()(({}[]([{[()]}]{})))

Step 1 is to find the seeds in the string. A seed is a matched set of symbols: (), [], or {}. I've given each seed a numerical value to assist the reader in visualizing the seeds.

()(({}[]([{[()]}]{})))(
11  2233    44   55
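A minimal sketch of step 1, assuming seeds are stored as (start, end) index pairs with inclusive ends. The name find_seeds is made up for this example.

```python
def find_seeds(s):
    """Return (start, end) for every adjacent matched pair: (), [] or {}."""
    pairs = {'(': ')', '[': ']', '{': '}'}
    return [(i, i + 1) for i in range(len(s) - 1)
            if pairs.get(s[i]) == s[i + 1]]


print(find_seeds("()(({}[]([{[()]}]{})))("))
# [(0, 1), (4, 5), (6, 7), (12, 13), (17, 18)]
```

The five pairs correspond to seeds 1 through 5 in the diagram above.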

Step 2 is to expand the seeds into sequences. A sequence is a nested set of symbols, e.g. [{[()]}]. Starting from a seed, work outwards, verifying that the enclosing symbols match. The search ends at a mismatch, or at the beginning or end of the string. In the example, only seed 4 is enclosed by matching symbols, so only seed 4 is expanded.

()(({}[]([{[()]}]{})))(
11  2233 4444444455
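Expansion could be sketched like this, again assuming inclusive (start, end) indexes. The function name expand is hypothetical.

```python
def expand(s, start, end):
    """Grow s[start..end] outwards while the enclosing symbols match."""
    pairs = {'(': ')', '[': ']', '{': '}'}
    while (start > 0 and end + 1 < len(s)
           and pairs.get(s[start - 1]) == s[end + 1]):
        start -= 1
        end += 1
    return start, end


s = "()(({}[]([{[()]}]{})))("
print(expand(s, 12, 13))  # (9, 16): seed 4 grows to [{[()]}]
print(expand(s, 4, 5))    # (4, 5): seed 2 is not enclosed by a matching pair
```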

Step 3 is to combine adjacent sequences. In general, two or more sequences can be adjacent in a row; in the example, there are two places where exactly two sequences are adjacent.

()(({}[]([{[()]}]{})))(
11  2222 4444444444
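Combining can be sketched as a single left-to-right sweep over the sequences, assuming they are sorted by start index and stored as inclusive (start, end) pairs. The name merge_adjacent is made up for this example.

```python
def merge_adjacent(seqs):
    """Combine runs of touching (start, end) ranges; seqs must be sorted by start."""
    merged = []
    for start, end in seqs:
        if merged and merged[-1][1] + 1 == start:
            # This range begins right after the previous one ends: extend it.
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


print(merge_adjacent([(0, 1), (4, 5), (6, 7), (9, 16), (17, 18)]))
# [(0, 1), (4, 7), (9, 18)]
```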

Repeat step 2, treating the combined sequences as seeds. In this example, sequence 4 is enclosed by matching parentheses, so sequence 4 is expanded.

()(({}[]([{[()]}]{})))(
11  2222444444444444

Repeat step 3, combine sequences

()(({}[]([{[()]}]{})))(
11  2222222222222222

Repeat step 2, expand

()(({}[]([{[()]}]{})))(
1122222222222222222222

And combine one more time

()(({}[]([{[()]}]{})))(
1111111111111111111111

The algorithm ends when there's nothing left to expand or combine. The longest remaining sequence is the answer.

Implementation notes:

I think that you can achieve O(n) by expanding/merging one sequence at a time. I would keep the list of sequences in a doubly-linked list (so removal is an O(1) operation). Each sequence would be represented by a start index, and an end index.

Expanding a sequence involves checking the symbols at array[start-1] and array[end+1], and then updating the start/end indexes as appropriate.

Merging involves checking the next and previous sequences in the linked list. If the sequences can be merged, then one sequence is updated to cover the full range, and the other is deleted.

Once a sequence is expanded/merged as much as possible, move to the next sequence in the list. As this new sequence is expanded/merged, it may eventually work its way back to the previous sequence. Hence, after initially creating a doubly-linked list of seeds, one pass through the linked list should be sufficient to expand/merge all of the sequences. Then a second pass through whatever remains of the linked list is needed to find the longest sequence.
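Putting the implementation notes together, one possible sketch in Python follows. The class name Seq and the function name longest_valid_by_seeds are invented for illustration, and this is only one reading of the notes above, not a tested reference implementation.

```python
class Seq:
    """A valid run s[start..end] (inclusive) in a doubly-linked list."""
    def __init__(self, start, end):
        self.start, self.end = start, end
        self.prev = self.next = None


def longest_valid_by_seeds(s):
    pairs = {'(': ')', '[': ']', '{': '}'}

    # Step 1: build the linked list of seeds.
    head = tail = None
    for i in range(len(s) - 1):
        if pairs.get(s[i]) == s[i + 1]:
            node = Seq(i, i + 1)
            if tail is None:
                head = node
            else:
                tail.next, node.prev = node, tail
            tail = node

    # One pass: expand/merge each sequence until it can no longer change.
    cur = head
    while cur is not None:
        changed = True
        while changed:
            changed = False
            # Step 2: expand while the enclosing symbols match.
            while (cur.start > 0 and cur.end + 1 < len(s)
                   and pairs.get(s[cur.start - 1]) == s[cur.end + 1]):
                cur.start -= 1
                cur.end += 1
                changed = True
            # Step 3: absorb the previous sequence if it touches this one.
            left = cur.prev
            if left is not None and left.end + 1 == cur.start:
                cur.start, cur.prev = left.start, left.prev
                if left.prev is not None:
                    left.prev.next = cur
                else:
                    head = cur
                changed = True
            # Step 3: absorb the next sequence if it touches this one.
            right = cur.next
            if right is not None and cur.end + 1 == right.start:
                cur.end, cur.next = right.end, right.next
                if right.next is not None:
                    right.next.prev = cur
                changed = True
        cur = cur.next

    # Second pass: pick the longest remaining sequence.
    best, node = "", head
    while node is not None:
        if node.end + 1 - node.start > len(best):
            best = s[node.start:node.end + 1]
        node = node.next
    return best


print(longest_valid_by_seeds("()(({}[]([{[()]}]{})))("))  # ()(({}[]([{[()]}]{})))
```

Each expansion step consumes two new characters and each merge deletes a node, so the total work should stay linear in the length of the string.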

Answer 3:

If you're talking about arbitrary depth, Frank's answer here may apply: Regular expression to detect semi-colon terminated C++ for & while loops

If we are talking about finite depth, regex could be your friend (you may want to check performance).

It seems that you're looking for:

a literal open square bracket
a bunch of chars that aren't a close square bracket
an open brace
all chars up to the last close brace
a close brace

So, in a language-agnostic form, something like:

\[[^]]*\{.*\}

This could be used with re.compile in Python, but it would work in any language with regex support. Since .* (any character) and [^]] (anything but a closing square bracket) are very permissive, you can substitute \w+ or \d+ for word/digit characters, or other regex shorthand, to refine the solution and speed things up.
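A quick way to try the pattern in Python is sketched below, using the sample input from the question. Note that the pattern describes one particular shape of string rather than checking that brackets balance, so the second call shows an unbalanced string that it also accepts.

```python
import re

# The pattern from above, compiled as a raw string.
pattern = re.compile(r"\[[^]]*\{.*\}")

m = pattern.search("()(({}[](][{[()]}]{})))(")
print(m.group(0))  # [{[()]}]{}

# No nesting check is performed, so unbalanced text can match as well.
print(pattern.search("[{(}").group(0))  # [{(}
```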

Answer 4:

This is an old question, but I thought I'd contribute an O(n) approach that does a single pass through the characters and tracks the matches using a stack. It rolls up lengths into the previous enclosing group when consecutive balanced groups are found.

from collections import deque
def balanced(s):
    groups = {"(":")", "[":"]", "{":"}"}
    result = ""
    starts = deque([["",0,0]])              # stack of [closingChar,position,width]
    for i,c in enumerate(s):
        if c in groups:
            starts.append([groups[c],i,1])  # stack opening groups
        elif c != starts[-1][0]:
            starts = deque([["",i+1,0]])    # unmatched open/close, clear stack
        else:
            _,p,w   = starts.pop()                     # close group
            if not starts: starts.append(["",p,0])     # maintain ungrouped baseline
            starts[-1][2] = w = starts[-1][2] + w + 1  # roll up group size
            if w-w%2>len(result):                      # track longest
                result = s[starts[-1][1]+w%2:][:w-w%2] # w%2 handles grouped/ungrouped
    return result

output:

balanced("()(({}[](][{[()]}]{})))(") # [{[()]}]{}

balanced("()(({}[]([{[()]}]{})))(")  # ()(({}[]([{[()]}]{})))

balanced("{}[()()()()()()()]")       # {}[()()()()()()()]

balanced("{[([](){}})]")             # [](){}

Source: https://stackoverflow.com/questions/38840902/how-do-i-find-largest-valid-sequence-of-parentheses-and-brackets-in-a-string

Tags

python

algorithm

stack

dynamic-programming

parentheses