Recurrence Approach : How can we generate all possibilities on braces?

问题

How can we generate all possibilities on braces ?

N value has given to us and we have to generate all possibilities.

Examples:

1) if N == 1, then only one possibility () .

2) if N==2, then possibilities are (()), ()()

3) if N==3, then possibilities are ((())), (())(),()()(), ()(()) ...

Note: left and right braces should match. I mean )( is INVALID for the N==1

Can we solve this problem by using recurrence approach ?

回答1:

From wikipedia -

A Dyck word is a string consisting of n X's and n Y's such that no initial segment of the string has more Y's than X's (see also Dyck language). For example, the following are the Dyck words of length 6:

XXXYYY     XYXXYY     XYXYXY     XXYYXY     XXYXYY.

Re-interpreting the symbol X as an open parenthesis and Y as a close parenthesis, Cn counts the number of expressions containing n pairs of parentheses which are correctly matched:

((()))     ()(())     ()()()     (())()     (()())

See also http://www.acta.sapientia.ro/acta-info/C1-1/info1-9.pdf

Abstract. A new algorithm to generate all Dyck words is presented, which is used in ranking and unranking Dyck words. We emphasize the importance of using Dyck words in encoding objects related to Catalan numbers. As a consequence of formulas used in the ranking algorithm we can obtain a recursive formula for the nth Catalan number.

回答2:

For a given N we always have to start with an open brace. Now consider where it's corresponding closing brace is. It can be in the middle like in ()() or at the end like (()) for N=2.

Now consider N=3:

It can be at the end: (()()) and ((())).

Or in the middle: ()(()) and ()()() where it's in position 2. Then it can also be in position 4: (())().

Now we can essentially combine the 2 cases by realising the case where the closing brace is at the end is the same as it being at the middle, but with all the possibilities for N=0 added to the end.

Now to solve it you can work out all the possibilities for n between the begin and end brace and similarly you can work out all the possibilities for m after the end brace. (Note m+n+1 = N) Then you can just combine all possible combinations, append them to your list of possibilities and move on to the next possible location for the end brace.

Just be warned an easy mistake to make with these types of problems, is to find all the possibilities for i and for N-i and just combine them, but this for N=3 would double count ()()() or at least print it twice.

Here is some Python 2.x code that solves the problem:

memoise = {}
memoise[0] = [""]
memoise[1] = ["()"]

def solve(N, doprint=True):
    if N in memoise:
        return memoise[N]

    memoise[N] = []

    for i in xrange(1,N+1):
        between = solve(i-1, False)
        after   = solve(N-i, False)
        for b in between:
           for a in after:
               memoise[N].append("("+b+")"+a)

    if doprint:
        for res in memoise[N]:
            print res

    return memoise[N]

回答3:

try to google for Catalan numbers

回答4:

Recursive solution:

import java.util.Scanner;

public class Parentheses
{

    static void ParCheck(int left,int right,String str)
    {
            if (left == 0 && right == 0)
            {
                    System.out.println(str);
            }

            if (left > 0)
            {
                    ParCheck(left-1, right+1 , str + "(");
            }
            if (right > 0)
            {
                    ParCheck(left, right-1, str + ")");
            }

    }
    public static void main(String[] args)
    {
            Scanner input=new Scanner(System.in);
            System.out.println("Enter the  number");
            int num=input.nextInt();

            String str="";
            ParCheck(num,0,str);
    }
}

回答5:

Here's some code that's essentially a compact version of JPvdMerwe's code, except that it returns the list of solutions rather than printing them. This code works on both Python 2 and Python 3.

from itertools import product

def solve(num, cache={0: ['']}):
    if num not in cache:
        cache[num] = ['(%s)%s' % t for i in range(1, num + 1)
            for t in product(solve(i - 1), solve(num - i))]
    return cache[num]

# Test

for num in range(1, 5):
    print(num)
    for s in solve(num):
        print(s)

output

1
()
2
()()
(())
3
()()()
()(())
(())()
(()())
((()))
4
()()()()
()()(())
()(())()
()(()())
()((()))
(())()()
(())(())
(()())()
((()))()
(()()())
(()(()))
((())())
((()()))
(((())))

Here are a couple more functions, derived from the pseudocode given in the article linked by Ed Guiness: Generating and ranking of Dyck words. That article uses 1-based indexing, but I've converted them to conform with Python's 0-based indexing.

These functions are slower than the solve function above, but they may still be useful. pos_dyck_words has the advantage that it's purely iterative. unrank is iterative, but it calls the recursive helper function f; OTOH, f uses caching, so it's not as slow as it could be, and it's only caching integers, which uses less RAM than the string caching of solve. The main benefit of unrank is that it can return an individual solution from its index number rather than having to produce all solutions of a given size.

This code is for Python 3 only. It's easy enough to convert it for Python 2 use, you just have to implement your own caching scheme instead of lru_cache. You do need caching, otherwise f is intolerably slow for all but the smallest Dyck word lengths.

from itertools import product
from functools import lru_cache

# Generate all Dyck words of length 2*num, recursively
# fastest, but not lexicographically ordered
def solve(num, cache = {0: ['']}):
    if num not in cache:
        cache[num] = ['0%s1%s' % t for i in range(1, num + 1)
            for t in product(solve(i - 1), solve(num - i))]
    return cache[num]

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

# A helper function for `unrank`
# f(i, j) gives the number of paths between (0,0) and (i, j) not crossing
# the diagonal x == y of the grid. Paths consist of horizontal and vertical
# segments only, no diagonals are permitted

@lru_cache(None)
def f(i, j):
    if j == 0:
        return 1
    if j == 1:
        return i
    #if i < j:
        #return 0
    if i == j:
        return f(i, i - 1)
    # 1 < j < i <= n
    return f(i - 1, j) + f(i, j - 1)

# Determine the position array of a Dyck word from its rank number, 
# The position array gives the indices of the 1s in the word; 
# the rank number is the word's index in the lexicographic sequence
# of all Dyck words of length 2n 
# Very slow
def unrank(nr, n):
    b = [-1]
    for i in range(n):
        b.append(1 + max(b[-1], 2 * i))
        ni = n - i - 1
        for j in range(n + i - b[-1], 0, -1):
            delta = f(ni, j)
            if nr < delta or b[-1] >= n + i:
                break
            nr -= delta
            b[-1] += 1
    return b[1:]

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

# Generate all Dyck word position arrays for words of length 2*n, iteratively
def pos_dyck_words(n):
    b = list(range(1, 2 * n, 2))
    while True:
        yield b
        for i in range(n - 2, -1, -1):
            if b[i] < n + i:
                b[i] += 1
                for j in range(i + 1, n - 1):
                    b[j] = 1 + max(b[j - 1], 2 * j)
                break
        else:
            break

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

# Convert a position array to a Dyck word
def pos_to_word(b, n, chars='01'):
    c0, c1 = chars
    word = [c0] * (2 * n)
    for i in b:
        word[i] = c1
    return ''.join(word)

# Some tests

num = 4
print('num: {}, Catalan number: {}'.format(num, f(num, num)))

words = list(solve(num))
words.sort(reverse=True)
print(len(words))

for i, u in enumerate(pos_dyck_words(num)):
    v = unrank(i, num)
    w = words[i]
    ok = u == v and pos_to_word(u, num) == w
    print('{:2} {} {} {} {}'.format(i, u, v, w, ok))
print()

num = 10
print('num: {}, Catalan number: {}'.format(num, f(num, num)))

for i, u in enumerate(pos_dyck_words(num)):
    v = unrank(i, num)
    assert u == v, (i, u, v)
print('ok')

output

num: 4, Catalan number: 14
14
 0 [1, 3, 5, 7] [1, 3, 5, 7] 01010101 True
 1 [1, 3, 6, 7] [1, 3, 6, 7] 01010011 True
 2 [1, 4, 5, 7] [1, 4, 5, 7] 01001101 True
 3 [1, 4, 6, 7] [1, 4, 6, 7] 01001011 True
 4 [1, 5, 6, 7] [1, 5, 6, 7] 01000111 True
 5 [2, 3, 5, 7] [2, 3, 5, 7] 00110101 True
 6 [2, 3, 6, 7] [2, 3, 6, 7] 00110011 True
 7 [2, 4, 5, 7] [2, 4, 5, 7] 00101101 True
 8 [2, 4, 6, 7] [2, 4, 6, 7] 00101011 True
 9 [2, 5, 6, 7] [2, 5, 6, 7] 00100111 True
10 [3, 4, 5, 7] [3, 4, 5, 7] 00011101 True
11 [3, 4, 6, 7] [3, 4, 6, 7] 00011011 True
12 [3, 5, 6, 7] [3, 5, 6, 7] 00010111 True
13 [4, 5, 6, 7] [4, 5, 6, 7] 00001111 True

num: 10, Catalan number: 16796
ok

回答6:

I came up with the following algorithm which is not recursive as requested by the OP but it is worth mentioning given its invincible efficiency.

As said in Ed Guiness' post, strings of N pairs of correctly matched parentheses is a representation of a Dyck word. In another useful representation, parentheses ( and ) are replaced with 1 and 0 respectively. Hence, ()()() becomes 101010. The latter can also be seen as the binary representation of (decimal) number 42. In summary, some integer numbers can represent strings of correctly matched pairs of parentheses. Using this representation the following is an efficient algorithm to generate Dyck works.

Let integer be any C/C++ (or, possibly, and member of the the C-family programming languages) unsigned integer type up to 64-bits long. Given a Dyck word the following code returns the next Dyck word of the same size, provided it exists.

integer next_dyck_word(integer w) {
    integer const a = w & -w;
    integer const b = w + a;
    integer c = w ^ b;
    c = (c / a >> 2) + 1;
    c = ((c * c - 1) & 0xaaaaaaaaaaaaaaaa) | b;
    return c;
}

For instance, if w == 42 (101010 in binary, i.e., ()()()) the function returns 44 (101100, ()(())). One can iterate until getting 56 (111000, ((()))) which is the maximum Dyck word for N == 3.

Above, I've mentioned invincible efficiency because, as far as generation of a single Dyck word is concerned, this algorithm is O(1), loopless and branchless. However, the implementation still has room for improvement. Indeed, the relatively expensive division c / a in the body of the function can be eliminated if we can use some assembly instructions that are not available in strict Standard compliant C/C++.

You might say. "Objection! I do not want to be constraint to N <= 64". Well, my answer to that is that if you want to generate all Dyck works, then in practice you are already bound to a much lower size than 64. Indeed, the number of Dyck works of size N grows factorially with N and for N == 64 the time to generate them all is likely to be greater than the age of the universe. (I confess I did not calculate this time but this is a quite common anecdotal feature of problems of this nature.)

I've written a detailed document on the algorithm.

来源：https://stackoverflow.com/questions/4313921/recurrence-approach-how-can-we-generate-all-possibilities-on-braces

标签

algorithm

recursion

data-structures

catalan