System.OutOfMemoryException when generating permutations

后端 未结 4 1975
野的像风
野的像风 2020-12-06 09:55

I\'m getting System.OutOfMemoryException when trying to generate 6 letter permutations. 5 letter permutations still work.

Here is the code I\'m using to

相关标签:
4条回答
  • You're running out of memory from storing all of those permutations at one point.

    Assuming a length of 5 chars, there are 7,893,600 different permutations.
    Assuming a length of 6 chars, there are 165,765,600 different permutations.

    Considering that each character in a string is worth 2 bytes of memory, you would need 1,989,187,200 bytes (Just around 2 Gigabytes) to store all permutations. That's not exactly desirable.

    So how do we fix this?

    I've never coded in c#, but here's a practical design decision: perform individual processing when the permutation is created itself. That way, you only need to store the permutations you need. Here's some pseudo-code:

    List<string> storedPermutations;
    string s = createPermutation();
    bool shouldAdd = shouldAddPermutation(s);
    if (bool) 
    {
        storedPermutations.add(s);
    }
    

    That's arguably not the best (nor is it probably pseudo-code), but the logic here is to decide whether to add the permutation to the list the moment it is created, instead of adding everything to a list, and then trying to process that entire list. If you still run out of memory, then there's still a lot of permutations.

    0 讨论(0)
  • 2020-12-06 10:22

    Here's a simple solution that is both computationally and memory efficient.

    • Instead of generating the entire list of permutations and then find matches, using an iterator lets us process potential permutation matches as they get generated.
    • With a little bit of backtracking, only permutations that have a chance to match your regex get generated.

    All you need is an extra regular expression which accepts partial candidates. It should accept strings that could become a match if characters get added. (It would have be nice to have something like hitEnd() in Java which does exactly this. This would eliminate the need for that regular expression. Unfortunately, I do not think there is an equivalent in .Net)

    In my example, I want to find permutations of the string "123456789" that match regex "32145.67". I use the (suboptimal) regular expression "^3$|^32$|^321" to discard permutations that do not start with 321. (Of course, here it would have been possible to generate the permutations for "456789" and prepend "123" to the results, but this is just to illustrate the concept.)

    The efficiency of this solution will rely mostly on how many invalid matches you can discard early in the generation of permutations.

    Short explanation of how the permutation generation works. Let's try to generate all the permutations of the string "abc". It can easily be seen that:

    permutations("abc") = {"a" + permutations("bc"),
                           "b" + permutations("ac"),
                           "c" + permutations("ab")}
    

    In other words, we take each character of the input string, append it to an accumulator and compute all the permutations for the input string with the character removed. Once we reach a leaf - a permutation of the input string -, the accumulator will have the size of the input string.

    This can be written succinctly in recursive pseudo-code as:

    permutation(input, acc)
      if input empty
         return acc
    
      foreach(character in input)
          left <- remove char from input
          permutation(left, acc+char)
    

    Now this is not the most efficient way to generate permutations. (see Heap's algorithm) But at least it allows us not to explore the entire tree structure and discard permutations just by looking at their prefix.

    Since "yield return" does not work so well in recursive functions, I have simply rewritten the solution in a iterative manner (Note: space complexity is worse than with the above recursive DFS).

    public IEnumerable<string> getPermutation(string input, string regexp)
    {
            Stack<string> left = new Stack<string>();
            Stack<string> acc = new Stack<string>();
    
            left.Push(input);
            acc.Push("");
    
            // generate all permutations that match regexp
            while (left.Count > 0)
            {
                string c = left.Pop();
                string r = acc.Pop();
    
                if(r.Length==input.Length)
                {
                    yield return r;
                }
                else
                {
                    for(int i=0;i<c.Length;i++)
                    {
                        string p = r + c[i];
                        if (Regex.IsMatch(p,regexp)) // continue if we have a potential match
                        {
                            left.Push(c.Substring(0, i) + c.Substring(i + 1));
                            acc.Push(p);
                        }
                    }
                }
    
            }            
    }
    
    
    
    foreach(var a in getPermutation("123456789", "^3$|^32$|^321"))
    {
        if(Regex.IsMatch(a, "32145.67"))
        {
             // found match
        }
    
    }
    
    0 讨论(0)
  • 2020-12-06 10:30

    The best thing to do here is to use the lazy initialization to avoid having all the permutations in memory at the same time.

    private static IEnumerable<string> getPermutations(int n,string source)
    {
        IEnumerable<string> q = source.Select(x => x.ToString());
        for (int i = 0; i < n - 1; i++)
        {
            q = q.SelectMany(x => source, (x, y) => x + y);
        }
    
        return q; 
    }
    
    private static List<string> filterListByRegex(IEnumerable<string> list, string regex)
    {
        List<string> newList = new List();
        foreach(var item in list)
        {
            Match match = Regex.Match(item, regex, RegexOptions.IgnoreCase);
            if (match.Success)
            {
                newList.Add(item);
            }
        }
    
        return newList;
    }
    

    This may not be the most efficient way to do it, but at least it should get you past the memory issues.

    0 讨论(0)
  • 2020-12-06 10:37

    First, I would like to mention that what we are discussing here is not real permutations, but so called n-tuples or permutations with repetition - Wikipedia.

    Second, regarding the System.OutOfMemoryException when generating permutations, I think we all agree that the result should not be stored in a list, but provided as enumerable which will allow applying filtering and consuming (eventually storing) only the ones in interest.

    In that regard the LINQ solution provided by @juharr is performing very well. So my goals are to minimize the intermediary memory allocations, including string concatenations and also to end up with a more general and faster solution.

    In order to do that, I need to take some hard design decision. The signature of the general function I'm talking about will look like this

    public static IEnumerable<T[]> RepeatingPermutations<T>(this T[] set, int N)
    

    and the question is what should be the array yielded. If we follow the recomendations, they should be a separate array instances. However, remember I want to minimize allocations, I've decided to break that rules and yield one and the same array instance, moving the responsibility of not modifying it and cloning it if necessary to the caller. For instance, this allows the caller to perform no cost filtering. Or implement the OP function on top on it like this

    public static IEnumerable<string> RepeatingPermutations(this string set, int N)
    {
        return set.ToCharArray().RepeatingPermutations(N).Select(p => new string(p));
    }
    

    A few words about the algorithm. Instead of looking at the problem recursively as some other answerers, I want to effectively implement the equivalent of something like this

    from e1 in set
    from e2 in set
    ...
    from eN in set
    select new [] { e1, e2, .., eN }
    

    Interestingly, I recently answered a combinations related question and realized that the algorithms are pretty much the same.

    With all that being said, here is the function:

    public static IEnumerable<T[]> RepeatingPermutations<T>(this T[] set, int N)
    {
        var result = new T[N];
        var indices = new int[N];
        for (int pos = 0, index = 0; ;)
        {
            for (; pos < N; pos++, index = 0)
            {
                indices[pos] = index;
                result[pos] = set[index];
            }
            yield return result;
            do
            {
                if (pos == 0) yield break;
                index = indices[--pos] + 1;
            }
            while (index >= set.Length);
        }
    }
    

    I've did some tests by simply calling the different functions with N=2,3,..6 and simply iterating and counting. Here are the results on my machine:

    A : N=2 Count=         676 Time=00:00:00.0000467 Memory=     29K
    B1: N=2 Count=         676 Time=00:00:00.0000263 Memory=     16K
    B2: N=2 Count=         676 Time=00:00:00.0000189 Memory=      8K
    
    A : N=3 Count=      17,576 Time=00:00:00.0010107 Memory=    657K
    B1: N=3 Count=      17,576 Time=00:00:00.0003673 Memory=    344K
    B2: N=3 Count=      17,576 Time=00:00:00.0001415 Memory=      8K
    
    A : N=4 Count=     456,976 Time=00:00:00.0184445 Memory=  2,472K
    B1: N=4 Count=     456,976 Time=00:00:00.0096189 Memory=  2,520K
    B2: N=4 Count=     456,976 Time=00:00:00.0033624 Memory=      8K
    
    A : N=5 Count=  11,881,376 Time=00:00:00.4281349 Memory=    397K
    B1: N=5 Count=  11,881,376 Time=00:00:00.2482835 Memory=  4,042K
    B2: N=5 Count=  11,881,376 Time=00:00:00.0887759 Memory=      8K
    
    A : N=6 Count= 308,915,776 Time=00:00:11.2697326 Memory=  1,688K
    B1: N=6 Count= 308,915,776 Time=00:00:06.5638404 Memory=  1,024K
    B2: N=6 Count= 308,915,776 Time=00:00:02.2674431 Memory=      8K
    

    where

    A - LINQ function from @juharr
    B1 - my function with string
    B2 - my function with char[]

    As we can see, memory wise both string functions are comparable. Performance wise the LINQ function is only ~2 times slower, which is pretty good result.

    As expected in such scenario, the non allocating function significantly outperforms them both.

    UPDATE: As requested in the comments, here is the sample usage of the above functions (note that they are extension methods and must be placed in a static class of your choice):

    var charSet = Enumerable.Range('A', 'Z' - 'A' + 1).Select(c => (char)c).ToArray();
    var charPermutations = charSet.RepeatingPermutations(3);
    var stringSet = new string(charset);
    var stringPermutations = stringSet.RepeatingPermutations(3);
    

    However, remember the design choice I've made, so if you expand the charPermutations inside the debugger, you'll see one and the same values (the last permutation). Consuming the whole result of the above call for char[] should be like this

    var charPermutationList = charSet.RepeatingPermutations(3)
        .Select(p => (char[])p.Clone()).ToList();
    

    Actually a good addition to the two methods presented would be the following extension method:

    public static IEnumerable<T[]> Clone<T>(this IEnumerable<T[]> source)
    {
        return source.Select(item => (T[])item.Clone());
    }
    

    so the consuming call would be simple

    var charPermutationList = charSet.RepeatingPermutations(3).Clone().ToList();
    
    0 讨论(0)
提交回复
热议问题