Question

I hope someone is able to help me with what is, at least to me, quite a tricky algorithm.

The Problem

I have a List (1 <= size <= 5, but size unknown until run-time) of Lists (1 <= size <= 2) that I need to combine. Here is an example of what I am looking at:-

ListOfLists = { {1}, {2,3}, {2,3}, {4}, {2,3} }

So, there are 2 stages to what I need to do:-

(1). I need to combine the inner lists in such a way that any combination has exactly ONE item from each list, that is, the possible combinations in the result set here would be:-

1,2,2,4,2
1,2,2,4,3
1,2,3,4,2
1,2,3,4,3
1,3,2,4,2
1,3,2,4,3
1,3,3,4,2
1,3,3,4,3

The Cartesian product takes care of this, so stage 1 is done. Now here comes the twist that I can't figure out, or at least can't figure out a LINQ way of doing (I am still a LINQ noob).

(2). I now need to filter out any duplicate results from this Cartesian Product. A duplicate in this case constitutes any line in the result set with the same quantity of each distinct list element as another line, that is,

1,2,2,4,3 is the "same" as 1,3,2,4,2

because each distinct item occurs the same number of times in both lines (1 occurs once in each, 2 occurs twice in each, and so on).

The final result set should therefore look like this...

1,2,2,4,2
1,2,2,4,3
--
1,2,3,4,3
--
--
--
1,3,3,4,3
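The "duplicate" rule above amounts to multiset equality: two lines match when they hold the same values with the same counts, regardless of position. A minimal sketch of that test, assuming a sorted copy of each line is used as a canonical key (the class and method names here are hypothetical, not from the original post):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MultisetDemo
{
    // Hypothetical helper: sorts the line so that position no longer matters,
    // e.g. {1,3,2,4,2} becomes the key "1,2,2,3,4".
    public static string CanonicalKey(IEnumerable<int> row)
    {
        return string.Join(",", row.OrderBy(i => i));
    }

    public static void Main()
    {
        var a = new[] { 1, 2, 2, 4, 3 };
        var b = new[] { 1, 3, 2, 4, 2 };
        var c = new[] { 1, 2, 3, 4, 3 };

        Console.WriteLine(CanonicalKey(a) == CanonicalKey(b)); // True: same counts
        Console.WriteLine(CanonicalKey(a) == CanonicalKey(c)); // False: one 2 versus two
    }
}
```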

Another example is the worst-case scenario (from a combination point of view) where the ListOfLists is {{2,3}, {2,3}, {2,3}, {2,3}, {2,3}}, i.e. a list containing inner lists of the maximum size - in this case there would obviously be 32 results in the Cartesian Product result-set, but the pruned result-set that I am trying to get at would just be:-

2,2,2,2,2
2,2,2,2,3 <-- all other results with four 2's and one 3 (in any order) are suppressed
2,2,2,3,3 <-- all other results with three 2's and two 3's are suppressed, etc
2,2,3,3,3
2,3,3,3,3
3,3,3,3,3
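In this worst case a line is fully described by how many 3's it contains (zero to five), which is why only 6 of the 32 lines survive. A quick check of that count, as a sketch only (the names are hypothetical, and the 32 lines are built from bit masks rather than from the LINQ Cartesian product):

```csharp
using System;
using System.Linq;

public static class WorstCaseDemo
{
    // Builds all 2^5 = 32 lines over {2,3} and counts how many differ
    // once order is ignored, using "number of 3's" as the distinguishing key.
    public static int CountPruned()
    {
        return Enumerable.Range(0, 32)
            .Select(mask => Enumerable.Range(0, 5)
                .Select(bit => ((mask >> bit) & 1) == 0 ? 2 : 3))
            .Select(row => row.Count(v => v == 3))
            .Distinct()
            .Count();
    }

    public static void Main()
    {
        Console.WriteLine(CountPruned()); // 6
    }
}
```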

To any mathematically-minded folks out there - I hope you can help. I have actually got a working solution to part 2, but it is a total hack and is computationally intensive, and I am looking for guidance on finding a more elegant and efficient LINQ solution to the pruning problem.

Thanks for reading.

pip

Some resources used so far (to get the Cartesian Product)

computing-a-cartesian-product-with-linq
c-permutation-of-an-array-of-arraylists
msdn

UPDATE - The Solution

Apologies for not posting this sooner...see below

Answer 1:

You should implement your own IEqualityComparer<IEnumerable<int>> and then use that in Distinct().

The choice of hash code in the IEqualityComparer depends on your actual data, but I think something like this should be adequate if your actual data resemble those in your examples:

class UnorderedSequenceComparer : IEqualityComparer<IEnumerable<int>>
{
    public bool Equals(IEnumerable<int> x, IEnumerable<int> y)
    {
        return x.OrderBy(i => i).SequenceEqual(y.OrderBy(i => i));
    }

    public int GetHashCode(IEnumerable<int> obj)
    {
        return obj.Sum(i => i * i);
    }
}

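The important part is that GetHashCode() should be O(N); sorting there would be too slow. It also has to be order-independent, so that any two sequences Equals() treats as equal produce the same hash code.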
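To illustrate the hash code choice, here is a small sketch (the class and method names are hypothetical) that reuses the two expressions from the comparer above. It shows that reordered sequences hash alike, and that an occasional collision between different multisets is harmless because the sorted comparison still tells them apart:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HashDemo
{
    // Same formula as GetHashCode above: addition commutes, so order is irrelevant.
    public static int SumOfSquares(IEnumerable<int> seq)
    {
        return seq.Sum(i => i * i);
    }

    // Same test as Equals above: sort both sides, then compare element by element.
    public static bool SameMultiset(IEnumerable<int> x, IEnumerable<int> y)
    {
        return x.OrderBy(i => i).SequenceEqual(y.OrderBy(i => i));
    }

    public static void Main()
    {
        // Reordered lines hash alike: 1 + 4 + 4 + 16 + 9 = 34 in both cases.
        Console.WriteLine(SumOfSquares(new[] { 1, 2, 2, 4, 3 })); // 34
        Console.WriteLine(SumOfSquares(new[] { 1, 3, 2, 4, 2 })); // 34

        // A collision between different multisets: 0 + 25 == 9 + 16.
        Console.WriteLine(SumOfSquares(new[] { 0, 5 }) == SumOfSquares(new[] { 3, 4 })); // True
        Console.WriteLine(SameMultiset(new[] { 0, 5 }, new[] { 3, 4 }));                 // False
    }
}
```

Note that Enumerable.Sum over int values throws an OverflowException if the total exceeds the int range, so this formula suits small values like those in the examples.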

Answer 2:

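// LINQPad "C# Program" style. In a regular project, wrap Main in a class and add
// using System; using System.Collections.Generic; using System.Linq;
void Main()
{
    // Query syntax builds the Cartesian product of the five inner lists.
    var query = from a in new int[] { 1 }
                from b in new int[] { 2, 3 }
                from c in new int[] { 2, 3 }
                from d in new int[] { 4 }
                from e in new int[] { 2, 3 }
                select new int[] { a, b, c, d, e };

    // Distinct returns a new sequence, so keep the result and enumerate it.
    var distinct = query.Distinct(new ArrayComparer());
    foreach (var row in distinct)
        Console.WriteLine(string.Join(",", row));
}

public class ArrayComparer : IEqualityComparer<int[]>
{
    // Two arrays are equal when they hold the same values with the same counts.
    public bool Equals(int[] x, int[] y)
    {
        if (ReferenceEquals(x, y))
            return true;
        if (x == null || y == null)
            return false;

        return x.OrderBy(i => i).SequenceEqual(y.OrderBy(i => i));
    }

    // XOR is order-independent, so arrays that are equal above always hash alike.
    public int GetHashCode(int[] obj)
    {
        if (obj == null || obj.Length == 0)
            return 0;
        var hashcode = obj[0];
        for (int i = 1; i < obj.Length; i++)
        {
            hashcode ^= obj[i];
        }
        return hashcode;
    }
}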

Answer 3:

The finalised solution to the whole problem (combining the multisets, then pruning the result set to remove duplicates) ended up in a helper class as a static method. It takes svick's much-appreciated answer and injects the IEqualityComparer dependency into the existing CartesianProduct answer I found on Eric Lippert's blog (I'd recommend reading his post, as it explains the iterations in his thinking and why the LINQ implementation is the best).

static IEnumerable<IEnumerable<T>> CartesianProduct<T>(IEnumerable<IEnumerable<T>> sequences,
                                                       IEqualityComparer<IEnumerable<T>> sequenceComparer)
{
    IEnumerable<IEnumerable<T>> emptyProduct = new[] { Enumerable.Empty<T>() };
    var resultsSet = sequences.Aggregate(emptyProduct, (accumulator, sequence) => from accseq in accumulator
                                                                                  from item in sequence
                                                                                  select accseq.Concat(new[] { item }));

    if (sequenceComparer != null)
        return resultsSet.Distinct(sequenceComparer);
    else
        return resultsSet;
}
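For completeness, a usage sketch that runs the method above on the example from the question. The comparer is modelled on the one in Answer 1; the names MultisetComparer, PruningDemo and PrunedRows are hypothetical and only serve this illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PruningDemo
{
    // Illustrative comparer: two lines are equal when they match as multisets.
    public class MultisetComparer : IEqualityComparer<IEnumerable<int>>
    {
        public bool Equals(IEnumerable<int> x, IEnumerable<int> y)
        {
            return x.OrderBy(i => i).SequenceEqual(y.OrderBy(i => i));
        }

        public int GetHashCode(IEnumerable<int> obj)
        {
            return obj.Sum(i => i * i);
        }
    }

    // Same logic as the CartesianProduct method above, repeated so the sketch compiles alone.
    public static IEnumerable<IEnumerable<T>> CartesianProduct<T>(
        IEnumerable<IEnumerable<T>> sequences,
        IEqualityComparer<IEnumerable<T>> sequenceComparer)
    {
        IEnumerable<IEnumerable<T>> emptyProduct = new[] { Enumerable.Empty<T>() };
        var resultsSet = sequences.Aggregate(emptyProduct, (accumulator, sequence) =>
            from accseq in accumulator
            from item in sequence
            select accseq.Concat(new[] { item }));

        return sequenceComparer != null ? resultsSet.Distinct(sequenceComparer) : resultsSet;
    }

    // Runs the question's example and returns each surviving line as text.
    public static List<string> PrunedRows()
    {
        IEnumerable<IEnumerable<int>> lists = new[]
        {
            new[] { 1 }, new[] { 2, 3 }, new[] { 2, 3 }, new[] { 4 }, new[] { 2, 3 }
        };

        return CartesianProduct<int>(lists, new MultisetComparer())
            .Select(row => string.Join(",", row))
            .ToList();
    }

    public static void Main()
    {
        var rows = PrunedRows();
        foreach (var row in rows)
            Console.WriteLine(row);
        Console.WriteLine(rows.Count); // 4
    }
}
```

Passing null as the comparer skips the pruning step and yields all 8 combinations.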

Source: https://stackoverflow.com/questions/6948019/linq-implementation-of-cartesian-product-with-pruning

Tags

linq