Python - Get all unique combinations with replacement from lists of list with unequal length

Submitted by 久未见 on 2021-02-07 08:52:57

Question


Note: This is not a duplicate question, despite what the title might suggest.

Given a list of lists, I need to get all combinations from it with replacement.

import itertools

l = [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
n = []
for i in itertools.product(*l):
    # sorted() makes permutations of the same values compare equal
    if sorted(i) not in n:
        n.append(sorted(i))
for i in n:
    print(i)

Output:

[1, 1, 1]
[1, 1, 2]
[1, 1, 3]
[1, 2, 2]
[1, 2, 3]
[1, 3, 3]
[2, 2, 2]
[2, 2, 3]
[2, 3, 3]
[3, 3, 3]

Thanks to @RoadRunner and @Idlehands.

The code above works, but it has 2 problems:

  1. For large lists, itertools.product throws a MemoryError, e.g. when l has 18 sublists of length 3, giving ~400 million combinations (3^18 = 387,420,489).

  2. Order matters, so sorting does not work for my problem. Since this may be confusing, here is an example:

    l = [[1,2,3], [1], [1,2,3]]

Here I have 2 unique groups:

Group 1: elements 0 and 2, which have the same value [1,2,3]

Group 2: element 1, which has the value [1]

Thus, the solutions I need are:

[1,1,1]
[1,1,2]
[1,1,3]
[2,1,2]
[2,1,3]
[3,1,3]

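Thus position 1 stays fixed at 1, and since positions 0 and 2 belong to the same group, a swapped result such as [3,1,2] counts as a duplicate of [2,1,3].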

Hope this example helps.
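One possible way to implement the grouping described above without enumerating the full product, sketched under some assumptions: identical sublists (same elements in the same order) form a group, sublist elements are hashable and contain no duplicates, and within a group the chosen values are assigned to positions in nondecreasing order. The function name grouped_combinations is hypothetical. Each group contributes a multiset via itertools.combinations_with_replacement, and itertools.product only runs over the groups:

```python
from itertools import combinations_with_replacement, product


def grouped_combinations(lists):
    """Yield one tuple per distinct result, treating positions whose
    sublists are identical as interchangeable (hypothetical helper)."""
    # Map each distinct sublist to the positions where it occurs.
    groups = {}
    for pos, sub in enumerate(lists):
        groups.setdefault(tuple(sub), []).append(pos)
    keys = list(groups)
    # Per group, pick a multiset of values: order inside a group is irrelevant.
    per_group = [combinations_with_replacement(k, len(groups[k])) for k in keys]
    for choice in product(*per_group):
        out = [None] * len(lists)
        # Write each group's values back into that group's positions.
        for k, values in zip(keys, choice):
            for pos, v in zip(groups[k], values):
                out[pos] = v
        yield tuple(out)


print(list(grouped_combinations([[1, 2, 3], [1], [1, 2, 3]])))
# [(1, 1, 1), (1, 1, 2), (1, 1, 3), (2, 1, 2), (2, 1, 3), (3, 1, 3)]
```

For 18 identical sublists of length 3 there is a single group, so only C(20, 2) = 190 tuples are generated instead of 3^18.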


Answer 1:


Edited Answer:

Based on the new information, to handle the huge number of combinations that overloads itertools.product(), we can try to pull results from the iterator in small batches:

from itertools import islice, product

l = [list(range(3))] * 18
prods = product(*l)  # lazy iterator: tuples are produced one at a time
uniques = set()      # sorted tuples seen so far (order-insensitive key)
results = []         # first tuple found for each key, in its original order
totals = 0           # tuples consumed so far, over all batches

def run_batch(n=1000000):
    """Consume up to n more tuples from prods and keep the new unique ones."""
    global totals
    for result in islice(prods, n):
        totals += 1
        unique = tuple(sorted(result))
        if unique not in uniques:
            uniques.add(unique)
            results.append(result)

run_batch()
print('Total iteration this batch: {0}'.format(totals))
print('Number of unique tuples: {0}'.format(len(uniques)))
print('Number of wanted combos: {0}'.format(len(results)))
print('First 10 results:')
for r in results[:10]:
    print(r)

Output:

Total iteration this batch: 1000000
Number of unique tuples: 103
Number of wanted combos: 103
First 10 results:
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 2)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2)

Here we control the batch size with the n argument of run_batch(), and can keep calling it for as long as you see fit. The uniques set holds the sorted tuples as a reference point, and results holds the tuples in the original order you wanted. Both should have the same size, and they were surprisingly small when I ran this with the 3^18 list. I'm not well acquainted with memory allocation, but this way the program doesn't keep the unwanted results in memory, so you should have more wiggle room. Otherwise, you can always export the results to a file to free up memory. This sample only prints the counts and the first 10 results, but you can easily display or save the full results for your own purposes.

I can't claim this is the best or most optimized approach, but it seems to work for me. Maybe it'll work for you? Running this batch 5 times took approximately 10 s (about 2 s per batch). Running through the entire set of prods, 3^18 = 387,420,489 tuples, took me 15 minutes:

Total iteration: 387420489
Number of unique tuples: 190
Number of wanted combos: 190
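To run the whole product rather than a single batch, the same idea can be packaged into one self-contained function that keeps pulling batches with itertools.islice until the iterator is exhausted. This is a sketch; the name unique_in_batches and its batch_size parameter are hypothetical, and each batch is materialized as a list only so the loop can detect the end:

```python
from itertools import islice, product


def unique_in_batches(lists, batch_size=1000000):
    """Walk product(*lists) in fixed-size batches, keeping the first tuple
    seen for each order-insensitive key (its sorted form).
    Returns (number of tuples consumed, list of kept tuples)."""
    prods = product(*lists)
    seen = set()
    results = []
    total = 0
    while True:
        batch = list(islice(prods, batch_size))
        if not batch:  # iterator exhausted
            break
        total += len(batch)
        for result in batch:
            key = tuple(sorted(result))
            if key not in seen:
                seen.add(key)
                results.append(result)
    return total, results


total, results = unique_in_batches([list(range(3))] * 5, batch_size=50)
print(total, len(results))  # 243 21
```

Batching bounds how many product tuples are held at once, not the running time: with [list(range(3))] * 18 it still visits every one of the 3^18 tuples.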

Original Answer:

@RoadRunner had a neat solution with sorted() and defaultdict, but I feel the latter is not needed. I leveraged his sorted() suggestion and implemented a modified version here.

From this answer:

import itertools

l = [[1], [1, 2, 3], [1, 2, 3]]
n = []
for i in itertools.product(*l):
    # sorted() makes permutations of the same values compare equal
    if sorted(i) not in n:
        n.append(sorted(i))
for i in n:
    print(i)

Output:

[1, 1, 1]
[1, 1, 2]
[1, 1, 3]
[1, 2, 2]
[1, 2, 3]
[1, 3, 3]



Answer 2:


How about grouping sequences that contain the same elements in a different order with a collections.defaultdict, then picking the first element stored under each key:

from itertools import product
from collections import defaultdict

l = [[1] ,[1,2,3],  [1,2,3]]

d = defaultdict(list)
for x in product(*l):
    d[tuple(sorted(x))].append(x)

print([x[0] for x in d.values()])

Which gives:

[(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 2), (1, 2, 3), (1, 3, 3)]

Alternatively, this can be done by keeping a set of what has already been added:

from itertools import product

l = [[1] ,[1,2,3],  [1,2,3]]

seen = set()
combs = []

for x in product(*l):
    curr = tuple(sorted(x))
    if curr not in seen:
        combs.append(x)
        seen.add(curr)

print(combs)
# [(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 2), (1, 2, 3), (1, 3, 3)]

If you don't want to sort, consider using a frozenset of the collections.Counter() items as the key:

from collections import Counter
from itertools import product

l = [[1] ,[1,2,3],  [1,2,3]]

seen = set()
combs = []

for x in product(*l):
    curr = frozenset(Counter(x).items())

    if curr not in seen:
        seen.add(curr)
        combs.append(x)

print(combs)
# [(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 2), (1, 2, 3), (1, 3, 3)]

Note: You can also use setdefault() for the first approach, if you don't want to use a defaultdict().
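Following that note, the first approach with dict.setdefault() in place of defaultdict might look like this (dicts keep insertion order in Python 3.7+, so the result order matches the defaultdict version):

```python
from itertools import product

l = [[1], [1, 2, 3], [1, 2, 3]]

d = {}
for x in product(*l):
    # setdefault inserts an empty list the first time a key is seen
    d.setdefault(tuple(sorted(x)), []).append(x)

combs = [group[0] for group in d.values()]
print(combs)
# [(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 2), (1, 2, 3), (1, 3, 3)]
```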




Answer 3:


For short input sequences, this can be done by filtering the output of itertools.product down to just the unique values. One unoptimized way is set(tuple(sorted(t)) for t in itertools.product(*l)), converting it to a list if you like.
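The one-liner above, spelled out. Note that it keeps the sorted form of each tuple, so position information is lost (for example, the question's (2, 1, 2) would come back as (1, 2, 2)); the final sorted() call is only there because a set has no defined order:

```python
import itertools

l = [[1], [1, 2, 3], [1, 2, 3]]

# Order-insensitive: each tuple is sorted before it goes into the set.
unique = set(tuple(sorted(t)) for t in itertools.product(*l))
combs = sorted(unique)  # sort only to get a stable, readable order
print(combs)
# [(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 2), (1, 2, 3), (1, 3, 3)]
```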

If the Cartesian product fans out so much that this is too inefficient, and if you can rely on your sublists being sorted as in your example, you could borrow an idea from the docs' discussion of permutations() and filter out the non-sorted values:

The code for permutations() can be also expressed as a subsequence of product(), filtered to exclude entries with repeated elements (those from the same position in the input pool)

So you'd want a quick test for whether a value is sorted or not, something like this answer: https://stackoverflow.com/a/3755410/2337736

And then list(t for t in itertools.product(*l) if is_sorted(t))
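Put together, with one possible is_sorted() (an assumption; the linked answer may implement the check differently). Only nondecreasing tuples survive the filter, which is why this relies on the sublists being sorted:

```python
import itertools


def is_sorted(t):
    """True if t is in nondecreasing order (one possible implementation)."""
    return all(a <= b for a, b in zip(t, t[1:]))


l = [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
combs = [t for t in itertools.product(*l) if is_sorted(t)]
for c in combs:
    print(c)
```

For this input it prints the same 10 combinations as the question's first example, as tuples.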

Beyond that, I think you'd have to get into recursion or a fixed length of l.
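As a sketch of the recursive route, under the same assumption that a nondecreasing tuple is what you want: the hypothetical generator nondecreasing() below never builds the full product, because it skips any value smaller than the one chosen for the previous position. It yields the same tuples, in the same order, as filtering itertools.product with is_sorted():

```python
def nondecreasing(lists, lower=None):
    """Yield tuples with one element per sublist, in nondecreasing order.

    Values smaller than `lower` (the previously chosen element) are pruned,
    so those branches are never expanded.
    """
    if not lists:
        yield ()
        return
    for v in lists[0]:
        if lower is not None and v < lower:
            continue  # would break the nondecreasing order: prune
        for tail in nondecreasing(lists[1:], v):
            yield (v,) + tail


print(sum(1 for _ in nondecreasing([[0, 1, 2]] * 18)))  # 190
```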



Source: https://stackoverflow.com/questions/48375270/python-get-all-unique-combinations-with-replacement-from-lists-of-list-with-un
