Pythonic and efficient way to find all the different intersections between two partitions of the same set

Submitted by 元气小坏坏 on 2021-01-29 15:20:33

Question


I need to find all the different intersections between two partitions of the same set. For example, if we have the following two partitions of the same set

x = [[1, 2], [3, 4, 5], [6, 7, 8, 9, 10]]
y = [[1, 3, 6, 7], [2, 4, 5, 8, 9, 10]]

the required result is

[[1], [2], [3], [4, 5], [6, 7], [8, 9, 10]].

In detail, we take the Cartesian product of the subsets of x and the subsets of y and, for each pair of subsets, collect the elements that belong to both into a new subset; pairs with no common element contribute nothing.
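To make the description above concrete, here is a minimal sketch that records which pair of subsets each intersection comes from. The function name `intersections_by_pair` is hypothetical, and the sketch assumes the elements are hashable; it is meant to illustrate the definition, not to be the fastest approach.

```python
from itertools import product

# Example partitions from the question.
x = [[1, 2], [3, 4, 5], [6, 7, 8, 9, 10]]
y = [[1, 3, 6, 7], [2, 4, 5, 8, 9, 10]]

def intersections_by_pair(x, y):
    """Map (index of subset in x, index of subset in y) to their sorted, non-empty intersection."""
    result = {}
    for (i, block_x), (j, block_y) in product(enumerate(x), enumerate(y)):
        common = sorted(set(block_x) & set(block_y))
        if common:  # skip pairs of subsets that share no element
            result[(i, j)] = common
    return result

pairs = intersections_by_pair(x, y)
print(pairs)
# {(0, 0): [1], (0, 1): [2], (1, 0): [3], (1, 1): [4, 5], (2, 0): [6, 7], (2, 1): [8, 9, 10]}
print(list(pairs.values()))
# [[1], [2], [3], [4, 5], [6, 7], [8, 9, 10]]
```

The values of the dictionary, taken in insertion order, are exactly the required result.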

What is the most efficient and Pythonic way to do this? Thanks in advance!


PERFORMANCE COMPARISON OF THE CURRENT ANSWERS:

import numpy as np

def partitioning(alist, indices):
    return [alist[i:j] for i, j in zip([0]+indices, indices+[None])]

total = 1000
sample1 = np.sort(np.random.choice(total, int(total/10), replace=False))
sample2 = np.sort(np.random.choice(total, int(total/2), replace=False))

a = partitioning(np.arange(total), list(sample1))
b = partitioning(np.arange(total), list(sample2))

def partition_decomposition_product_1(x, y):
    out = []
    for sublist1 in x:
        d = {}
        for val in sublist1:
            for i, sublist2 in enumerate(y):
                if val in sublist2:
                    d.setdefault(i, []).append(val)
        out.extend(d.values())
    return out

def partition_decomposition_product_2(x, y):
    all_s = []
    for sx in x:
        for sy in y:
            # keep the elements of sy that also occur in sx
            ss = list(filter(lambda v: v in sx, sy))
            if ss:
                all_s.append(ss)
    return all_s

def partition_decomposition_product_3(x, y):
    # Note: np.intersect1d returns sorted NumPy arrays, and empty intersections are kept
    return [np.intersect1d(i, j) for i in x for j in y]

Measuring the execution time with %timeit:

%timeit partition_decomposition_product_1(a, b)
%timeit partition_decomposition_product_2(a, b)
%timeit partition_decomposition_product_3(a, b)

we find

2.16 s ± 246 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
620 ms ± 84.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.03 s ± 111 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Thus the second solution is the fastest of the three.
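One caveat about this comparison: the three functions do not return quite the same thing. The first two return lists of non-empty lists, while the third returns NumPy arrays and keeps the empty intersections. Before comparing timings it may be worth checking that the results agree once normalized. The helper below is a sketch; the name `normalize` and the sample data are hypothetical stand-ins, not output from the benchmark.

```python
def normalize(blocks):
    """Turn a list of blocks (lists or NumPy arrays) into a sorted list of int tuples.

    Empty blocks are dropped, so results that differ only in container type,
    block order, or the presence of empty blocks compare equal.
    """
    return sorted(tuple(int(v) for v in block) for block in blocks if len(block) > 0)

# Hand-written stand-ins for the different return styles (hypothetical data).
as_lists = [[1], [2], [3], [4, 5]]
different_order = [[4, 5], [3], [2], [1]]
with_empty_blocks = [[1], [], [2], [3], [], [4, 5]]

assert normalize(as_lists) == normalize(different_order) == normalize(with_empty_blocks)
```

With the benchmark data, the same check would be written as normalize(partition_decomposition_product_1(a, b)) == normalize(partition_decomposition_product_3(a, b)).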


Answer 1:


The fact that the two lists are partitions of the same set is not relevant to the choice of algorithm. The task boils down to iterating over two lists of lists and computing the intersection of each combination. (You can add an assertion at the beginning of the function to ensure that both lists partition the same set, flattening the lists as described in a linked answer.) With this in mind, the following function accomplishes the task, computing each list intersection with the approach from another linked answer:

def func2(x, y):
    # check that they partition the same set 
    checkx = sorted([item for sublist in x for item in sublist])
    checky = sorted([item for sublist in y for item in sublist])
    assert checkx == checky

    # get all intersections
    all_s = []
    for sx in x:
        for sy in y:
            # keep the elements of sy that also occur in sx
            ss = list(filter(lambda v: v in sx, sy))
            if ss:
                all_s.append(ss)
    return all_s

Timing both with the linked comparison method shows that this new function is ~100x faster than your original implementation.
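A possible refinement, offered as a sketch rather than a measured result: the test `v in sx` scans the whole list each time, so converting each subset of x to a set once makes every membership test constant time on average. The name `func2_with_sets` is hypothetical, the sketch assumes the elements are hashable, and it has not been timed against the benchmark in the question.

```python
def func2_with_sets(x, y):
    """Same pairwise loop as func2, but each subset of x is converted to a set once."""
    x_sets = [set(sx) for sx in x]  # built once, reused for every subset of y
    all_s = []
    for sx_set in x_sets:
        for sy in y:
            ss = [v for v in sy if v in sx_set]  # preserves the order of sy
            if ss:
                all_s.append(ss)
    return all_s

x = [[1, 2], [3, 4, 5], [6, 7, 8, 9, 10]]
y = [[1, 3, 6, 7], [2, 4, 5, 8, 9, 10]]
print(func2_with_sets(x, y))
# [[1], [2], [3], [4, 5], [6, 7], [8, 9, 10]]
```

The result is the same as with the list-based version; only the cost of each membership test changes.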




Answer 2:


I'm not sure if I understand you correctly, but this script produces the result you have in your question:

x = [[1, 2], [3, 4, 5], [6, 7, 8, 9, 10]]
y = [[1, 3, 6, 7], [2, 4, 5, 8, 9, 10]]

out = []
for sublist1 in x:
    d = {}
    for val in sublist1:
        for i, sublist2 in enumerate(y):
            if val in sublist2:
                d.setdefault(i, []).append(val)
    out.extend(d.values())

print(out)

Prints:

[[1], [2], [3], [4, 5], [6, 7], [8, 9, 10]]
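The script above scans every sublist of y for each value. If x and y really are partitions of the same set, each value belongs to exactly one sublist of y, so that search can be replaced by a dictionary lookup built in a single pass. The following is a sketch under that assumption; the names `partition_intersections` and `block_of` are hypothetical, and a value of x that is missing from y would raise a KeyError.

```python
def partition_intersections(x, y):
    """Group the values of each sublist of x by the sublist of y that contains them."""
    # Map each value to the index of its sublist in y (assumes y has no repeated values).
    block_of = {val: j for j, sublist2 in enumerate(y) for val in sublist2}
    out = []
    for sublist1 in x:
        d = {}
        for val in sublist1:
            # One dictionary lookup instead of a scan over all of y.
            d.setdefault(block_of[val], []).append(val)
        out.extend(d.values())
    return out

x = [[1, 2], [3, 4, 5], [6, 7, 8, 9, 10]]
y = [[1, 3, 6, 7], [2, 4, 5, 8, 9, 10]]
print(partition_intersections(x, y))
# [[1], [2], [3], [4, 5], [6, 7], [8, 9, 10]]
```

Each value is visited once while building the lookup and once while grouping, so the running time grows linearly with the size of the set instead of with the number of pairs of sublists.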



Answer 3:


I may be missing some details, but this seems almost too easy:

import numpy as np

[np.intersect1d(a, b) for a in x for b in y]

Output:

[array([1]),
 array([2]),
 array([3]),
 array([4, 5]),
 array([6, 7]),
 array([ 8,  9, 10])]

Note that the above can include duplicates: for example, x=[[1,2,3],[1,4,5]] and y=[[1,6,7]] would give [[1],[1]].


If you want to find the unique intersections:

[list(i) for i in {tuple(np.intersect1d(a,b)) for a in x for b in y}]

Output (the order may vary, since sets are unordered):

[[8, 9, 10], [6, 7], [1], [4, 5], [2], [3]]
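If a reproducible order matters, the unique intersections can be sorted before being converted back to lists. The sketch below swaps np.intersect1d for built-in sets, so it runs without NumPy, and it also drops the empty intersection that appears when two subsets are disjoint. The name `unique_intersections` is hypothetical, and the elements are assumed to be hashable and mutually comparable.

```python
def unique_intersections(x, y):
    """Return the distinct non-empty intersections of subsets of x and y, in sorted order."""
    unique = {tuple(sorted(set(a) & set(b))) for a in x for b in y}
    unique.discard(())  # remove the empty intersection, if any pair was disjoint
    return [list(t) for t in sorted(unique)]

x = [[1, 2], [3, 4, 5], [6, 7, 8, 9, 10]]
y = [[1, 3, 6, 7], [2, 4, 5, 8, 9, 10]]
print(unique_intersections(x, y))
# [[1], [2], [3], [4, 5], [6, 7], [8, 9, 10]]

# The duplicate example from above collapses to a single entry.
print(unique_intersections([[1, 2, 3], [1, 4, 5]], [[1, 6, 7]]))
# [[1]]
```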


Source: https://stackoverflow.com/questions/61647198/pythonic-and-efficient-way-to-find-all-the-different-intersections-between-two-p
