Find the intersection between sublists

问题

Recently i encounter with a question about Find the intersection between sublists . that tells the sublist which have any (1 or more ) intersection together become one. for example the following list :

l=[[1,2,3],[0,13,6],[9,10],[3,4,5],[10,11],[6,7,50]]

must be converted to :

[[1, 2, 3, 4, 5],[0, 50, 6, 7, 13],[9, 10, 11]]

So i wrote the following function to do it that works well with a good performance, i use set for its fast complexity for checking membership and also in inner loop i use slicing that compare the first index of main list with other elements in every loop and also note that the list will been decrease after each loop ,as its a recursion inside the loop . :

s=[set(i) for i in g if i]

def find_intersection(m_list):
    for i,v in enumerate(m_list) : 
        for j,k in enumerate(m_list[i+1:],i+1):
           if v & k:
              s[i]=v.union(m_list.pop(j))
              return find_intersection(m_list)
    return m_list

s=[set(i) for i in l if i]
print find_intersection(s)
[set([1, 2, 3, 4, 5]), set([0, 50, 6, 7, 13]), set([9, 10, 11])]

But i think it could be done with another solution maybe with better performance , i thought about collections.deque or maybe with numpy or just modifying my function and make it better ? . if you have any suggestion i would be grateful to hear about !

回答1:

Here is a more efficient algorithm:

For each unique number that is present in at least one of the sublists, let's maintain a list of indices of all sublists that contain this number. This part is O(n * log n) time if we use sorting to find unique numbers or O(n) if we use a hash table, where n is the total number of elements in all sublists.
Let's create a graph where vertices are sublist indices and an edge is present if two indices appear together in at least one list of indices among all numbers. We need create at most O(n) edges(this part is slightly non-trivial: there is no need to create all edges explicitly, we can just add an edge from an element to the next one in each sublist for all unique elements due to transitivity). Here is some pseudo code:
```
g = empty graph
for elem in unique_elements:
    sublist_indices = list of indices of all sublists that contain this element
    for i = 1 ... size(sublist_indices - 1):
        g.add_edge(sublist_indices[i], sublist_indices[i + 1])
```
Now we can find connected components in this graph using depth-first search in linear time(this graph is undirected).
We know which sublists should be merged(they should be merged if and only if they are in the same connected component), so we can easily construct the answer.

The total time complexity is O(n). It is optimal because reading the input already requires O(n) operations.

回答2:

l=[[1,2,3],[0,13,6],[9,10],[3,4,5],[10,11],[6,7,50]]

temp = []

result = []

for i in range(len(l)):

    for j in range(i + 1, len(l)):
        if set(l[i]).intersection(l[j]):
            temp.append(l[i] + l[j])
            result.append(list(set(temp[i])))
print result

来源：https://stackoverflow.com/questions/27838603/find-the-intersection-between-sublists

标签

python

list

numpy

data-structures

intersection