Question
If I have a list like this one:
mylist = [[1,2,3], ['a', 'c'], [3,4,5],[1,2], [3,4,5], ['a', 'c'], [3,4,5], [1,2]]
What is the best way to remove duplicate sub-lists?
Now I use this:
y, s = [], set()
for t in mylist:
    w = tuple(sorted(t))
    if w not in s:
        y.append(t)
        s.add(w)
It works, but I wonder if there is a better way? Something more Pythonic?
Answer 1:
Convert each element to a tuple*, then convert the whole thing to a set, then convert everything back to a list:
m = [[1,2,3], ['a', 'c'], [3,4,5],[1,2], [3,4,5], ['a', 'c'], [3,4,5], [1,2]]
print [list(i) for i in set(map(tuple, m))]
*We convert to tuples because lists are not hashable (and therefore we cannot put them in a set).
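The snippet above uses Python 2 print syntax; a sketch of the same idea in Python 3 (note that a set does not preserve order, so the result order is arbitrary):

```python
# Deduplicate by converting sublists to hashable tuples, then back to lists.
m = [[1, 2, 3], ['a', 'c'], [3, 4, 5], [1, 2],
     [3, 4, 5], ['a', 'c'], [3, 4, 5], [1, 2]]
deduped = [list(t) for t in set(map(tuple, m))]
print(len(deduped))  # 4 unique sublists, in arbitrary order
```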
Answer 2:
You can use OrderedDict.fromkeys to filter duplicates out of the list while still preserving order:
>>> from collections import OrderedDict
>>> mylist = [[1,2,3], ['a', 'c'], [3,4,5],[1,2], [3,4,5], ['a', 'c'], [3,4,5], [1,2]]
>>> map(list, OrderedDict.fromkeys(map(tuple, mylist)))
[[1, 2, 3], ['a', 'c'], [3, 4, 5], [1, 2]]
The map(tuple, mylist) is necessary because dictionary keys must be hashable (lists are not, since you can add and remove items from them).
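Note that the session above is Python 2, where map returns a list. In Python 3, map returns an iterator, so the outer call needs to be wrapped in list(); a sketch:

```python
from collections import OrderedDict

mylist = [[1, 2, 3], ['a', 'c'], [3, 4, 5], [1, 2],
          [3, 4, 5], ['a', 'c'], [3, 4, 5], [1, 2]]
# OrderedDict.fromkeys keeps the first occurrence of each key, preserving order.
result = list(map(list, OrderedDict.fromkeys(map(tuple, mylist))))
print(result)  # [[1, 2, 3], ['a', 'c'], [3, 4, 5], [1, 2]]
```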
Answer 3:
Well, since sets inherently dedupe things, your first instinct might be to do set(mylist). However, that doesn't quite work:
In [1]: mylist = [[1,2,3], ['a', 'c'], [3,4,5],[1,2], [3,4,5], ['a', 'c'], [3,4,5], [1,2]]
In [2]: set(mylist)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-2-b352bcae5975> in <module>()
----> 1 set(mylist)
TypeError: unhashable type: 'list'
This is because sets only work on iterables of hashable elements (and since lists are mutable, they are not hashable).
Instead, you can do this simply for the price of converting your sublists to subtuples:
In [3]: set([tuple(x) for x in mylist])
Out[3]: {(1, 2), (1, 2, 3), (3, 4, 5), ('a', 'c')}
Or, if you really need a list of lists again:
In [4]: [list(x) for x in set([tuple(x) for x in mylist])]
Out[4]: [[1, 2], [3, 4, 5], ['a', 'c'], [1, 2, 3]]
Answer 4:
Because you have sorted(t) in your question, I assume you consider [1,2] to be a duplicate of [2,1]. If so, I'd use frozenset for the inner lists (frozensets are hashable) and not care about the ordering within the sublists.
So something like:
set(frozenset(sublist) for sublist in mylist)
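For example, this treats differently ordered sublists as equal (a sketch; be aware that frozenset also collapses repeated elements within a sublist, so [1, 1, 2] and [1, 2] would dedupe together as well):

```python
# Differently ordered sublists hash to the same frozenset, so they dedupe together.
mylist = [[1, 2], [2, 1], [3, 4, 5], [5, 4, 3]]
unique = set(frozenset(sub) for sub in mylist)
print(len(unique))  # 2: [1, 2]/[2, 1] and [3, 4, 5]/[5, 4, 3] each count once
```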
Answer 5:
You don't need to sort; the sort in the code you copied is there for a different reason:
seen, out = set(), []
for ele in mylist:
    tp = tuple(ele)
    if tp not in seen:
        out.append(ele)
        seen.add(tp)
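The same loop can be wrapped in a small reusable helper (a sketch; the name dedupe_sublists is made up here):

```python
def dedupe_sublists(lst):
    """Keep the first occurrence of each sublist, preserving order."""
    seen, out = set(), []
    for ele in lst:
        tp = tuple(ele)  # tuples are hashable, lists are not
        if tp not in seen:
            out.append(ele)
            seen.add(tp)
    return out

mylist = [[1, 2, 3], ['a', 'c'], [3, 4, 5], [1, 2],
          [3, 4, 5], ['a', 'c'], [3, 4, 5], [1, 2]]
print(dedupe_sublists(mylist))  # [[1, 2, 3], ['a', 'c'], [3, 4, 5], [1, 2]]
```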
Answer 6:
Well this will work for your case:
mylist2 = set(map(tuple, mylist))
print(mylist2) # {('a', 'c'), (3, 4, 5), (1, 2), (1, 2, 3)}
This works because it changes your sublists to tuples, which in your case are hashable, so set can take them and remove the duplicates.
And in case you really want the output to be a list of lists, you can do this:
print(list(map(list,mylist2))) # [['a', 'c'], [3, 4, 5], [1, 2], [1, 2, 3]]
Answer 7:
If order and structure (list of lists) don't matter, you can use
set(map(tuple, my_list))
if they do matter, you can use a list comprehension
[e for i,e in enumerate(my_list) if e not in my_list[:i]]
which keeps only the first occurrence of every element, thus keeping exactly one of each. It is marginally slower:
In [16]: timeit.timeit('[e for i,e in enumerate(my_list) if e not in my_list[:i]]', setup="my_list = [[1,2,3], ['a', 'c'], [3,4,5],[1,2], [3,4,5], ['a', 'c'], [3,4,5], [1,2]]")
Out[16]: 1.9146944019994407
In [17]: timeit.timeit('set(map(tuple, my_list))', setup="my_list = [[1,2,3], ['a', 'c'], [3,4,5],[1,2], [3,4,5], ['a', 'c'], [3,4,5], [1,2]]")
Out[17]: 1.3857673469974543
but if you care about speed you should probably try a loop-based approach.
Source: https://stackoverflow.com/questions/28755053/remove-duplicate-sublists-from-a-list