Question
If I have a list like this one:
mylist = [[1,2,3], ['a', 'c'], [3,4,5],[1,2], [3,4,5], ['a', 'c'], [3,4,5], [1,2]]
What is the best way to remove duplicate sub-lists?
Now I use this:
y, s = [], set()
for t in mylist:
    w = tuple(sorted(t))
    if w not in s:
        y.append(t)
        s.add(w)
It works, but I wonder if there is a better way? Something more Pythonic?
Answer 1:
Convert each element to a tuple*, then convert the whole thing to a set, then convert everything back to a list:
m = [[1,2,3], ['a', 'c'], [3,4,5],[1,2], [3,4,5], ['a', 'c'], [3,4,5], [1,2]]
print [list(i) for i in set(map(tuple, m))]
*We convert to tuples because lists are not hashable (and therefore we cannot put them in a set).
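The snippet above uses Python 2 print syntax; a sketch of the same idea in Python 3 (note that a set does not preserve order, so the result order is arbitrary):

```python
# Deduplicate by converting sublists to hashable tuples, then back to lists.
m = [[1, 2, 3], ['a', 'c'], [3, 4, 5], [1, 2],
     [3, 4, 5], ['a', 'c'], [3, 4, 5], [1, 2]]
deduped = [list(t) for t in set(map(tuple, m))]
print(len(deduped))  # 4 unique sublists, in arbitrary order
```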
Answer 2:
You can use OrderedDict.fromkeys to filter duplicates out of the list while still preserving order:
>>> from collections import OrderedDict
>>> mylist = [[1,2,3], ['a', 'c'], [3,4,5],[1,2], [3,4,5], ['a', 'c'], [3,4,5], [1,2]]
>>> map(list, OrderedDict.fromkeys(map(tuple, mylist)))
[[1, 2, 3], ['a', 'c'], [3, 4, 5], [1, 2]]
The map(tuple, mylist) is necessary because dictionary keys must be hashable (lists are not, since you can add and remove items from them).
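Note that the session above is Python 2, where map returns a list. In Python 3, map returns an iterator, so the outer call needs to be wrapped in list(); a sketch:

```python
from collections import OrderedDict

mylist = [[1, 2, 3], ['a', 'c'], [3, 4, 5], [1, 2],
          [3, 4, 5], ['a', 'c'], [3, 4, 5], [1, 2]]
# OrderedDict.fromkeys keeps the first occurrence of each key, preserving order.
result = list(map(list, OrderedDict.fromkeys(map(tuple, mylist))))
print(result)  # [[1, 2, 3], ['a', 'c'], [3, 4, 5], [1, 2]]
```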
Answer 3:
Well, since sets inherently dedupe things, your first instinct might be to do set(mylist). However, that doesn't quite work:
In [1]: mylist = [[1,2,3], ['a', 'c'], [3,4,5],[1,2], [3,4,5], ['a', 'c'], [3,4,5], [1,2]]
In [2]: set(mylist)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-2-b352bcae5975> in <module>()
----> 1 set(mylist)
TypeError: unhashable type: 'list'
This is because sets only work on iterables of hashable elements (and since lists are mutable, they are not hashable).
Instead, you can do this simply for the price of converting your sublists to subtuples:
In [3]: set([tuple(x) for x in mylist])
Out[3]: {(1, 2), (1, 2, 3), (3, 4, 5), ('a', 'c')}
Or, if you really need a list of lists again:
In [4]: [list(x) for x in set([tuple(x) for x in mylist])]
Out[4]: [[1, 2], [3, 4, 5], ['a', 'c'], [1, 2, 3]]
Answer 4:
Because you have sorted(t) in your question, I assume you consider [1,2] to be a duplicate of [2,1]. If so, I'd use frozenset for the inner lists (frozensets are hashable) and not care about the ordering within the sublists.
So something like:
set(frozenset(sublist) for sublist in mylist)
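For example, this treats differently ordered sublists as equal (a sketch; be aware that frozenset also collapses repeated elements within a sublist, so [1, 1, 2] and [1, 2] would dedupe together as well):

```python
# Differently ordered sublists hash to the same frozenset, so they dedupe together.
mylist = [[1, 2], [2, 1], [3, 4, 5], [5, 4, 3]]
unique = set(frozenset(sub) for sub in mylist)
print(len(unique))  # 2: [1, 2]/[2, 1] and [3, 4, 5]/[5, 4, 3] each count once
```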
Answer 5:
You don't need to sort; the sort in the code you copied is there for a different reason:
seen, out = set(), []
for ele in mylist:
    tp = tuple(ele)
    if tp not in seen:
        out.append(ele)
        seen.add(tp)
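The same loop can be wrapped in a small reusable helper (a sketch; the name dedupe_sublists is made up here):

```python
def dedupe_sublists(lst):
    """Keep the first occurrence of each sublist, preserving order."""
    seen, out = set(), []
    for ele in lst:
        tp = tuple(ele)  # tuples are hashable, lists are not
        if tp not in seen:
            out.append(ele)
            seen.add(tp)
    return out

mylist = [[1, 2, 3], ['a', 'c'], [3, 4, 5], [1, 2],
          [3, 4, 5], ['a', 'c'], [3, 4, 5], [1, 2]]
print(dedupe_sublists(mylist))  # [[1, 2, 3], ['a', 'c'], [3, 4, 5], [1, 2]]
```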
Answer 6:
Well this will work for your case:
mylist2 = set(map(tuple, mylist))
print(mylist2) # {('a', 'c'), (3, 4, 5), (1, 2), (1, 2, 3)}
This works because it changes your sublists to tuples, which in your case are hashable, so set can take them and remove the duplicates.
And in case you really want the output to be a list of lists, you can do this:
print(list(map(list,mylist2))) # [['a', 'c'], [3, 4, 5], [1, 2], [1, 2, 3]]
Answer 7:
If order and structure (list of lists) don't matter, you can use
set(map(tuple, my_list))
if they do matter, you can use a list comprehension
[e for i,e in enumerate(my_list) if e not in my_list[:i]]
which keeps only the first occurrence of every element, thus keeping exactly one of each. It is marginally slower:
In [16]: timeit.timeit('[e for i,e in enumerate(my_list) if e not in my_list[:i]]', setup="my_list = [[1,2,3], ['a', 'c'], [3,4,5],[1,2], [3,4,5], ['a', 'c'], [3,4,5], [1,2]]")
Out[16]: 1.9146944019994407
In [17]: timeit.timeit('set(map(tuple, my_list))', setup="my_list = [[1,2,3], ['a', 'c'], [3,4,5],[1,2], [3,4,5], ['a', 'c'], [3,4,5], [1,2]]")
Out[17]: 1.3857673469974543
but if you care about speed you should probably try a loop-based approach.
Source: https://stackoverflow.com/questions/28755053/remove-duplicate-sublists-from-a-list