I have a list of pairs:
[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]
and I want to remove any duplicates where
[a,b] == [b,a]
An easy and unnested solution:
pairs = [[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]]
s = set()
for p in pairs:
    # Lists are unhashable so make the "elements" into tuples
    p = tuple(p)
    # Keep the pair only if neither it nor its reverse was seen before
    if p not in s and p[::-1] not in s:
        s.add(p)
print(s)
Well, I am "checking for the reverse pair and append to a list if that's not the case" as you said you could do, but I'm using a single loop.
x = [[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]]
out = []
for pair in x:
    # Append the pair only if its reverse is not already in the output
    if pair[::-1] not in out:
        out.append(pair)
print(out)
The advantage over the existing answers is that this is, IMO, more readable. No deep knowledge of the standard library is needed here, and there is nothing complex to keep track of. The only concept that might be unfamiliar to beginners is that [::-1] reverses the pair.
The performance is O(n**2) though, because every "not in out" test scans the whole output list, so do not use this if performance is an issue and/or the lists are big.
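If the quadratic cost matters, one possible variant keeps the same single loop but does the membership test against a set of tuples, which is constant time on average. This is only a sketch and not part of the answer above: the function name dedupe_reversed and the seen set are made up for illustration. Unlike the loop above, it also drops exact repeats such as a second [0, 1].

```python
def dedupe_reversed(pairs):
    """Keep the first occurrence of each pair, treating [a, b] and [b, a] as equal."""
    seen = set()  # hypothetical helper: tuples of the pairs kept so far
    out = []
    for pair in pairs:
        key = tuple(pair)  # lists are unhashable, tuples are not
        # Skip the pair if it, or its reverse, was kept earlier.
        if key in seen or key[::-1] in seen:
            continue
        seen.add(key)
        out.append(list(pair))
    return out


x = [[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]]
print(dedupe_reversed(x))  # [[0, 1], [0, 4], [1, 4]]
```

The output list keeps the original order of the pairs, so the result matches the loop above for this input.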
You could sort each pair, convert your list of pairs to a set of tuples, and convert back again:
l = [[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]]
[list(tpl) for tpl in list(set([tuple(sorted(pair)) for pair in l]))]
#=> [[0, 1], [1, 4], [0, 4]]
The individual steps might be easier to understand than a long one-liner:
>>> l = [[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]]
>>> [sorted(pair) for pair in l]
# [[0, 1], [0, 4], [0, 1], [1, 4], [0, 4], [1, 4]]
>>> [tuple(pair) for pair in _]
# [(0, 1), (0, 4), (0, 1), (1, 4), (0, 4), (1, 4)]
>>> set(_)
# set([(0, 1), (1, 4), (0, 4)])
>>> list(_)
# [(0, 1), (1, 4), (0, 4)]
>>> [list(tpl) for tpl in _]
# [[0, 1], [1, 4], [0, 4]]
TL;DR:
set(map(frozenset, lst))
If the pairs are logically unordered, they're more naturally expressed as sets. It would be better to have them as sets before you even get to this point, but you can convert them like this:
lst = [[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]]
lst_as_sets = map(frozenset, lst)
And then the natural way of eliminating duplicates in an iterable is to convert it to a set:
deduped = set(lst_as_sets)
(This is the main reason I chose frozenset in the first step. Mutable sets are not hashable, so they can't be added to a set.)
Or you can do it in a single line like in the TL;DR section.
I think this is much simpler, more intuitive, and more closely matches how you think about the data than fussing with sorting and tuples.
If for some reason you really need a list of lists as the final result, converting back is trivial:
result_list = list(map(list, deduped))
But it's probably more logical to leave it all as sets as long as possible. I can only think of one reason that you might need the list form, and that's compatibility with existing code/libraries.
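For reference, the frozenset approach can be wrapped up as in the sketch below. The function name dedupe_unordered is made up for illustration. The last lines show one caveat that is worth knowing before converting back to lists: a pair with two equal items, such as [0, 0], collapses to a one-element frozenset.

```python
def dedupe_unordered(pairs):
    """Return the distinct pairs as a set of frozensets (item order is ignored)."""
    return set(map(frozenset, pairs))


lst = [[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]]
deduped = dedupe_unordered(lst)

print(len(deduped))                  # 3
print(frozenset([1, 0]) in deduped)  # True, because {1, 0} == {0, 1}

# Caveat: [0, 0] becomes frozenset({0}), so it would come back as [0].
collapsed = dedupe_unordered([[0, 0]])
print(collapsed == {frozenset([0])})  # True
```

If pairs with equal items can occur and must survive a round trip, the sorted-tuple approach from the other answers may be a safer fit.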
First sort each pair, then use dictionary keys to get the unique elements, and finally convert back to lists with a list comprehension.
Why tuples?
Replacing the lists with tuples is necessary to avoid the "unhashable" error when passing them through the fromkeys() function.
my_list = [[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]]
tuple_list = [ tuple(sorted(item)) for item in my_list ]
final_list = [ list(item) for item in list({}.fromkeys(tuple_list)) ]
Using OrderedDict even preserves the list order.
from collections import OrderedDict
my_list = [[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]]
tuple_list = [ tuple(sorted(item)) for item in my_list ]
final_list = [ list(item) for item in list(OrderedDict.fromkeys(tuple_list)) ]
The above code will result in the desired list:
[[0, 1], [0, 4], [1, 4]]
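As a side note, on Python 3.7 and later a plain dict also keeps insertion order, so the same idea works without the import. The sketch below assumes such an interpreter; on older versions the OrderedDict variant is the one to use.

```python
my_list = [[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]]

# Sort each pair so that [1, 0] and [0, 1] map to the same key (0, 1).
tuple_list = [tuple(sorted(item)) for item in my_list]

# Assumes Python 3.7+: dict keys keep insertion order and duplicates collapse.
final_list = [list(item) for item in dict.fromkeys(tuple_list)]
print(final_list)  # [[0, 1], [0, 4], [1, 4]]
```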
If the order of pairs and pair-items matters, creating a new list by testing for membership might be the way to go here.
pairs = [0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]
no_dups = []
for pair in pairs:
    if not any( all( i in p for i in pair ) for p in no_dups ):
        no_dups.append(pair)
Otherwise, I'd go with Styvane's answer.
Incidentally, the above solution will not work for pairs whose two items are equal. For example, [0,0] would not be added to the list if a pair already in no_dups contains 0. For that, you'd need to add an additional check:
for pair in pairs:
    if not any( all( i in p for i in pair ) for p in no_dups ) or ( len(set(pair)) == 1 and not pair in no_dups ):
        no_dups.append(pair)
However, that solution will not pick up empty "pairs" (e.g., []). For that, you'll need one more adjustment:
    if not any( all( i in p for i in pair ) for p in no_dups ) or ( len(set(pair)) in (0,1) and not pair in no_dups ):
        no_dups.append(pair)
The "and not pair in no_dups" bit is required to prevent adding [0,0] or [] to no_dups twice.
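An alternative that sidesteps these special cases is to give every pair an order-insensitive key and remember the keys already seen. The sketch below is an assumption-laden illustration rather than a fix to the code above: dedupe_keep_order is a made-up name, and the key is built with collections.Counter so that repeated items are counted. It keeps the first occurrence of each pair, with its items in their original order.

```python
from collections import Counter


def dedupe_keep_order(pairs):
    """Keep the first occurrence of each pair, ignoring item order within a pair."""
    seen = set()  # hypothetical helper: keys of the pairs kept so far
    no_dups = []
    for pair in pairs:
        # Order-insensitive key that still counts repeats, so [0, 0]
        # differs from [0, 1] and [] gets its own (empty) key.
        key = frozenset(Counter(pair).items())
        if key not in seen:
            seen.add(key)
            no_dups.append(pair)
    return no_dups


pairs = [0, 1], [0, 4], [1, 0], [0, 0], [], [4, 1], [0, 0], [], [1, 4]
print(dedupe_keep_order(pairs))
# [[0, 1], [0, 4], [0, 0], [], [4, 1]]
```

Because the keys live in a set, each lookup is constant time on average, so this also avoids rescanning no_dups for every pair.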