I have a list of pairs:

[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]

and I want to remove any duplicates where [a,b] == [b,a].
You could use the builtin filter function.
```python
from __future__ import print_function

def my_filter(l):
    seen = set()

    def not_seen(it):
        # Normalize the pair so [a, b] and [b, a] produce the same key.
        s = min(it), max(it)
        if s in seen:
            return False
        else:
            seen.add(s)
            return True

    # filter() returns an iterator in Python 3, so build the list explicitly.
    out = list(filter(not_seen, l))
    return out

myList = [[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]]
print(my_filter(myList))  # [[0, 1], [0, 4], [1, 4]]
```
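The whole trick in my_filter is that (min(p), max(p)) yields the same tuple for [a, b] and [b, a], and a tuple is hashable so it can live in a set. Here is a small sketch of that normalization on its own, assuming Python 3.7+ (for dict insertion order); the helper name normalize is hypothetical. Grouping by the key shows which pairs collapse together, and keeping the first of each group gives the same result as my_filter.

```python
def normalize(pair):
    """Hypothetical helper: order-independent, hashable key for a 2-element pair."""
    return (min(pair), max(pair))

pairs = [[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]]

# Both orientations of a pair map to the same tuple.
print(normalize([1, 0]) == normalize([0, 1]))  # True

# Group the original pairs by their normalized key; dicts keep insertion
# order in Python 3.7+, so groups appear in first-seen order.
groups = {}
for p in pairs:
    groups.setdefault(normalize(p), []).append(p)
print(groups)
# {(0, 1): [[0, 1], [1, 0]], (0, 4): [[0, 4], [4, 0]], (1, 4): [[1, 4], [4, 1]]}
```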
As a complement, I would point you to the recipes section of the Python itertools documentation, which includes a unique_everseen function. It does basically the same thing as above, but as a lazy, generator-based, memory-efficient version, so it might be better than any of the solutions here if you are working on large lists. Here is the recipe and how to use it:
```python
try:
    from itertools import filterfalse  # Python 3
except ImportError:
    from itertools import ifilterfalse as filterfalse  # Python 2

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element

gen = unique_everseen(myList, lambda x: (min(x), max(x)))  # gen is a generator
print(gen)  # <generator object unique_everseen at 0x...>
result = list(gen)  # consume the generator into a list
print(result)  # [[0, 1], [0, 4], [1, 4]]
```
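I haven't benchmarked this against the other answers. Both this version and the filter version above do one set lookup per element, so both are O(n). The advantage of unique_everseen is memory: it yields results one at a time, so it does not build an output list unless you ask for one (the seen set still grows with the number of distinct keys).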
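To make the laziness point concrete: because unique_everseen is a generator, you can take only the first few unique pairs from a very large, or even unbounded, input without materializing it. This is a sketch assuming Python 3; unique_everseen is condensed from the recipe above so the block runs on its own, and random_pairs is a hypothetical, seeded test-data generator.

```python
import itertools
import random

def unique_everseen(iterable, key=None):
    # Condensed from the itertools recipe: yield elements whose key is new.
    seen = set()
    for element in iterable:
        k = element if key is None else key(element)
        if k not in seen:
            seen.add(k)
            yield element

def random_pairs():
    # Hypothetical infinite stream of pairs with values in 0..10.
    rng = random.Random(0)
    while True:
        yield [rng.randint(0, 10), rng.randint(0, 10)]

# islice stops after 5 results, so only as much of the infinite stream
# is consumed as is needed to find 5 distinct unordered pairs.
first_five = list(itertools.islice(
    unique_everseen(random_pairs(), lambda x: (min(x), max(x))), 5))
print(first_five)
```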
The builtin sorted function could be used in the key passed to unique_everseen to order the items in each inner list. Instead, I pass lambda x: (min(x), max(x)), which works because I know each inner list has exactly 2 elements. To use sorted, I would need lambda x: tuple(sorted(x)) (the tuple conversion is required because lists are unhashable and cannot go into the seen set), which adds some overhead. Not dramatically, but still.
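A quick check of the two key functions, and of why sorted alone is not enough: sorted returns a list, which cannot be added to a set. The sketch also shows why the min/max shortcut is only safe for pairs. It assumes Python 3, and the names key_minmax and key_sorted are hypothetical.

```python
def key_minmax(x):
    # Only a faithful key for exactly 2 elements: (smaller, larger).
    return (min(x), max(x))

def key_sorted(x):
    # Works for any length; tuple() makes the result hashable.
    return tuple(sorted(x))

pair = [4, 1]
print(key_minmax(pair), key_sorted(pair))  # (1, 4) (1, 4)

# Using sorted directly as the key fails, because a list is unhashable.
seen = set()
try:
    seen.add(sorted(pair))
except TypeError as exc:
    print("TypeError:", exc)  # TypeError: unhashable type: 'list'

# For longer vectors min/max drops information: these differ but collide.
print(key_minmax([1, 2, 3]) == key_minmax([1, 3, 3]))  # True
print(key_sorted([1, 2, 3]) == key_sorted([1, 3, 3]))  # False
```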
A rough timing comparison in the interpreter, with unique_everseen defined as above:

```python
>>> import random, timeit
>>> myList = [[random.randint(0, 10), random.randint(0, 10)] for _ in range(10000)]
>>> timeit.timeit("list(unique_everseen(myList, lambda x: (min(x), max(x))))", globals=globals(), number=20000)
156.81979029000013
>>> timeit.timeit("list(unique_everseen(myList, lambda x: tuple(sorted(x))))", globals=globals(), number=20000)
168.8286430349999
```
Timings were done in Python 3; the globals keyword argument of timeit.timeit was added in Python 3.5.
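If you are on a Python version without the globals argument, one alternative is to pass a zero-argument callable instead of a string to timeit.timeit, so the timed code simply closes over your own names. This is a sketch under that assumption: unique_everseen is condensed from the recipe, run_minmax and run_sorted are hypothetical wrappers, and the repetition count is kept small, so no timings are quoted here.

```python
import random
import timeit

def unique_everseen(iterable, key):
    # Condensed itertools recipe (key branch only).
    seen = set()
    for element in iterable:
        k = key(element)
        if k not in seen:
            seen.add(k)
            yield element

myList = [[random.randint(0, 10), random.randint(0, 10)] for _ in range(10000)]

def run_minmax():
    return list(unique_everseen(myList, lambda x: (min(x), max(x))))

def run_sorted():
    return list(unique_everseen(myList, lambda x: tuple(sorted(x))))

# timeit.timeit accepts a callable as stmt, so no globals= is needed.
t_minmax = timeit.timeit(run_minmax, number=20)
t_sorted = timeit.timeit(run_sorted, number=20)
print("min/max key: %.3f s" % t_minmax)
print("sorted key:  %.3f s" % t_sorted)
```

Both wrappers return the same list, since the two keys agree on every 2-element pair; only the cost of computing the key differs.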