Pythonic way of removing reversed duplicates in list

难免孤独 2020-12-06 09:49

I have a list of pairs:

[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]

and I want to remove any duplicates where

[a,b] == [         


        
9 Answers
  •  無奈伤痛
    2020-12-06 10:30

    You could use the built-in filter function with a predicate that remembers which pairs it has already seen.

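    from __future__ import print_function
    
    def my_filter(l):
        seen = set()  # order-insensitive keys of the pairs kept so far
    
        def not_seen(it):
            # (min, max) is the same for [a, b] and [b, a]
            s = min(*it), max(*it)
            if s in seen:
                return False
            else:
                seen.add(s)
                return True
    
        # filter() is lazy in Python 3, so build a list before returning
        out = list(filter(not_seen, l))
    
        return out
    
    myList = [[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]]
    print(my_filter(myList)) # [[0, 1], [0, 4], [1, 4]]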
    

    As a complement, I would point you to the recipes section of the Python itertools documentation, which gives a unique_everseen function. It does basically the same thing as above, but as a lazy generator, which makes it more memory-efficient. It may be a better fit than any of our solutions if you are working with large lists. Here is how to use it:

    from itertools import filterfalse  # named ifilterfalse in Python 2
    
    def unique_everseen(iterable, key=None):
        "List unique elements, preserving order. Remember all elements ever seen."
        # unique_everseen('AAAABBBCCDAABBB') --> A B C D
        # unique_everseen('ABBCcAD', str.lower) --> A B C D
        seen = set()
        seen_add = seen.add
        if key is None:
            for element in filterfalse(seen.__contains__, iterable):
                seen_add(element)
                yield element
        else:
            for element in iterable:
                k = key(element)
                if k not in seen:
                    seen_add(k)
                    yield element
    
    gen = unique_everseen(myList, lambda x: (min(x), max(x))) # gen is an iterator
    print(gen) # <generator object unique_everseen at 0x...>
    result = list(gen) # consume generator into a list.
    print(result) # [[0, 1], [0, 4], [1, 4]]
    

    I haven't benchmarked the two versions against each other. Both make a single pass over the input with constant-time set lookups, but this version is lazy, so it only builds the full result list if you ask for one.
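
    To make the laziness concrete, here is a minimal sketch. It uses a simplified unique_everseen and a hypothetical endless pairs() generator (not part of the original answer) standing in for a large input. Only as many input items are consumed as are needed to produce the requested results.

```python
from itertools import islice

def unique_everseen(iterable, key=None):
    """Yield elements whose key has not been seen before, preserving order."""
    seen = set()
    for element in iterable:
        k = element if key is None else key(element)
        if k not in seen:
            seen.add(k)
            yield element

def pairs():
    """Hypothetical endless source of pairs, standing in for a large input."""
    n = 0
    while True:
        yield [n % 5, (n * 3) % 5]
        n += 1

# islice stops pulling from the generator after three unique pairs,
# so the endless source is never exhausted.
unique = unique_everseen(pairs(), key=lambda p: (min(p), max(p)))
first_three = list(islice(unique, 3))
print(first_three)  # [[0, 0], [1, 3], [2, 1]]
```

    An eager version that builds the whole list first would never return on such an input.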

    Timing min/max vs sorted

    The built-in sorted function could be used in the key passed to unique_everseen to normalize the order of items in the inner lists. Instead, I pass lambda x: (min(x), max(x)), which works because I know each inner list has exactly 2 elements.

    To use sorted, I would need to pass lambda x: tuple(sorted(x)), since the key must be hashable and sorted returns a list. That adds some overhead: not dramatic, but measurable.
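
    The two keys are interchangeable only for pairs. The sketch below uses a small hypothetical dedupe helper and a made-up triples list (neither is from the original answer) to show where they diverge once the inner lists are longer than 2.

```python
def dedupe(items, key):
    """Keep the first item for each distinct key, preserving order."""
    seen = set()
    out = []
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            out.append(item)
    return out

def by_min_max(x):
    # Only distinguishes the smallest and largest element.
    return (min(x), max(x))

def by_sorted(x):
    # Distinguishes every element, regardless of length.
    return tuple(sorted(x))

pairs = [[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]]
triples = [[1, 2, 3], [3, 2, 1], [1, 3, 3]]  # illustrative data

print(dedupe(pairs, by_min_max))    # [[0, 1], [0, 4], [1, 4]]
print(dedupe(pairs, by_sorted))     # [[0, 1], [0, 4], [1, 4]]
print(dedupe(triples, by_min_max))  # [[1, 2, 3]]  ([1, 3, 3] is dropped by mistake)
print(dedupe(triples, by_sorted))   # [[1, 2, 3], [1, 3, 3]]
```

    So the min/max key is a shortcut that is safe for this question, while tuple(sorted(x)) is the one to reach for if the inner lists can vary in length.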

    import random
    import timeit
    
    myList = [[random.randint(0, 10), random.randint(0,10)] for _ in range(10000)]
    timeit.timeit("list(unique_everseen(myList, lambda x: (min(x), max(x))))", globals=globals(), number=20000)
    # 156.81979029000013
    timeit.timeit("list(unique_everseen(myList, lambda x: tuple(sorted(x))))", globals=globals(), number=20000)
    # 168.8286430349999
    

    Timings were done in Python 3; the globals keyword argument to timeit.timeit was added in Python 3.5.
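
    If the globals argument is not available, one alternative is to pass a callable to timeit.timeit instead of a string. The sketch below assumes a smaller input, a fixed random seed, and a hypothetical dedupe helper, none of which come from the original measurement, so its numbers are not comparable to the ones above.

```python
import random
import timeit

random.seed(0)  # assumption: fixed seed so repeated runs use the same data
data = [[random.randint(0, 10), random.randint(0, 10)] for _ in range(1000)]

def dedupe(items, key):
    """Keep the first item for each distinct key, preserving order."""
    seen = set()
    out = []
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            out.append(item)
    return out

# A zero-argument callable needs no globals=... lookup.
t_min_max = timeit.timeit(lambda: dedupe(data, lambda x: (min(x), max(x))), number=100)
t_sorted = timeit.timeit(lambda: dedupe(data, lambda x: tuple(sorted(x))), number=100)
print("min/max: %.4f s, sorted: %.4f s" % (t_min_max, t_sorted))
```

    The extra function call made through the lambda wrapper is included in both measurements, so the comparison between the two keys stays fair.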
