Efficiently remove duplicates, order-agnostic, from list of lists

感情败类 2020-12-21 00:16

The following list has some duplicated sublists, with elements in different order:

l1 = [
    ['The', 'quick', 'brown', 'fox'],
    ['hi', 'there'],
    ['jumps', 'over', 'the', 'lazy', 'dog'],
    ['there', 'hi'],
    ['jumps', 'dog', 'over', 'lazy', 'the'],
]

4 answers
  •  谎友^ (original poster)
    2020-12-21 00:41

    I did a quick benchmark, comparing the various answers:

    l1 = [['The', 'quick', 'brown', 'fox'], ['hi', 'there'], ['jumps', 'over', 'the', 'lazy', 'dog'], ['there', 'hi'], ['jumps', 'dog', 'over','lazy', 'the']]
    
    from collections import Counter
    
    def method1():
        """manually construct set, keyed on sorted tuple"""
        seen = set()
        result = []
        for x in l1:
            key = tuple(sorted(x))
            if key not in seen:
                result.append(x)
                seen.add(key)
        return result
    
    def method2():
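        """frozenset-of-Counter"""
        # Unlike method3, the result is not flipped back with [::-1],
        # so the surviving sublists come out in reverse order relative to method3.
        return list({frozenset(Counter(lst).items()): lst for lst in reversed(l1)}.values())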
    
    def method3():
        """wim's answer: dict keyed on sorted tuple"""
        return [*{tuple(sorted(k)): k for k in reversed(l1)}.values()][::-1]
    
    from timeit import timeit
    
    print(timeit(lambda: method1(), number=1000))
    print(timeit(lambda: method2(), number=1000))
    print(timeit(lambda: method3(), number=1000))
    

    Prints (total seconds for 1000 calls of each method):

    0.0025010189856402576
    0.016385524009820074
    0.0026451340527273715
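
The timings above are for a five-sublist input, where fixed per-call overhead probably dominates. Below is a rough sketch of how one might repeat the comparison on a larger input, with both variants keeping the first occurrence of each sublist in original order. The function names, `make_data`, and its parameters are made up for illustration and are not taken from any of the answers; no timings are claimed here.

```python
import random
from collections import Counter
from timeit import timeit


def dedupe_sorted_tuple(lists):
    """Keep the first occurrence of each sublist, keyed on its sorted tuple."""
    seen = set()
    result = []
    for sub in lists:
        key = tuple(sorted(sub))
        if key not in seen:
            seen.add(key)
            result.append(sub)
    return result


def dedupe_counter(lists):
    """Same idea, keyed on a frozenset of (element, count) pairs.

    No sorting is involved, so elements only need to be hashable,
    not mutually orderable.
    """
    seen = set()
    result = []
    for sub in lists:
        key = frozenset(Counter(sub).items())
        if key not in seen:
            seen.add(key)
            result.append(sub)
    return result


def make_data(n_lists=10_000, n_words=3, vocab_size=10, seed=0):
    """Hypothetical test data: short random word lists with many repeats."""
    rng = random.Random(seed)
    vocab = [f"w{i}" for i in range(vocab_size)]
    return [rng.sample(vocab, n_words) for _ in range(n_lists)]


if __name__ == "__main__":
    data = make_data()
    # Both keys identify the same multisets, so the results should match.
    assert dedupe_sorted_tuple(data) == dedupe_counter(data)
    for fn in (dedupe_sorted_tuple, dedupe_counter):
        print(fn.__name__, timeit(lambda: fn(data), number=10))
```

Passing the list as an argument, rather than reading a global `l1`, makes it easy to try different sizes; the relative ranking may shift as sublists get longer, since sorting and `Counter` construction scale differently.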
    
