Most pythonic way to remove tuples from a list if first element is a duplicate

旧时模样 提交于 2019-12-11 04:28:18

问题


The code I have so far is pretty ugly:

orig = [(1,2),(1,3),(2,3),(3,3)]
previous_elem = []
unique_tuples = []
for tuple in orig:
    if tuple[0] not in previous_elem:
        unique_tuples += [tuple]
    previous_elem += [tuple[0]]
assert unique_tuples == [(1,2),(2,3),(3,3)]

There must be a more pythonic solution.


回答1:


If you don't care which tuple round you return for duplicates, you could always convert your list to a dictionary and back:

>>> orig = [(1,2),(1,3),(2,3),(3,3)]
>>> list(dict(orig).items())
[(1, 3), (2, 3), (3, 3)]

If you want to return the first tuple round, you could reverse your list twice and use an OrderedDict, like this:

>>> from collections import OrderedDict
>>> orig = [(1,2),(1,3),(2,3),(3,3)]
>>> new = list(OrderedDict(orig[::-1]).items())[::-1]
[(1, 2), (2, 3), (3, 3)]

These are not the most efficient solutions (if that's of great importance) , but they do make for nice idiomatic one-liners.


Some benchmarking

Note the difference in speed, and that should you not care which tuple round you return, the first option is much more efficient:

>>> import timeit
>>> setup = '''
orig = [(1,2),(1,3),(2,3),(3,3)]
'''
>>> print (min(timeit.Timer('(list(dict(orig).items()))', setup=setup).repeat(7, 1000)))
0.0015771419037069459

compared to

>>>setup = '''
orig = [(1,2),(1,3),(2,3),(3,3)]
from collections import OrderedDict
'''
>>> print (min(timeit.Timer('(list(OrderedDict(orig[::-1]).items())[::-1])', 
             setup=setup).repeat(7, 1000)))
0.024554947372323

The first option is nearly 15 times faster according to these speed tests.

That being said however, Saksham's answer is also O(n) and smashes these dictionary methods efficiency wise:

>>> setup = '''
orig = [(1,2),(1,3),(2,3),(3,3)]
newlist = []
seen = set()
def fun():
    for (a, b) in orig:
        if not a in seen:
            newlist.append((a, b))
            seen.add(a)
    return newlist
'''
>>> print (min(timeit.Timer('fun()', setup=setup).repeat(7, 1000)))
0.0004833390384996095



回答2:


If you want the first containing a specific key to always be the one that appears in the final list:

list(reversed(collections.OrderedDict( reversed([(1,2),(1,3),(2,3),(3,3)])).items()))

Which results in:

 [(1, 2), (2, 3), (3, 3)]



回答3:


If you do not wish to store in an additional data structure, the time complexity O(n^2), as pointed out in the comments:

orig = [(1,2),(1,3),(2,3),(3,3)]
newlist = []
for (a, b) in orig:
    if not any(x == a for x, y in newlist):
        newlist.append((a, b))
print newlist    # prints [(1, 2), (2, 3), (3, 3)]

A little book-keeping can reduce it to linear time:

orig = [(1,2),(1,3),(2,3),(3,3)]
newlist = []
seen = set()
for (a, b) in orig:
    if not a in seen:
        newlist.append((a, b))
        seen.add(a)
print newlist    # prints [(1, 2), (2, 3), (3, 3)]


来源:https://stackoverflow.com/questions/29563953/most-pythonic-way-to-remove-tuples-from-a-list-if-first-element-is-a-duplicate

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!