Python List: Is this the best way to remove duplicates while preserving order? [duplicate]

∥☆過路亽.° submitted on 2019-12-08 01:38:49

Question


Possible Duplicates:
How do you remove duplicates from a list in Python whilst preserving order?
Algorithm - How to delete duplicate elements in a list efficiently?

I've read about many methods for removing duplicates from a Python list while preserving order. All of them seem to require defining a separate function, which I suspect adds overhead. I came up with the following one-liner and would like to know whether it is the most computationally efficient way to do this. (My use case needs to be as fast as possible because response time matters.) Thanks.

b = [x for i, x in enumerate(a) if i == a.index(x)]

Answer 1:


a.index(x) is itself O(n), because the list has to be searched for the value x. Since it is called once for every element, the overall runtime is O(n^2).
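To make the quadratic cost concrete, here is a minimal sketch using a hypothetical helper, count_index_comparisons, that mimics the front-to-back scan a.index(x) performs and tallies the comparisons. For n distinct elements, the element at position i needs i + 1 comparisons, so the total is n(n + 1)/2.

```python
def count_index_comparisons(a):
    """Tally the comparisons a front-to-back scan (like list.index) makes
    when it is called once for every element of `a` (hypothetical helper)."""
    total = 0
    for x in a:
        # Scan from the start until the first element equal to x is found,
        # which mirrors what a.index(x) does.
        for y in a:
            total += 1
            if y == x:
                break
    return total


# 1000 distinct elements: 1 + 2 + ... + 1000 = 1000 * 1001 / 2 comparisons
print(count_index_comparisons(list(range(1000))))  # 500500
# Doubling n roughly quadruples the work
print(count_index_comparisons(list(range(2000))))  # 2001000
```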

"Saving" function calls does not make a bad algorithm faster than a good one.

A more efficient, O(n) approach (assuming the elements are hashable) would be:

result = []
seen = set()  # values already added to result
for i in a:
    # set membership is O(1) on average, unlike scanning the list
    if i not in seen:
        result.append(i)
        seen.add(i)
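The same loop can be wrapped in a reusable function. This is a minimal sketch assuming every element is hashable (required for storing it in a set; unhashable items such as lists would raise TypeError). The name dedupe_preserving_order is hypothetical.

```python
def dedupe_preserving_order(items):
    """Return a new list with duplicates removed, keeping first occurrences.

    Assumes every element is hashable, since it is stored in a set.
    """
    result = []
    seen = set()
    for item in items:
        # Average O(1) membership test, so the whole pass is O(n)
        if item not in seen:
            result.append(item)
            seen.add(item)
    return result


print(dedupe_preserving_order([3, 1, 3, 2, 1]))  # [3, 1, 2]
```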

Have a look at this question: How do you remove duplicates from a list in Python whilst preserving order?

(The top answer there also shows how to do this as a list comprehension, which is typically somewhat faster than an explicit loop.)
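A list-comprehension version in the spirit of that answer might look like the sketch below; the function name is hypothetical. It relies on set.add returning None: for an unseen x, "x in seen" is False and seen_add(x) returns None (falsy), so the condition is True and x is kept, with x recorded as a side effect. Binding seen.add to a local name avoids an attribute lookup on every iteration.

```python
def dedupe_comprehension(seq):
    seen = set()
    seen_add = seen.add  # cache the bound method to skip a lookup per element
    # For a new x: `x in seen` is False, seen_add(x) records it and returns None,
    # so the whole condition is True and x is kept. For a repeat, `x in seen` is True.
    return [x for x in seq if not (x in seen or seen_add(x))]


print(dedupe_comprehension("abracadabra"))  # ['a', 'b', 'r', 'c', 'd']
```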


You can easily profile the code yourself with the timeit module. For example, I put your code in func1 and mine in func2. Repeating each call 1000 times on a list of 1000 elements (no duplicates):

>>> import timeit
>>> a = range(1000)
>>> timeit.timeit('func1(a)', 'from __main__ import func1, a', number=1000)
11.691882133483887
>>> timeit.timeit('func2(a)', 'from __main__ import func2, a', number=1000)
0.3130321502685547

Now with duplicates (only 100 distinct values):

>>> import random
>>> a = [random.randint(0, 99) for _ in range(1000)]
>>> timeit.timeit('func1(a)', 'from __main__ import func1, a', number=1000)
2.5020430088043213
>>> timeit.timeit('func2(a)', 'from __main__ import func2, a', number=1000)
0.08332705497741699
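The answer does not show the bodies of func1 and func2, so the script below is a sketch that assumes they wrap the question's comprehension and the set-based loop respectively. It passes a globals dictionary to timeit.timeit (available since Python 3.5) instead of an import string. It uses 100 repetitions rather than 1000 to keep the run short, and absolute timings will vary by machine and Python version, so none are shown here.

```python
import random
import timeit


def func1(a):
    # The question's approach: a.index(x) rescans the list, O(n^2) overall
    return [x for i, x in enumerate(a) if i == a.index(x)]


def func2(a):
    # Set-based approach: one pass, O(n) on average
    result = []
    seen = set()
    for i in a:
        if i not in seen:
            result.append(i)
            seen.add(i)
    return result


def compare(a, number=1000):
    """Return (func1 seconds, func2 seconds) for `number` calls on `a`."""
    t1 = timeit.timeit("func1(a)", globals={"func1": func1, "a": a}, number=number)
    t2 = timeit.timeit("func2(a)", globals={"func2": func2, "a": a}, number=number)
    return t1, t2


if __name__ == "__main__":
    distinct = list(range(1000))
    dupes = [random.randint(0, 99) for _ in range(1000)]
    for label, data in (("no duplicates", distinct), ("100 distinct values", dupes)):
        assert func1(data) == func2(data)  # both keep first occurrences in order
        t1, t2 = compare(data, number=100)
        print(f"{label}: func1 {t1:.3f}s, func2 {t2:.3f}s")
```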



Answer 2:


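lst = [1, 3, 45, 8, 8, 8, 9, 10, 1, 2, 3]
dummySet = set()
# set.add() returns None, so (i, dummySet.add(i))[0] evaluates to i
# while recording i as seen; the filter skips values already in the set
result = [(i, dummySet.add(i))[0] for i in lst if i not in dummySet]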
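On Python 3.7 and later, where dictionaries are guaranteed to preserve insertion order, dict.fromkeys gives the same result without relying on a side effect inside a comprehension. This is a different technique from the answer above, shown only as an alternative; like the set-based versions, it requires hashable elements.

```python
lst = [1, 3, 45, 8, 8, 8, 9, 10, 1, 2, 3]

# dict keys are unique and (since Python 3.7) keep insertion order,
# so the first occurrence of each value determines its position
unique = list(dict.fromkeys(lst))
print(unique)  # [1, 3, 45, 8, 9, 10, 2]
```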


Source: https://stackoverflow.com/questions/7232220/python-list-is-this-the-best-way-to-remove-duplicates-while-preserving-order
