Python List: Is this the best way to remove duplicates while preserving order? [duplicate]

Question

Possible Duplicates:
How do you remove duplicates from a list in Python whilst preserving order?
Algorithm - How to delete duplicate elements in a list efficiently?

I've read about many methods for removing duplicates from a Python list while preserving order. All of them appear to require creating a function or subroutine, which I think is not very computationally efficient. I came up with the following and would like to know whether it is the most computationally efficient way to do this. (My use case needs the fastest possible response time.) Thanks.

b=[x for i,x in enumerate(a) if i==a.index(x)]

Answer 1:

a.index(x) is itself O(n), because the list has to be searched for the value x. Since it is called once per element, the overall runtime is O(n^2).

"Saving" function calls does not make a bad algorithm faster than a good one.

A more efficient, O(n) approach (assuming the elements are hashable) would be:

result = []
seen = set()
for i in a:
    if i not in seen:
        result.append(i)
        seen.add(i)

Have a look at this question: How do you remove duplicates from a list in Python whilst preserving order?

(The top answer there also shows how to do this with a list comprehension, which is typically somewhat faster than an explicit loop.)
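The list-comprehension form mentioned above can be sketched roughly as follows. The function name `dedupe` and the local alias `seen_add` are illustrative choices rather than anything defined in this answer, and the sketch assumes the items are hashable.

```python
def dedupe(seq):
    """Return the items of seq in first-seen order, dropping later duplicates."""
    seen = set()
    seen_add = seen.add  # bind the method once to avoid repeated attribute lookups
    # set.add returns None, so `seen_add(x)` is falsy and only runs when x is new
    return [x for x in seq if not (x in seen or seen_add(x))]


print(dedupe([1, 3, 45, 8, 8, 8, 9, 10, 1, 2, 3]))
# prints [1, 3, 45, 8, 9, 10, 2]
```

The membership test and the insertion both happen in the filter condition, so each element is still handled in O(1) on average and the whole pass stays O(n).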

You can easily profile your code yourself using the timeit module. For example, I put your code in func1 and mine in func2. Repeating each call 1000 times on a list of 1000 elements (no duplicates):

>>> import timeit
>>> a = list(range(1000))
>>> timeit.timeit('func1(a)', 'from __main__ import func1, a', number=1000)
11.691882133483887
>>> timeit.timeit('func2(a)', 'from __main__ import func2, a', number=1000)
0.3130321502685547

Now with duplicates (only 100 distinct values):

>>> import random
>>> a = [random.randint(0, 99) for _ in range(1000)]
>>> timeit.timeit('func1(a)', 'from __main__ import func1, a', number=1000)
2.5020430088043213
>>> timeit.timeit('func2(a)', 'from __main__ import func2, a', number=1000)
0.08332705497741699
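For anyone wanting to repeat the comparison, here is a minimal sketch of how func1 and func2 might be defined and timed as a standalone script. The function bodies are reconstructed from the code shown in the question and in this answer; the `compare` helper and the small default `number=20` are assumptions made so that the script finishes quickly, and absolute timings will differ from the figures above.

```python
import random
import timeit


def func1(a):
    # The question's approach: a.index(x) rescans the list, so this is O(n^2)
    return [x for i, x in enumerate(a) if i == a.index(x)]


def func2(a):
    # The set-based loop: one pass with O(1) average membership tests
    result = []
    seen = set()
    for i in a:
        if i not in seen:
            result.append(i)
            seen.add(i)
    return result


def compare(a, number=20):
    """Time both functions on the same list and return (t1, t2) in seconds."""
    t1 = timeit.timeit(lambda: func1(a), number=number)
    t2 = timeit.timeit(lambda: func2(a), number=number)
    return t1, t2


if __name__ == "__main__":
    unique = list(range(1000))
    dupes = [random.randint(0, 99) for _ in range(1000)]
    for label, data in (("no duplicates", unique), ("100 distinct values", dupes)):
        assert func1(data) == func2(data)  # both must agree before timing
        t1, t2 = compare(data)
        print(f"{label}: func1={t1:.4f}s func2={t2:.4f}s")
```

Passing a callable to timeit.timeit avoids the setup string with `from __main__ import ...`, which only works when the functions live in the main module.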

Answer 2:

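lst = [1, 3, 45, 8, 8, 8, 9, 10, 1, 2, 3]
dummySet = set()
# (i, dummySet.add(i))[0] records i in the set as a side effect and evaluates to i
result = [(i, dummySet.add(i))[0] for i in lst if i not in dummySet]
# result == [1, 3, 45, 8, 9, 10, 2]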
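A possible alternative that avoids the side effect inside the comprehension: on Python 3.7 and later, where dicts are guaranteed to preserve insertion order, dict.fromkeys gives the same result. This is a different technique from the one in this answer, and it likewise assumes the items are hashable.

```python
lst = [1, 3, 45, 8, 8, 8, 9, 10, 1, 2, 3]

# dict keys are unique and keep insertion order on Python 3.7+,
# so the first occurrence of each value determines its position
deduped = list(dict.fromkeys(lst))
print(deduped)  # prints [1, 3, 45, 8, 9, 10, 2]
```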

Source: https://stackoverflow.com/questions/7232220/python-list-is-this-the-best-way-to-remove-duplicates-while-preserving-order

Tags

python

list

duplicates