Filtering lists

Posted by 末鹿安然 on 2019-12-12 12:22:36

Question


I want to filter out repeated elements in my list, for instance:

foo = ['a','b','c','a','b','d','a','d']

I am only interested in:

['a','b','c','d']

What would be an efficient way to achieve this? Cheers


Answer 1:


Cast foo to a set, if you don't care about element order.




Answer 2:


Use list(set(foo)) if you are using Python 2.5 or greater, but note that it doesn't maintain order.
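
A minimal sketch of the set-based approach described above, assuming every element is hashable. The name `deduped` is only illustrative; the point is that the result holds the right elements but its order should not be relied on.

```python
foo = ['a', 'b', 'c', 'a', 'b', 'd', 'a', 'd']

# A set keeps one copy of each element; its iteration order is arbitrary.
deduped = list(set(foo))

# Check the contents without depending on the order of the result.
print(len(deduped))                          # 4
print(set(deduped) == {'a', 'b', 'c', 'd'})  # True
```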




Answer 3:


Since there isn't an order-preserving answer with a list comprehension, I propose the following:

>>> temp = set()
>>> [c for c in foo if c not in temp and (temp.add(c) or True)]
['a', 'b', 'c', 'd']

which could also be written as

>>> temp = set()
>>> list(filter(lambda c: c not in temp and (temp.add(c) or True), foo))
['a', 'b', 'c', 'd']

Depending on how many elements are in foo, repeated hash lookups in a set may give faster results than repeated linear searches through a temporary list.

c not in temp checks that temp does not yet contain c. Because temp.add(c) returns None, the or True part keeps the condition truthy, so c is emitted to the output list at the moment it is added to the set.
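
To illustrate why the or True part matters, here is a small sketch that runs the same comprehension with and without it. The names `seen`, `without_or` and `with_or` are made up for this example.

```python
foo = ['a', 'b', 'c', 'a', 'b', 'd', 'a', 'd']

# set.add() always returns None, which is falsy.
seen = set()
print(seen.add('x'))   # None

# Without "or True" the condition is never truthy, so nothing is kept
# (the elements are still added to the set as a side effect).
seen = set()
without_or = [c for c in foo if c not in seen and seen.add(c)]
print(without_or)      # []

# With "or True" the first occurrence of each element passes the filter.
seen = set()
with_or = [c for c in foo if c not in seen and (seen.add(c) or True)]
print(with_or)         # ['a', 'b', 'c', 'd']
```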




Answer 4:


>>> bar = []
>>> for i in foo:
...     if i not in bar:
...         bar.append(i)
...
>>> bar
['a', 'b', 'c', 'd']

This would be the most straightforward way of removing duplicates from the list while preserving the order as much as possible (even though "order" here is an inherently wrong concept).




Answer 5:


If you care about order, a readable way is the following:

def filter_unique(a_list):
    characters = set()
    result = []
    for c in a_list:
        if c not in characters:
            characters.add(c)
            result.append(c)
    return result

Depending on your requirements for speed, maintainability, and space consumption, you may find the above unsuitable. In that case, specify your requirements and we can try to do better :-)




Answer 6:


If you write a function to do this, I would use a generator; it is a natural fit for this case.

def unique(iterable):
    yielded = set()
    for item in iterable:
        if item not in yielded:
            yield item
            yielded.add(item)
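
A short usage sketch for the generator above, repeated here so the snippet runs on its own. The use of `itertools.islice` is an addition for illustration: because the generator is lazy, a caller can stop after the first few distinct items without scanning the whole input.

```python
from itertools import islice

def unique(iterable):
    yielded = set()
    for item in iterable:
        if item not in yielded:
            yield item
            yielded.add(item)

foo = ['a', 'b', 'c', 'a', 'b', 'd', 'a', 'd']

# Consume the whole generator to get an order-preserving list.
all_unique = list(unique(foo))
print(all_unique)   # ['a', 'b', 'c', 'd']

# Take only the first two distinct items; the rest of foo is never visited.
first_two = list(islice(unique(foo), 2))
print(first_two)    # ['a', 'b']
```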



Answer 7:


Inspired by Francesco's answer, rather than making our own filter()-type function, let's make the builtin do some work for us:

def unique(a, s=set()):
    if a not in s:
        s.add(a)
        return True
    return False

Usage:

uniq = list(filter(unique, orig))

This may perform faster or slower than an answer that implements all of the work in pure Python; benchmark and see. Of course, this only works once, because the default set s is created a single time and keeps its contents between calls, but it demonstrates the concept. The ideal solution is, of course, to use a class:

class Unique(set):
    def __call__(self, a):
        if a not in self:
            self.add(a)
            return True
        return False

Now we can use it as much as we want:

uniq = list(filter(Unique(), orig))

Once again, we may (or may not) have thrown performance out the window: the gains of using a built-in function may be offset by the overhead of a class. I just thought it was an interesting idea.
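
The difference between the two versions can be sketched by running each one twice over the same input. Both definitions are repeated from this answer so the snippet is self-contained; the names `first`, `second`, `fresh_first` and `fresh_second` are illustrative only.

```python
orig = ['a', 'b', 'c', 'a', 'b', 'd', 'a', 'd']

def unique(a, s=set()):
    # The default set is created once, at definition time, and reused.
    if a not in s:
        s.add(a)
        return True
    return False

class Unique(set):
    def __call__(self, a):
        if a not in self:
            self.add(a)
            return True
        return False

# Function version: the second pass finds every element already recorded.
first = list(filter(unique, orig))
second = list(filter(unique, orig))
print(first)         # ['a', 'b', 'c', 'd']
print(second)        # []

# Class version: each Unique() instance starts with an empty set.
fresh_first = list(filter(Unique(), orig))
fresh_second = list(filter(Unique(), orig))
print(fresh_first)   # ['a', 'b', 'c', 'd']
print(fresh_second)  # ['a', 'b', 'c', 'd']
```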




Answer 8:


This is what you want if you need a sorted list at the end:

>>> foo = ['a','b','c','a','b','d','a','d']
>>> bar = sorted(set(foo))
>>> bar
['a', 'b', 'c', 'd']



Answer 9:


import numpy as np
np.unique(foo)  # returns a sorted NumPy array of the unique elements



Answer 10:


You could do a sort of ugly list comprehension hack.

[l[i] for i in range(len(l)) if l.index(l[i]) == i]
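
A sketch of why this hack works, assuming `l` is the list to filter: `l.index(x)` returns the position of the first occurrence of `x`, so an element is kept only when it sits at that first position. Each `index` call is a linear scan, so this is likely to be slow on long lists. The `enumerate` spelling at the end is an equivalent rewrite, not part of the original answer.

```python
l = ['a', 'b', 'c', 'a', 'b', 'd', 'a', 'd']

# list.index() reports where a value first appears.
print(l.index('a'))   # 0
print(l.index('d'))   # 5

# Keep l[i] only if position i is the first occurrence of that value.
result = [l[i] for i in range(len(l)) if l.index(l[i]) == i]
print(result)         # ['a', 'b', 'c', 'd']

# The same test written with enumerate.
same = [x for i, x in enumerate(l) if l.index(x) == i]
print(same == result) # True
```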


Source: https://stackoverflow.com/questions/1596390/filtering-lists
