python: comparing 2 lists of instances

Submitted by 三世轮回 on 2020-01-03 03:18:45

Question


I have 2 lists of instances:

list1
list2

Each instance has attributes such as id, name, etc.

I am iterating through list2, and I want to find entries that don't exist in list1.

e.g.:

for entry in list2:
    if not any(entry.id == x.id for x in list1):  # hidden inner loop over list1
        print(entry)  # <do something>

I'm hoping to find a way to do this without a double for loop. Is there an easy way?


Answer 1:


I might do something like:

set1 = set((x.id,x.name,...) for x in list1)
difference = [ x for x in list2 if (x.id,x.name,...) not in set1 ]

where ... stands for additional (hashable) attributes of the instance. You need to include enough of them to make each tuple unique.

This turns your O(N*M) algorithm into an O(N+M), i.e. O(max(N,M)), algorithm: building the set is O(N), and each of the M membership tests is an average-case O(1) hash lookup.
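A runnable version of this approach, as a minimal sketch: it assumes a hypothetical `Record` class whose identifying attributes are just `id` and `name` (both hashable); substitute whatever attributes make your instances unique.

```python
class Record:
    """Hypothetical stand-in for the question's instances."""
    def __init__(self, id, name):
        self.id = id
        self.name = name

    def __repr__(self):
        return 'Record({!r}, {!r})'.format(self.id, self.name)


def key(record):
    # The identifying attributes; assumed to be (id, name) here.
    return (record.id, record.name)


def missing_from(list1, list2):
    """Return the items of list2 whose key does not occur in list1."""
    seen = {key(x) for x in list1}                    # built once, O(N)
    return [x for x in list2 if key(x) not in seen]   # M average-case O(1) lookups


list1 = [Record(1, 'a'), Record(2, 'b')]
list2 = [Record(1, 'a'), Record(3, 'c'), Record(2, 'x')]
difference = missing_from(list1, list2)
print(difference)  # [Record(3, 'c'), Record(2, 'x')]
```

If `id` alone identifies an instance, `key` can simply return `record.id`.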




Answer 2:


Just a thought...

class Foo(object):
    def __init__(self, id, name):
        self.id = id
        self.name = name
    def __repr__(self):
        return '({},{})'.format(self.id, self.name)

list1 = [Foo(1,'a'),Foo(1,'b'),Foo(2,'b'),Foo(3,'c'),]
list2 = [Foo(1,'a'),Foo(2,'c'),Foo(2,'b'),Foo(4,'c'),]

So ordinarily this does not work:

print(set(list1)-set(list2))
# set([(1,b), (2,b), (3,c), (1,a)])
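The reason is that, without `__eq__` and `__hash__`, Python compares instances by identity, so two separately constructed objects with the same attributes are distinct set members. A small check, using a copy of the `Foo` class above:

```python
class Foo(object):
    def __init__(self, id, name):
        self.id = id
        self.name = name


a = Foo(1, 'a')
b = Foo(1, 'a')
print(a == b)       # False: the default __eq__ compares identity
print(a == a)       # True
print(len({a, b}))  # 2: the default __hash__ is identity-based too
```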

But you could teach Foo what it means for two instances to be equal:

def __hash__(self):
    return hash((self.id, self.name))

def __eq__(self, other):
    try:
        return (self.id, self.name) == (other.id, other.name)
    except AttributeError:
        return NotImplemented

Foo.__hash__ = __hash__
Foo.__eq__ = __eq__

And now:

print(set(list1)-set(list2))
# set([(3,c), (1,b)])

Of course, you would more typically define __hash__ and __eq__ in the class body rather than monkey-patching them in later:

class Foo(object):
    def __init__(self, id, name):
        self.id = id
        self.name = name

    def __repr__(self):
        return '({},{})'.format(self.id, self.name)

    def __hash__(self):
        return hash((self.id, self.name))

    def __eq__(self, other):
        try:
            return (self.id, self.name) == (other.id, other.name)
        except AttributeError:
            return NotImplemented
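On Python 3.7+, the standard-library `dataclasses` module can generate equivalent methods. This is a swapped-in alternative, not part of the answer above: with `frozen=True` (and the default `eq=True`), the decorator generates `__eq__` and `__hash__` from the declared fields in order, here `(id, name)`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Foo:
    # frozen=True plus the default eq=True makes the decorator generate
    # __eq__ and __hash__ based on the field tuple (id, name).
    id: int
    name: str


list1 = [Foo(1, 'a'), Foo(1, 'b'), Foo(2, 'b'), Foo(3, 'c')]
list2 = [Foo(1, 'a'), Foo(2, 'c'), Foo(2, 'b'), Foo(4, 'c')]
print(set(list1) - set(list2))  # Foo(1, 'b') and Foo(3, 'c'), in arbitrary order
```

Two differences from the hand-written version: `frozen=True` makes instances immutable (which also protects set members from having their hash change), and the generated `__eq__` only compares against instances of the same class, returning `NotImplemented` otherwise.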

And just to satisfy my own curiosity, here is a benchmark:

In [34]: list1 = [Foo(1,'a'),Foo(1,'b'),Foo(2,'b'),Foo(3,'c')]*10000

In [35]: list2 = [Foo(1,'a'),Foo(2,'c'),Foo(2,'b'),Foo(4,'c')]*10000
In [40]: %timeit set1 = set((x.id,x.name) for x in list1); [x for x in list2 if (x.id,x.name) not in set1 ]
100 loops, best of 3: 15.3 ms per loop

In [41]: %timeit set1 = set(list1); [x for x in list2 if x not in set1]
10 loops, best of 3: 33.2 ms per loop

So @mgilson's method is faster, though defining __hash__ and __eq__ in Foo leads to more readable code.
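To rerun this comparison outside IPython, the standard `timeit` module works. This sketch reuses the class-definition-time `Foo` above; timings depend on the machine and Python version, so it only asserts that both approaches select the same items. A plausible (unverified) reason for the gap is that the second approach calls the Python-level `__hash__` and `__eq__` methods for every element.

```python
import timeit


class Foo(object):
    def __init__(self, id, name):
        self.id = id
        self.name = name

    def __hash__(self):
        return hash((self.id, self.name))

    def __eq__(self, other):
        try:
            return (self.id, self.name) == (other.id, other.name)
        except AttributeError:
            return NotImplemented


list1 = [Foo(1, 'a'), Foo(1, 'b'), Foo(2, 'b'), Foo(3, 'c')] * 10000
list2 = [Foo(1, 'a'), Foo(2, 'c'), Foo(2, 'b'), Foo(4, 'c')] * 10000


def by_tuple_keys():
    # Answer 1: hash plain attribute tuples
    set1 = set((x.id, x.name) for x in list1)
    return [x for x in list2 if (x.id, x.name) not in set1]


def by_instance_hash():
    # Answer 2: rely on Foo.__hash__ / Foo.__eq__
    set1 = set(list1)
    return [x for x in list2 if x not in set1]


# Both approaches should pick the same objects out of list2.
assert by_tuple_keys() == by_instance_hash()

for fn in (by_tuple_keys, by_instance_hash):
    seconds = timeit.timeit(fn, number=10) / 10
    print('{}: {:.1f} ms per call'.format(fn.__name__, seconds * 1000))
```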




Answer 3:


You can use filter:

difference = filter(lambda x: x not in list1, list2)

In Python 2 it returns the list you want. In Python 3 it returns a filter object, which you might want to convert to a list.
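Because `x not in list1` scans `list1` once per element of `list2` (and, unless the class defines `__eq__`, compares by identity), this is still O(N*M). A sketch that keeps the `filter` idea but tests membership in a precomputed set of keys, assuming a hypothetical `Item` class with `id` and `name` attributes:

```python
class Item:
    """Hypothetical instance type with id and name attributes."""
    def __init__(self, id, name):
        self.id = id
        self.name = name


list1 = [Item(1, 'a'), Item(2, 'b')]
list2 = [Item(1, 'a'), Item(3, 'c')]

# Precompute list1's keys once so each membership test is a set lookup.
keys1 = {(x.id, x.name) for x in list1}

# filter() returns a lazy iterator on Python 3; list() materializes it.
difference = list(filter(lambda x: (x.id, x.name) not in keys1, list2))
print([(x.id, x.name) for x in difference])  # [(3, 'c')]
```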




Answer 4:


Something like this perhaps?

In [1]: list1 = [1,2,3,4,5]

In [2]: list2 = [4,5,6,7]

In [3]: final_list = [x for x in list1 if x not in list2]
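Note that this computes the items of `list1` missing from `list2`, the reverse of the question's direction; swap the lists if needed. For hashable values like these integers, converting the other list to a set first makes each `not in` check a hash lookup instead of a list scan, while the comprehension keeps the original order and duplicates (unlike `set(list1) - set(list2)`):

```python
list1 = [1, 2, 3, 4, 5]
list2 = [4, 5, 6, 7]

set2 = set(list2)  # built once, O(len(list2))
final_list = [x for x in list1 if x not in set2]  # preserves list1's order
print(final_list)  # [1, 2, 3]
```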


Source: https://stackoverflow.com/questions/14721062/python-comparing-2-lists-of-instances
