Pandas: Efficient way to check if a value in column A is in a list of values in column B

Question

My initial dataframe looks like this:

 A   | B
-----------------
 'a' | ['1', 'a', 'b']        
 '1' | ['2', '5', '6']   
 'd' | ['a', 'b', 'd']        
 'y' | ['x', '1', 'y']

For each row, I want to check whether the value in A is in the corresponding list in B, e.g. whether 'a' is in ['1', 'a', 'b'].

I can do that with apply:

df.apply(lambda row: row['A'] in row['B'], axis=1)

That gives me the expected result:

[True, False, True, True]

But on my real data (millions of rows) this is very slow. Is there a more efficient way to do the same thing, for example with NumPy elementwise operations or anything else?

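Answer 1:

If you convert each column to sets, you can use <= to test, row by row, whether the first set is a subset of the second. (A strict < tests for a proper subset, so it would miss a list that holds only the value itself.) Here d is the dataframe.

a = d.A.apply(lambda x: {x})
b = d.B.apply(set)

a <= b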

0     True
1    False
2     True
3     True
dtype: bool
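
For anyone who wants to run the set approach end to end, here is a minimal sketch on the sample data from the question. The frame construction and the name in_b are assumptions added for illustration. It uses <= rather than a strict < so that a row whose list contains only the value itself still counts as a match.

```python
import pandas as pd

# Sample data from the question (construction is an assumption for illustration).
d = pd.DataFrame({
    "A": ["a", "1", "d", "y"],
    "B": [["1", "a", "b"], ["2", "5", "6"], ["a", "b", "d"], ["x", "1", "y"]],
})

a = d.A.apply(lambda x: {x})  # wrap each value in a one-element set
b = d.B.apply(set)            # turn each list into a set

# Elementwise subset test: {value} <= set(list) is True when value is in the list.
in_b = a <= b
print(in_b.tolist())
```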

Otherwise, you can use a list comprehension with zip

[a in b for a, b in zip(d.A.values.tolist(), d.B.values.tolist())]

[True, False, True, True]
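
If the result needs to live in the dataframe, the list can be assigned straight back as a column. A minimal sketch, assuming the sample data from the question; the column name A_in_B is made up for illustration, and zipping the columns directly is a small simplification of the line above.

```python
import pandas as pd

# Sample data from the question (construction is an assumption for illustration).
d = pd.DataFrame({
    "A": ["a", "1", "d", "y"],
    "B": [["1", "a", "b"], ["2", "5", "6"], ["a", "b", "d"], ["x", "1", "y"]],
})

# Plain Python membership test per row; "A_in_B" is a hypothetical column name.
d["A_in_B"] = [a in b for a, b in zip(d["A"], d["B"])]
print(d)
```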

Timing on small data and on large data (plots not preserved)
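
To compare the approaches on your own machine, something along these lines could be used. This is only a sketch: the row count n, the random test data, and the helper names with_apply, with_sets, and with_zip are all assumptions, and the numbers it prints will depend on your hardware and pandas version.

```python
import random
import timeit

import pandas as pd

random.seed(0)
n = 10_000  # hypothetical size; raise it to approximate the real workload
letters = list("abcdefghij")

# Random test data shaped like the question's frame: a value and a 3-item list.
d = pd.DataFrame({
    "A": [random.choice(letters) for _ in range(n)],
    "B": [random.choices(letters, k=3) for _ in range(n)],
})

def with_apply():
    # Row-wise apply, as in the question.
    return d.apply(lambda row: row["A"] in row["B"], axis=1).tolist()

def with_sets():
    # Elementwise subset comparison of two Series of sets.
    return (d.A.apply(lambda x: {x}) <= d.B.apply(set)).tolist()

def with_zip():
    # List comprehension over the two columns.
    return [a in b for a, b in zip(d.A.values.tolist(), d.B.values.tolist())]

# All three approaches should agree before their timings are compared.
assert with_apply() == with_sets() == with_zip()

for fn in (with_apply, with_sets, with_zip):
    seconds = min(timeit.repeat(fn, number=1, repeat=3))
    print(f"{fn.__name__}: {seconds:.4f} s")
```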

Source: https://stackoverflow.com/questions/43553523/pandas-efficient-way-to-check-if-a-value-in-column-a-is-in-a-list-of-values-in

Tags: python, list, pandas, contains
