Finding the intersection between two series in Pandas

戏子无情 提交于 2019-11-26 22:45:49

问题


I have two series s1 and s2 in pandas/python and want to compute the intersection i.e. where all of the values of the series are common.

How would I use the concat function to do this? I have been trying to work it out but have been unable to (I don't want to compute the intersection on the indices of s1 and S2, but on the values).

Thanks in advance.


回答1:


Place both series in Python's set container then use the set intersection method:

s1.intersection(s2)

and then transform back to list if needed.

Just noticed pandas in the tag. Can translate back to that:

pd.Series(list(set(s1).intersection(set(s2))))

From comments I have changed this to a more Pythonic expression, which is shorter and easier to read:

Series(list(set(s1) & set(s2)))

should do the trick, except if the index data is also important to you.

Have added the list(...) to translate the set before going to pd.Series as pandas does not accept a set as direct input for a Series.




回答2:


Setup:

s1 = pd.Series([4,5,6,20,42])
s2 = pd.Series([1,2,3,5,42])

Timings:

%%timeit
pd.Series(list(set(s1).intersection(set(s2))))
10000 loops, best of 3: 57.7 µs per loop

%%timeit
pd.Series(np.intersect1d(s1,s2))
1000 loops, best of 3: 659 µs per loop

%%timeit
pd.Series(np.intersect1d(s1.values,s2.values))
10000 loops, best of 3: 64.7 µs per loop

So the numpy solution can be comparable to the set solution even for small series, if one uses the values explicitely.




回答3:


If you are using Pandas, I assume you are also using NumPy. Numpy has a function intersect1d that will work with a Pandas series.

Example:

pd.Series(np.intersect1d(pd.Series([1,2,3,5,42]), pd.Series([4,5,6,20,42])))

will return a Series with the values 5 and 42.




回答4:


Python

s1 = pd.Series([4,5,6,20,42])
s2 = pd.Series([1,2,3,5,42])

s1[s1.isin(s2)]

R

s1  <- c(4,5,6,20,42)
s2 <- c(1,2,3,5,42)

s1[s1 %in% s2]

Edit: Doesn't handle dupes.




回答5:


Could use merge operator like follows

pd.merge(df1, df2, how='inner')


来源:https://stackoverflow.com/questions/18079563/finding-the-intersection-between-two-series-in-pandas

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!