Is there a better way to find duplicate rows _including_ the first/last?

Submitted by 你。 on 2019-12-11 02:44:53

Question


Consider a Pandas data frame:

import pandas as pd

df = pd.DataFrame({
    'a': pd.Series([1,1,1,2,3]),
    'b': pd.Series(list('asdfg'))
})

I want to return all of the rows that have a duplicated value in column a, including the first and last occurrence of each value. I can do this with

df[df['a'].duplicated() | df['a'].duplicated(keep='last')]

Is there a better way?


Answer 1:


You can count how often each value of a occurs and keep the rows whose count is greater than 1; those are the duplicated rows.

In [25]: df[(df.groupby('a').transform('count')>1).values]
Out[25]:
   a  b
0  1  a
1  1  s
2  1  d
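
For comparison, here is a minimal sketch of a shorter route, assuming pandas 0.17 or later, where duplicated accepts a keep argument. Passing keep=False flags every member of a duplicate group rather than only the repeats. The names dupes and by_count are illustrative and do not come from the original post; the second selection repeats the counting idea on column a alone so the two results can be compared.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 1, 1, 2, 3],
    'b': list('asdfg'),
})

# keep=False marks all occurrences of a repeated value as duplicates,
# so the first and last rows of each group are included.
dupes = df[df['a'].duplicated(keep=False)]

# Counting approach for comparison: broadcast each group's count back
# onto its rows and keep the rows whose count exceeds 1.
by_count = df[df.groupby('a')['a'].transform('count') > 1]

print(dupes)
print(dupes.equals(by_count))
```

Both selections should pick the three rows where a equals 1. The keep=False form avoids the groupby entirely, which may be easier to read when only one column is checked.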


Source: https://stackoverflow.com/questions/30808703/is-there-a-better-way-to-find-duplicate-rows-including-the-first-last
