Question
Consider a Pandas data frame:
import pandas as pd

df = pd.DataFrame({
    'a': pd.Series([1, 1, 1, 2, 3]),
    'b': pd.Series(list('asdfg'))
})
I want to return all of the rows that have a duplicated value in column a, including the first and last occurrence. I can do this with
df[df['a'].duplicated() | df['a'].duplicated(keep='last')]
Is there a better way?
Answer 1:
You can count the occurrences of each value in a and keep the rows whose count is greater than 1; those are the duplicated rows.
In [25]: df[(df.groupby('a').transform('count')>1).values]
Out[25]:
   a  b
0  1  a
1  1  s
2  1  d
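As a possible alternative, not part of the original answer, duplicated accepts keep=False in pandas 0.17 and later, which flags every member of a duplicate group in a single call. A minimal sketch, assuming the same df as in the question (the names mask and dupes are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'a': pd.Series([1, 1, 1, 2, 3]),
    'b': pd.Series(list('asdfg')),
})

# keep=False marks all occurrences of a repeated value as True,
# rather than sparing the first or the last one.
mask = df['a'].duplicated(keep=False)

# Boolean indexing keeps only the rows whose value in 'a' is repeated.
dupes = df[mask]
print(dupes)
```

This should select the same rows as the groupby/transform approach above while avoiding the grouping step.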
Source: https://stackoverflow.com/questions/30808703/is-there-a-better-way-to-find-duplicate-rows-including-the-first-last