How to find duplicate names using pandas?

Submitted by 帅比萌擦擦* on 2020-01-11 17:40:12

Question


I have a pandas.DataFrame with a column called name containing strings. I would like to get a list of the names which occur more than once in the column. How do I do that?

I tried:

funcs_groups = funcs.groupby(funcs.name)
funcs_groups[(funcs_groups.count().name>1)]

But it doesn't filter out the singleton names.


Answer 1:


If you want to find the rows whose name has already appeared earlier in the column (that is, every occurrence except the first), you can try this:

In [16]: import pandas as pd
In [17]: p1 = {'name': 'willy', 'age': 10}
In [18]: p2 = {'name': 'willy', 'age': 11}
In [19]: p3 = {'name': 'zoe', 'age': 10}
In [20]: df = pd.DataFrame([p1, p2, p3])

In [21]: df
Out[21]: 
   age   name
0   10  willy
1   11  willy
2   10    zoe

In [22]: df.duplicated('name')
Out[22]: 
0    False
1     True
2    False
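The question asks for a list of names rather than a boolean mask. A minimal sketch of that last step, reusing the same sample data; the variable names `mask` and `duplicate_names` are illustrative and not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame([
    {'name': 'willy', 'age': 10},
    {'name': 'willy', 'age': 11},
    {'name': 'zoe', 'age': 10},
])

# True for every repeat of a name after its first occurrence
mask = df.duplicated('name')

# Keep the name column on those rows and collapse it to distinct values
duplicate_names = df.loc[mask, 'name'].unique().tolist()
print(duplicate_names)
```

With this sample data the list contains only 'willy', since 'zoe' appears once.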



Answer 2:


A one liner can be:

x.set_index('name').index.get_duplicates()

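The index has a method for finding duplicates; columns do not seem to have a similar one. Note that Index.get_duplicates() was deprecated in pandas 0.23 and removed in pandas 1.0, so this one-liner only works on older versions.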
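Where Index.get_duplicates() is unavailable, the same result can probably be obtained from Index.duplicated() and Index.unique(). A sketch under that assumption, with made-up sample data standing in for `x`:

```python
import pandas as pd

# Hypothetical stand-in for the answer's `x`
x = pd.DataFrame({'name': ['willy', 'willy', 'zoe'], 'age': [10, 11, 10]})

idx = x.set_index('name').index

# duplicated() flags repeats after the first occurrence;
# unique() collapses them to one entry per duplicated name
dupes = idx[idx.duplicated()].unique()
print(list(dupes))
```

This keeps the answer's index-based approach and changes only the final method call.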




Answer 3:


value_counts will also give you the number of times each name occurs.

names = df.name.value_counts()
names[names > 1]
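The filtered result is a Series indexed by name, so a plain list of names comes from its index. A short sketch with illustrative sample data; `repeated` and `duplicate_names` are names introduced here:

```python
import pandas as pd

df = pd.DataFrame({'name': ['willy', 'willy', 'zoe'], 'age': [10, 11, 10]})

# Series mapping each name to its number of occurrences
names = df['name'].value_counts()

# Keep only names seen more than once
repeated = names[names > 1]

# The index holds the names, the values hold the counts
duplicate_names = repeated.index.tolist()
print(duplicate_names, repeated.to_dict())
```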



Answer 4:


Another one liner can be:

(df.name).drop_duplicates()
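It may help to compare what this returns with what the question asks for: drop_duplicates() keeps one copy of every name, including names that occur only once. A hedged sketch of the difference, using made-up data and variable names:

```python
import pandas as pd

df = pd.DataFrame({'name': ['willy', 'willy', 'willy', 'zoe']})

# One copy of every name, singletons included
unique_names = df.name.drop_duplicates().tolist()

# Only names that repeat: filter with duplicated() first,
# then drop the extra copies of each repeated name
repeated_names = df.name[df.name.duplicated()].drop_duplicates().tolist()

print(unique_names)
print(repeated_names)
```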



Answer 5:


I had a similar problem and came across this answer.

I guess this also works:

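counts = df.groupby('name').size()   # number of rows per name
df2 = counts.to_frame(name='size')
# Use df2['size'] here: df2.size is the DataFrame's element count, not the column
df2 = df2[df2['size'] > 1]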

df2.index then holds the names that occur more than once; use df2.index.tolist() to get them as a list.
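The groupby approach can also stay closer to the attempt in the question by using the groupby filter method, which keeps the rows of every group for which a predicate returns True. A sketch with sample data invented for illustration; only the name `funcs` comes from the question:

```python
import pandas as pd

# Hypothetical data standing in for the question's `funcs` frame
funcs = pd.DataFrame({'name': ['willy', 'willy', 'zoe'], 'age': [10, 11, 10]})

# Keep only the rows whose name group has more than one member
repeated_rows = funcs.groupby('name').filter(lambda g: len(g) > 1)

duplicate_names = repeated_rows['name'].unique().tolist()
print(duplicate_names)
```

Calling a Python lambda once per group can be slow on large frames, so the size-based version above is likely the better choice there.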




Answer 6:


Most of the responses given demonstrate how to remove the duplicates, not find them.

The following will select each row in the data frame with a duplicate 'name' field. Note that this will find each instance, not just duplicates after the first occurrence. The keep argument accepts additional values that can exclude either the first or last occurrence.

df[df.duplicated(['name'], keep=False)]

See the pandas reference for DataFrame.duplicated() for details.
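To make the effect of keep concrete, here is a small sketch comparing its three accepted values on sample data made up for this example:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['willy', 'willy', 'zoe', 'willy'],
    'age': [10, 11, 10, 12],
})

# keep='first' (the default): the first occurrence is not flagged
first = df.duplicated(['name'], keep='first').tolist()

# keep='last': the last occurrence is not flagged
last = df.duplicated(['name'], keep='last').tolist()

# keep=False: every occurrence of a repeated name is flagged
every = df.duplicated(['name'], keep=False).tolist()

print(first)  # [False, True, False, True]
print(last)   # [True, True, False, False]
print(every)  # [True, True, False, True]
```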



Source: https://stackoverflow.com/questions/15247628/how-to-find-duplicate-names-using-pandas
