How to select duplicate rows with pandas?

Submitted by 空扰寡人 on 2021-02-04 10:59:23

Question


I have a dataframe like this:

import pandas as pd
dic = {'A':[100,200,250,300],
       'B':['ci','ci','po','pa'],
       'C':['s','t','p','w']}
df = pd.DataFrame(dic)

My goal is to split the rows into 2 dataframes:

  • df1 = contains all the rows whose value in column B is not repeated (unique rows).
  • df2 = contains only the rows whose value in column B is repeated.

The result should look like this:

df1 =      A  B C         df2 =     A  B C
      0  250 po p               0  100 ci s
      1  300 pa w               1  200 ci t

Note:

  • The dataframes can be very large and can have many repeated values in column B, so the answer should be as generic as possible.
  • If there are no duplicates, df2 should be empty and all the rows should end up in df1.

Answer 1:


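You can use Series.duplicated with keep=False to build a boolean mask that marks every occurrence of a repeated value in column B (the default, keep='first', would leave the first occurrence unmarked). Use boolean indexing with the mask to select the duplicated rows, and invert it with ~ to select the unique ones: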

mask = df.B.duplicated(keep=False)
print (mask)
0     True
1     True
2    False
3    False
Name: B, dtype: bool

print (df[mask])
     A   B  C
0  100  ci  s
1  200  ci  t

print (df[~mask])
     A   B  C
2  250  po  p
3  300  pa  w
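
The expected output in the question uses a fresh 0-based index for each result and asks that df2 be empty when nothing repeats. A minimal sketch of how the mask above could be wrapped to cover both points follows; the helper name split_on_duplicates and the no_dups frame are made up for illustration and are not part of the original answer.

```python
import pandas as pd


def split_on_duplicates(df, column):
    """Return (unique_rows, repeated_rows) based on repeated values in `column`.

    Hypothetical helper for illustration only.
    """
    # True for every row whose value in `column` occurs more than once
    mask = df[column].duplicated(keep=False)
    # reset_index(drop=True) renumbers each result from 0, as in the question
    unique_rows = df[~mask].reset_index(drop=True)
    repeated_rows = df[mask].reset_index(drop=True)
    return unique_rows, repeated_rows


df = pd.DataFrame({'A': [100, 200, 250, 300],
                   'B': ['ci', 'ci', 'po', 'pa'],
                   'C': ['s', 't', 'p', 'w']})
df1, df2 = split_on_duplicates(df, 'B')

# With no repeated values in B, every row lands in the first frame
# and the second frame is empty.
no_dups = pd.DataFrame({'A': [1, 2], 'B': ['x', 'y'], 'C': ['a', 'b']})
all_rows, empty = split_on_duplicates(no_dups, 'B')
```

Because the mask is computed in a single vectorized pass over the column, this approach should stay practical for large dataframes, and the empty case needs no special handling: an all-False mask simply selects no rows.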


Source: https://stackoverflow.com/questions/41042996/how-to-select-duplicate-rows-with-pandas
