DataFrame.drop_duplicates and DataFrame.drop not removing rows

悲哀的现实 2020-12-09 20:50

I have read a CSV into a pandas DataFrame with five columns. Certain rows have duplicate values only in the second column. I want to remove these rows from the data.

2 answers
  •  爱一瞬间的悲伤
    2020-12-09 20:50

    In my case, the issue was that I was concatenating DataFrames whose columns had different types:

    import pandas as pd
    
    s1 = pd.DataFrame([['a', 1]], columns=['letter', 'code'])
    s2 = pd.DataFrame([['a', '1']], columns=['letter', 'code'])
    df = pd.concat([s1, s2])
    df = df.reset_index(drop=True)
    df.drop_duplicates(inplace=True)
    
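    # Still 2 rows: 1 (int) and '1' (str) are not equal, so nothing is dropped
    print(df)
    
    # <class 'int'>
    print(type(df.at[0, 'code']))
    # <class 'str'>
    print(type(df.at[1, 'code']))
    
    # Fix: cast the column to a single type, then drop duplicates again
    df['code'] = df['code'].astype(str)
    df.drop_duplicates(inplace=True)
    
    # 1 row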
    print(df)
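
For the original question (duplicates only in the second column), the same idea can be combined with the `subset` argument of `drop_duplicates`. This is only a minimal sketch: the question's CSV is not shown, so the frame and its column names (`id`, `code`, `value`) are made up, and it assumes that casting the column to `str` is an acceptable way to make the values comparable.

```python
import pandas as pd

# Hypothetical frame standing in for the CSV; column names are invented.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "code": [1, "1", 2, 3],   # mixed int and str, as in the answer above
    "value": [10, 20, 30, 40],
})

# Name of the second column, looked up by position.
second = df.columns[1]

# Count the distinct Python types in that column; more than one
# means values that look equal may not compare equal.
n_types = df[second].map(type).nunique()

# Normalize to a single type so 1 and "1" are treated as the same value.
df[second] = df[second].astype(str)

# Keep the first row for each value of the second column.
first_kept = df.drop_duplicates(subset=[second], keep="first")

# Or drop every row whose second-column value occurs more than once.
none_kept = df.drop_duplicates(subset=[second], keep=False)

print(n_types, first_kept["id"].tolist(), none_kept["id"].tolist())
```

Note that `drop_duplicates` returns a new DataFrame unless `inplace=True` is passed, so the result has to be assigned to a variable; forgetting that is another common reason rows appear not to be removed.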
    
