DataFrame.drop_duplicates and DataFrame.drop not removing rows

悲哀的现实 2020-12-09 20:50

I have read a CSV file into a pandas DataFrame that has five columns. Some rows have duplicate values only in the second column, and I want to remove these rows from the data.

2 answers
  • 2020-12-09 20:50

    In my case, the issue was that I was concatenating DataFrames whose columns had different types:

    import pandas as pd
    
    s1 = pd.DataFrame([['a', 1]], columns=['letter', 'code'])
    s2 = pd.DataFrame([['a', '1']], columns=['letter', 'code'])
    df = pd.concat([s1, s2])
    df = df.reset_index(drop=True)
    df.drop_duplicates(inplace=True)
    
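    # Still 2 rows: 1 (int) and '1' (str) are not equal
    print(df)
    
    # <class 'int'>
    print(type(df.at[0, 'code']))
    # <class 'str'>
    print(type(df.at[1, 'code']))
    
    # Fix: cast the column to a single type before dropping duplicates
    df['code'] = df['code'].astype(str)
    df.drop_duplicates(inplace=True)
    
    # 1 row
    print(df)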
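
To check whether a DataFrame has this problem before dropping duplicates, one option is to count the distinct Python types in each column. This is only a minimal sketch that reuses the example frames above; the names `type_counts` and `mixed_columns` are made up for illustration.

```python
import pandas as pd

s1 = pd.DataFrame([['a', 1]], columns=['letter', 'code'])
s2 = pd.DataFrame([['a', '1']], columns=['letter', 'code'])
df = pd.concat([s1, s2], ignore_index=True)

# Count how many distinct Python types each column holds.
# More than one type means equal-looking values may not compare equal.
type_counts = {col: df[col].map(type).nunique() for col in df.columns}

# Hypothetical helper list: columns that mix types
mixed_columns = [col for col, n in type_counts.items() if n > 1]
print(mixed_columns)
```

Any column that shows up in the list is a candidate for an `astype` cast before calling `drop_duplicates`.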
    
  • 2020-12-09 21:09

    As mentioned in the comments, drop and drop_duplicates return a new DataFrame unless you pass inplace=True. Any of these options would work:

    # Either reassign the returned DataFrame...
    df = df.drop(dropRows)
    df = df.drop_duplicates('b')
    # ...or modify df in place
    df.drop(dropRows, inplace=True)
    df.drop_duplicates('b', inplace=True)
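
The difference between discarding and keeping the returned DataFrame can be seen in a small sketch. The data here is invented for illustration: only the column name 'b' comes from the snippet above, while column 'a' and the variable names are assumptions.

```python
import pandas as pd

# Made-up sample data: column 'b' contains one duplicated value
df = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'x', 'y']})

# The result is discarded, so df itself is unchanged
df.drop_duplicates(subset='b')
rows_without_assignment = len(df)

# The result is assigned back, so the duplicate row is gone
df = df.drop_duplicates(subset='b')
rows_with_assignment = len(df)

print(rows_without_assignment, rows_with_assignment)
```

By default `drop_duplicates` keeps the first occurrence of each duplicated value; passing `keep=False` would instead drop every row whose value in 'b' is repeated.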
    