Removing duplicate rows from a csv file using a python script

Backend · Unresolved · 6 answers · 1328 views

离开以前 2020-12-02 10:06

Goal

I have downloaded a CSV file from Hotmail, but it has a lot of duplicates in it. These duplicates are complete copies and I don't know why my

6 Answers
  •  暖寄归人
    2020-12-02 10:43

    You can do this with the pandas library in a Jupyter notebook or any suitable IDE. I'm importing pandas into a Jupyter notebook and reading the CSV file.

    Then sort the values by whichever columns determine the duplicates; since I pass two columns, it sorts by time first, then by latitude.

    Then drop duplicates based on the time column, or whichever column is relevant for you.

    Finally, I store the sorted, de-duplicated data as gps_sorted.

    import pandas as pd

    # read the exported CSV
    stock = pd.read_csv("C:/Users/Donuts/GPS Trajectory/go_track_trackspoints.csv")
    # sort by time first, then by latitude
    stock2 = stock.sort_values(["time", "latitude"], ascending=True)
    # drop rows whose 'time' repeats, keeping the first occurrence;
    # reassign, because drop_duplicates returns a new DataFrame
    stock2 = stock2.drop_duplicates(subset=["time"])
    # write the cleaned file without the pandas index column
    stock2.to_csv("C:/Users/Donuts/gps_sorted.csv", index=False)
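
    One thing to watch: `drop_duplicates` returns a new DataFrame rather than modifying the original in place, so the result must be assigned to a variable. A tiny sketch with made-up data (not the GPS file) showing this:

    ```python
    import pandas as pd

    # hypothetical sample standing in for the real CSV contents
    df = pd.DataFrame({"time": [1, 1, 2], "latitude": [10.0, 10.0, 11.5]})

    deduped = df.drop_duplicates(subset=["time"])  # keeps the first row per time value

    # the original DataFrame is untouched; only 'deduped' has the duplicates removed
    ```
    
    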
    

    Hope this helps
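
    Since the question says the duplicates are complete copies of whole rows, the standard library alone is also enough; a minimal sketch using only the `csv` module (the sample data and file names here are hypothetical):

    ```python
    import csv
    import io

    def dedupe_rows(rows):
        """Remove exact duplicate rows, keeping the first occurrence and the original order."""
        seen = set()
        out = []
        for row in rows:
            key = tuple(row)  # lists aren't hashable, so convert each row to a tuple
            if key not in seen:
                seen.add(key)
                out.append(row)
        return out

    # in-memory stand-in for reading the downloaded CSV file
    raw = "time,latitude\n1,10.0\n1,10.0\n2,11.5\n"
    clean = dedupe_rows(list(csv.reader(io.StringIO(raw))))
    # write the de-duplicated rows back out with csv.writer(open(..., "w", newline=""))
    ```

    Unlike the pandas version, this keeps rows in their original file order and needs no third-party install, but it only catches rows that are byte-for-byte identical.
    
    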
