Removing duplicate rows from a csv file using a python script

Asked by 离开以前, 2020-12-02 10:06

Goal

I have downloaded a CSV file from Hotmail, but it has a lot of duplicates in it. These duplicates are complete copies and I don't know why my

6 Answers
  •  旧时难觅i
    2020-12-02 10:40

    You can use the following script:

    Pre-conditions:

    1. 1.csv is the file that contains the duplicates.
    2. 2.csv is the output file that will be devoid of duplicates once this script is executed.

    code

    # Open the input file (with duplicates) and the output file
    inFile = open('1.csv', 'r')
    outFile = open('2.csv', 'w')

    listLines = []

    for line in inFile:
        # Skip lines that have already been written to the output file
        if line in listLines:
            continue
        else:
            outFile.write(line)
            listLines.append(line)

    outFile.close()
    inFile.close()

    Algorithm Explanation

    Here, what I am doing is:

    1. Opening a file in read mode. This is the file that has the duplicates.
    2. Then, in a loop that runs until the file is exhausted, we check whether the line has already been encountered.
    3. If it has been encountered, we don't write it to the output file.
    4. If not, we write it to the output file and add it to the list of lines that have already been encountered.
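    A common refinement of the steps above is to track seen lines in a set instead of a list, since `line in listLines` scans the whole list on every iteration, while set membership checks are O(1) on average. A minimal sketch (the helper name `dedupe_lines` is mine, not from the answer):

```python
def dedupe_lines(in_path, out_path):
    """Copy in_path to out_path, keeping only the first occurrence of each line."""
    seen = set()  # set membership tests are O(1) on average
    with open(in_path, 'r') as in_file, open(out_path, 'w') as out_file:
        for line in in_file:
            if line not in seen:
                out_file.write(line)
                seen.add(line)

# Usage, matching the filenames in the answer:
# dedupe_lines('1.csv', '2.csv')
```

    The `with` statement also closes both files automatically, even if an error occurs mid-loop, so the explicit `close()` calls are no longer needed.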
