Removing duplicate rows from a csv file using a python script

Asked by 离开以前, 2020-12-02 10:06

Goal

I have downloaded a CSV file from Hotmail, but it has a lot of duplicates in it. These duplicates are complete copies and I don't know why my

6 Answers
  •  旧时难觅i
    2020-12-02 10:40

    You can use the following script:

    Pre-conditions:

    1. 1.csv is the file that contains the duplicates.
    2. 2.csv is the output file that will be devoid of duplicates once this script is executed.

    code

    # Open the input file (with duplicates) and the output file
    inFile = open('1.csv', 'r')
    outFile = open('2.csv', 'w')

    listLines = []

    for line in inFile:
        # Skip lines that have already been written to the output file
        if line in listLines:
            continue
        else:
            outFile.write(line)
            listLines.append(line)

    outFile.close()
    inFile.close()

    Algorithm Explanation

    Here, what I am doing is:

    1. Opening a file in read mode. This is the file that has the duplicates.
    2. Then, in a loop that runs until the file is exhausted, we check whether the line has already been encountered.
    3. If it has been encountered, we don't write it to the output file.
    4. If not, we write it to the output file and add it to the list of lines that have already been encountered.
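    A common refinement of the steps above is to track seen lines in a set instead of a list, since `line in listLines` scans the whole list on every iteration, while set membership checks are O(1) on average. A minimal sketch (the helper name `dedupe_lines` is mine, not from the answer):

```python
def dedupe_lines(in_path, out_path):
    """Copy in_path to out_path, keeping only the first occurrence of each line."""
    seen = set()  # set membership tests are O(1) on average
    with open(in_path, 'r') as in_file, open(out_path, 'w') as out_file:
        for line in in_file:
            if line not in seen:
                out_file.write(line)
                seen.add(line)

# Usage, matching the filenames in the answer:
# dedupe_lines('1.csv', '2.csv')
```

    The `with` statement also closes both files automatically, even if an error occurs mid-loop, so the explicit `close()` calls are no longer needed.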
