Question
I am parsing a CSV file that contains duplicate addresses. I want Python to give me the total number of duplicate addresses and the total number of unique addresses, and then list them. I already have the part that prints whether each address is unique or a duplicate; now I also want the respective counts.
import csv
csv_data = csv.reader(open(r'T:\DataDump\Book1.csv'))
next(csv_data)
already_seen = set()
for row in csv_data:
    Address = row[6]
    if Address in already_seen:
        print('{} is a duplicate Address'.format(Address))
    else:
        print('{} is a unique Address'.format(Address))
        already_seen.add(Address)
Answer 1:
You can flag a repeated address on the fly in a single pass, but you have to read the whole file before you can say that an address is not a duplicate, or count how many times it occurs.
So two passes are needed here: one over the file to count, and one over the counts to report. Use collections.Counter like this:
import csv
import collections
with open(r"T:\DataDump\Book1.csv") as f:
    csv_data = csv.reader(f, delimiter=",")
    next(csv_data)  # skip title line
    count = collections.Counter()
    # first pass: read the file and count each address
    for row in csv_data:
        address = row[6]
        count[address] += 1
# second pass: display duplicate info & compute total
total_dups = 0
for address, nb in count.items():
    if nb > 1:
        total_dups += nb
        print('{} is a duplicate address, seen {} times'.format(address, nb))
    else:
        print('{} is a unique address'.format(address))
print("Total duplicate addresses {}".format(total_dups))
To print the total number of duplicate addresses, you could also do this directly:
print("Total duplicate addresses {}".format(sum(x for x in count.values() if x > 1)))
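"Total number of duplicates" can be read in more than one way: the total above counts every row whose address occurs more than once, but one might instead want the number of distinct addresses that repeat, or the number of surplus rows. The following is a minimal sketch that reports all of these along with the lists themselves. The sample data, the function name summarize_addresses and the column argument are hypothetical; the sample keeps the address in column 0, whereas the original file uses column 6.

```python
import collections
import csv
import io

# Hypothetical sample standing in for Book1.csv (address in column 0 here).
SAMPLE = """Address
1 Main St
2 Oak Ave
1 Main St
3 Pine Rd
1 Main St
2 Oak Ave
"""

def summarize_addresses(lines, column=0):
    """Count addresses in an iterable of CSV lines and split them by frequency."""
    reader = csv.reader(lines)
    next(reader)  # skip title line
    count = collections.Counter(row[column] for row in reader)
    unique = [a for a, n in count.items() if n == 1]
    duplicated = [a for a, n in count.items() if n > 1]
    return {
        "unique": unique,                # addresses seen exactly once
        "duplicated": duplicated,        # distinct addresses seen more than once
        "rows_with_duplicated_address": sum(count[a] for a in duplicated),
        "extra_rows": sum(count[a] - 1 for a in duplicated),
    }

summary = summarize_addresses(io.StringIO(SAMPLE))
print("Unique addresses ({}): {}".format(len(summary["unique"]), summary["unique"]))
print("Duplicated addresses ({}): {}".format(len(summary["duplicated"]), summary["duplicated"]))
print("Rows with a duplicated address: {}".format(summary["rows_with_duplicated_address"]))
print("Surplus rows: {}".format(summary["extra_rows"]))
```

For the real file, the same function could be called with the open file object and column=6.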
Answer 2:
Use this:
my_dict = {i: My_List.count(i) for i in My_List}
It maps every item of My_List to the number of times it occurs, duplicates included.
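As a rough illustration of that comprehension, here is a sketch on a small list. The contents of My_List are hypothetical; in the question's setting the list might be built with something like [row[6] for row in csv_data]. Note that list.count rescans the whole list for every element, so this approach is quadratic in the list length; collections.Counter gives the same mapping in one pass.

```python
import collections

# Hypothetical list of addresses; the answer does not say how My_List is built.
My_List = ["1 Main St", "2 Oak Ave", "1 Main St", "3 Pine Rd", "1 Main St"]

# The comprehension from the answer: one list.count() call per element.
my_dict = {i: My_List.count(i) for i in My_List}

# Single-pass equivalent, shown for comparison.
counter = collections.Counter(My_List)

unique_addresses = [a for a, n in my_dict.items() if n == 1]
duplicate_addresses = [a for a, n in my_dict.items() if n > 1]

print(my_dict)
print("Unique: {}, duplicated: {}".format(len(unique_addresses), len(duplicate_addresses)))
```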
Answer 3:
This should be as easy as having a dictionary to store the count of addresses:
import csv
csv_data = csv.reader(open(r'T:\DataDump\Book1.csv'))
next(csv_data)
address_count = {}
for row in csv_data:
    Address = row[6]
    if Address in address_count:
        print('{} is a duplicate Address'.format(Address))
        address_count[Address] = address_count[Address] + 1
    else:
        print('{} is a unique Address'.format(Address))
        address_count[Address] = 1
print(address_count)
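One caveat with labelling inside the loop: the first occurrence of an address is always printed as unique, even if the same address shows up again later. The totals the question asks for can instead be derived from the finished dictionary. A minimal sketch, assuming address_count has already been filled by a loop like the one above (the values here are made up for illustration):

```python
# Hypothetical result of the counting loop (address -> number of occurrences).
address_count = {"1 Main St": 3, "2 Oak Ave": 1, "3 Pine Rd": 2}

# Split the addresses by how often they were seen.
unique_addresses = sorted(a for a, n in address_count.items() if n == 1)
duplicate_addresses = sorted(a for a, n in address_count.items() if n > 1)

print("Unique addresses ({}): {}".format(len(unique_addresses), unique_addresses))
print("Duplicate addresses ({}): {}".format(len(duplicate_addresses), duplicate_addresses))
```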
Source: https://stackoverflow.com/questions/40386356/finding-total-number-of-duplicates-in-csv-file