How to check frequency in a CSV file in Python?

Submitted by 让人想犯罪 __ on 2021-02-10 05:27:43

Question


I have a few .csv documents with 20,000 records or more.

The structure is simple, something like this:

numer,produkt,date
202,produkt A its sad,20.04.2019
203,produkt A its sad,21.04.2019
204,produkt A its sad,22.04.2019
etc

I want to print information like this:

A "produkt A its sad" appears 6 times
A "produkt B" appears 3 times
A "produkt C" appears 2 times

Based on another answer on Stack Overflow, I wrote:

import csv
from collections import Counter

with open ('base2.csv', encoding="utf8") as csv_file:

    csv_reader = csv.reader(csv_file)

    produkt = [row[0] for row in csv_file]

    for (k,v) in Counter(produkt).items():
        print ("A %s appears %d times" % (k, v))

I'm a newbie in Python, so it's probably something stupid :)

The output is:

A n appears 1 times
A 2 appears 11 times

Answer 1:


Your issue is that when you use a list comprehension to build the list of products, you are reading from the file object, not from the CSV reader object.

produkt = [row[0] for row in csv_file]

This says: read each line of the file, store each line in turn in the variable row, and take the first character (index 0) of the string that row holds.
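To make the effect visible, here is a minimal sketch that reproduces the behaviour with an in-memory stand-in for the file. The sample rows and the `io.StringIO` object are assumptions for illustration; the real `base2.csv` is not shown in the question.

```python
import io
from collections import Counter

# Hypothetical in-memory stand-in for base2.csv (assumed contents)
sample = io.StringIO(
    "numer,produkt,date\n"
    "202,produkt A its sad,20.04.2019\n"
    "203,produkt A its sad,21.04.2019\n"
)

# Iterating over a file-like object yields whole lines as strings,
# so line[0] is only the first character of each line
first_chars = [line[0] for line in sample]
counts = Counter(first_chars)

for k, v in counts.items():
    print("A %s appears %d times" % (k, v))
```

The header line contributes the character "n" once, and every data line contributes "2", which matches the shape of the output in the question.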

Instead, assuming you want the produkt column, which is the second field (index 1), you should change this line to:

produkt = [row[1] for row in csv_reader]

However, that would also count the header line. Since your file has headers, I would use csv.DictReader and select the column you are interested in by name:

csv_reader = csv.DictReader(csv_file)
produkts = [row['produkt'] for row in csv_reader]
for (k, v) in Counter(produkts).items():
    print("A %s appears %d times" % (k, v))

That way it's clear which column you are counting, without having to rely on a numeric index.
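Put together, the DictReader approach might look like the following sketch. The sample rows are invented for illustration and read from an `io.StringIO` object rather than from `base2.csv`; `Counter.most_common()` is an addition here, used only to print the most frequent product first.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data standing in for base2.csv (assumed contents)
sample = io.StringIO(
    "numer,produkt,date\n"
    "202,produkt A its sad,20.04.2019\n"
    "203,produkt A its sad,21.04.2019\n"
    "204,produkt B,22.04.2019\n"
)

# DictReader consumes the header row and uses its fields as dict keys,
# so the header itself is never counted as a product
reader = csv.DictReader(sample)
counts = Counter(row["produkt"] for row in reader)

# most_common() yields (value, count) pairs sorted by descending count
for produkt, n in counts.most_common():
    print('A "%s" appears %d times' % (produkt, n))
```

With a real file, the same code should work by passing the object returned by `open('base2.csv', encoding='utf8', newline='')` to `csv.DictReader`; the csv module documentation recommends `newline=''` when opening files for it.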




Answer 2:


In your produkt = [row[0] for row in csv_file], the variable row is a string, so row[0] is just its first character. I replaced it with row.split(",")[1] and got the intended answer.
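One caveat with splitting on commas by hand: it works for the sample data shown, but it can break if a field is quoted and contains a comma. The short sketch below compares the two approaches on a made-up row (the row itself is an assumption, not taken from the question).

```python
import csv
import io

# Hypothetical row whose product name contains a comma, so it is quoted
line = '205,"produkt D, large",23.04.2019\n'

# Naive split cuts the quoted field in two and keeps the quote character
naive = line.split(",")[1]

# csv.reader understands the quoting and returns the whole field
parsed = next(csv.reader(io.StringIO(line)))[1]

print(naive)
print(parsed)
```

If the files never contain quoted fields, the split approach gives the same counts, although it also counts the header line unless that line is skipped.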




Answer 3:


I was reading from csv_file instead of csv_reader.

So produkt = [row[0] for row in csv_file] essentially says: read each line from the file, store it as row, then take the first character of that line.

I replaced csv_file with csv_reader and it works.

Thanks to @chrisdoyle




Answer 4:


You need to use the csv_reader object and not the csv_file.

import csv
from collections import Counter

with open("base2.csv", encoding="utf8") as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')

    # The question's code iterated over csv_file here;
    # it should iterate over csv_reader instead
    frequency = Counter([row[1] for row in csv_reader])

for k, v in frequency.items():
    print("{} appears {} times".format(k, v))
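Note that with a plain csv.reader the header row is read like any other row, so the string "produkt" would be counted once. One possible way to avoid that is to consume the first row with `next()` before counting. The helper name `count_column` and the sample rows below are hypothetical, introduced only for this sketch.

```python
import csv
from collections import Counter

def count_column(lines, index=1):
    """Count the values in one column, skipping the first (header) row."""
    reader = csv.reader(lines)
    # Discard the header; the None default avoids StopIteration on empty input
    next(reader, None)
    # Ignore rows too short to have the requested column (e.g. blank lines)
    return Counter(row[index] for row in reader if len(row) > index)

# csv.reader accepts any iterable of strings, so a list works for a quick check
rows = [
    "numer,produkt,date",
    "202,produkt A its sad,20.04.2019",
    "203,produkt B,21.04.2019",
]
frequency = count_column(rows)

for k, v in frequency.items():
    print("{} appears {} times".format(k, v))
```

An open file object can be passed in place of the list, since the function only iterates over its argument.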


Source: https://stackoverflow.com/questions/61343769/how-to-check-frequency-in-csv-file-on-python
