Count the frequency of words from a column in Python

Submitted by 吃可爱长大的小学妹 on 2019-12-13 16:17:57

Question


I have a CSV file with the following structure:

Name Hour Location
A    4    San Fransisco
B    2    New York
C    4    New York
D    7    Denton
E    8    Boston
F    1    Boston

In the data above, the Location column contains

2 New York and
2 Boston

I tried to use the tabular package. I spent more than 7 hours working through the tutorials in its documentation, but I couldn't get it to work.

Can anyone help me extract the count of each frequent word in the Location column of that CSV file using Python?

Thank you.


Answer 1:


data = """Name\tHour\tLocation
A\t4\tSan Fransisco
B\t2\tNew York
C\t4\tNew York
D\t7\tDenton
E\t8\tBoston
F\t1\tBoston
"""

import csv
import io
from collections import Counter


# Wrap the sample string in a file-like object so csv.reader can consume it
input_stream = io.StringIO(data)
reader = csv.reader(input_stream, delimiter='\t')

next(reader)  # skip header
cities = [row[2] for row in reader]  # Location is the third column

for k, v in Counter(cities).items():
    print("%s appears %d times" % (k, v))

Output:

San Fransisco appears 1 times
New York appears 2 times
Denton appears 1 times
Boston appears 2 times



Answer 2:


I'm not sure what separator your file uses, but the example shows the columns separated by 4 spaces, so this solution assumes that.

If the columns are actually separated by tabs, use the answer by @MariaZverina instead.

import collections

with open('test.txt') as f:
    next(f) # Skip the first line
    print(collections.Counter(line.rstrip().rpartition('    ')[-1] for line in f))

Output:

Counter({'New York': 2, 'Boston': 2, 'San Fransisco': 1, 'Denton': 1})
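If it is unclear whether the file uses tabs or runs of spaces, one option is to split on either. The following is only a sketch under that assumption: the helper name `count_last_column` and the sample lists are made up for illustration, and it treats any run of two or more spaces as a separator, so it would break on values that themselves contain double spaces.

```python
import re
from collections import Counter

# Split on one or more tabs, or on runs of two or more spaces, so that
# single spaces inside a value such as "New York" are preserved.
FIELD_SEPARATOR = re.compile(r"\t+| {2,}")


def count_last_column(lines):
    """Count the values in the last column, skipping the header line."""
    rows = iter(lines)
    next(rows, None)  # skip the header
    counts = Counter()
    for line in rows:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        counts[FIELD_SEPARATOR.split(line)[-1]] += 1
    return counts


# Hypothetical sample data in both layouts
spaced_lines = [
    "Name Hour Location",
    "A    4    San Fransisco",
    "B    2    New York",
    "C    4    New York",
    "D    7    Denton",
    "E    8    Boston",
    "F    1    Boston",
]
tabbed_lines = [line.replace("    ", "\t") for line in spaced_lines]

print(count_last_column(spaced_lines))
print(count_last_column(tabbed_lines))
```

Both calls should report the same counts, since only the separator differs between the two sample lists.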



Answer 3:


If the file isn't too large, the most naive way would be:

  • Read the file line by line
  • Append the values for location to a list
  • Build a set of uniques from that list
  • Determine the count for each of the uniques in the list
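The steps above can be sketched roughly as follows. This assumes tab-separated lines with Location as the third column. The function name `count_locations_naive` and the in-memory sample list are illustrative stand-ins for reading the real file.

```python
def count_locations_naive(lines, column=2, delimiter="\t"):
    """Naive frequency count: collect values, find uniques, count each one."""
    # Steps 1 and 2: read line by line and append the location to a list
    locations = []
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        if len(fields) > column:
            locations.append(fields[column])

    # Step 3: build a set of unique locations
    uniques = set(locations)

    # Step 4: count how often each unique value occurs in the list
    return {name: locations.count(name) for name in uniques}


# Hypothetical sample standing in for the lines of the file
sample = [
    "Name\tHour\tLocation",
    "A\t4\tSan Fransisco",
    "B\t2\tNew York",
    "C\t4\tNew York",
    "D\t7\tDenton",
    "E\t8\tBoston",
    "F\t1\tBoston",
]

counts = count_locations_naive(sample[1:])  # skip the header line
for name in sorted(counts):
    print("%s appears %d times" % (name, counts[name]))
```

Because `list.count` rescans the whole list for every unique value, this approach gets slow on large files. `collections.Counter`, as used in the other answers, does the same job in a single pass.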


Source: https://stackoverflow.com/questions/11396665/count-the-frequency-of-words-from-a-column-in-python
