Combine count of word pairs: python

Question

I wrote a mapper that prints out word pairs and a count of 1 for each of them.

import sys
from itertools import tee


for line in sys.stdin:
    line = line.strip()
    words = line.split()

def pairs(lst):
    return zip(lst,lst[1:]+[lst[0]])

for i in pairs(words):
    print i,1

I tried writing a reducer that creates a dictionary, but I am a bit stuck on how to sum them up.

import sys

mydict = dict()
for line in sys.stdin:
    (word,cnt) = line.strip().split('\t') #\t
    mydict[word] = mydict.get(word,0)+1

for word,cnt in mydict.items():
    print word,cnt

But it says there are not enough arguments on the .split line. Any thoughts? Thank you.

Answer 1:

I think the problem is this line: (word,cnt) = line.strip().split('\t')
The split() method returns a list, and the assignment tries to unpack that list into (word, cnt). This fails when the number of items doesn't match (maybe some lines contain no tab, so only one item comes back).
Maybe you want to use something like

for word in line.strip().split('\t'):
    mydict[word] = mydict.get(word, 0) + 1
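To make the failure concrete, here is a minimal sketch of what happens when a line contains no tab, together with one way a reducer could add up the counts instead of adding 1 per line. It uses Python 3 syntax (the question uses Python 2 print statements), it assumes the mapper emits lines of the form key<TAB>count, and the helper name sum_counts and the sample lines are made up for illustration.

```python
def sum_counts(lines):
    """Sum the counts per key from 'key<TAB>count' lines (hypothetical helper)."""
    totals = {}
    for line in lines:
        # Split on the last tab only, so the key itself may contain spaces.
        parts = line.rstrip("\n").rsplit("\t", 1)
        if len(parts) != 2:
            # No tab on this line: skip it instead of failing while unpacking.
            continue
        key, cnt = parts
        try:
            value = int(cnt)
        except ValueError:
            # The count field is not an integer: skip the line.
            continue
        totals[key] = totals.get(key, 0) + value
    return totals


# A line without a tab splits into a one-item list,
# which cannot be unpacked into two names.
try:
    word, cnt = "('a', 'b') 1".split("\t")
except ValueError as err:
    unpack_error = str(err)

sample = ["a b\t1", "b c\t1", "a b\t1", "no tab here"]
result = sum_counts(sample)
print(result)
```

In a streaming job the same function could be fed sys.stdin directly, for example sum_counts(sys.stdin), and the totals printed one per line.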

If you have problems with empty list elements, use list(filter(None, list_name)) to remove them.
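As a small illustration of that tip (the sample string is made up): splitting on a tab leaves empty strings behind when tabs are doubled or trailing, and filter(None, ...) drops them along with any other falsy items.

```python
# Hypothetical input with a doubled tab and a trailing tab.
line = "foo\t\tbar\t"

# Splitting on '\t' keeps the empty strings between and after the tabs.
fields = line.split("\t")

# filter(None, ...) keeps only truthy items, so the empty strings are removed.
cleaned = list(filter(None, fields))

print(fields)
print(cleaned)
```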

Disclaimer: I didn't test the code. Also, this only refers to your second example (the reducer).
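Since the answer covers only the reducer, here is a hedged sketch of how the mapper side might emit tab-delimited records so that split('\t') on the reducer side finds exactly two fields. It uses Python 3 syntax, keeps the wrap-around pairing from the question's zip-based helper, and assumes a space-joined key format. The name emit_pairs and that key format are illustrative choices, not something the question specifies.

```python
def pairs(words):
    """Pair each word with its successor, wrapping the last word to the first
    (the same pairing as the question's zip-based helper)."""
    if not words:
        # An empty line has no words, so there is nothing to pair.
        return []
    return list(zip(words, words[1:] + [words[0]]))


def emit_pairs(lines):
    """Yield 'first second<TAB>1' for every pair on every input line."""
    for line in lines:
        for first, second in pairs(line.split()):
            # An explicit tab separates the key from the count.
            yield "{} {}\t1".format(first, second)


sample = ["the quick fox", ""]
output = list(emit_pairs(sample))
for record in output:
    print(record)
```

In a streaming job, emit_pairs(sys.stdin) could replace the sample list, and pairing happens for every line rather than only the last one read.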

Source: https://stackoverflow.com/questions/26414623/combine-count-of-word-pairs-python

Tags

python

MapReduce
