Question
I wrote a mapper that prints out word pairs and a count of 1 for each of them.
import sys
from itertools import tee
for line in sys.stdin:
    line = line.strip()
    words = line.split()
    def pairs(lst):
        return zip(lst,lst[1:]+[lst[0]])
    for i in pairs(words):
        print i,1
I tried writing a reducer that creates a dictionary, but I am a bit stuck on how to sum them up.
import sys
mydict = dict()
for line in sys.stdin:
    (word,cnt) = line.strip().split('\t') #\t
    mydict[word] = mydict.get(word,0)+1
for word,cnt in mydict.items():
    print word,cnt
But it says there are not enough arguments on the .split line. Any thoughts? Thank you.
Answer 1:
I think the problem is the line (word,cnt) = line.strip().split('\t'). The split() method returns a list, and the assignment tries to unpack it into (word, cnt), which does not work when the number of items doesn't match (maybe there's sometimes only one item, for example when a line contains no tab).
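As a rough illustration of that mismatch (Python 3 syntax; `parse_line` and the sample lines are made up for this sketch), the snippet below only unpacks a line when it has exactly two tab-separated fields. Note that Python 2's `print i,1` in the mapper puts a space, not a tab, between the pair and the count, so its output would look like the second sample.

```python
def parse_line(line):
    """Return (key, count) for a 'key<TAB>count' line, or None when the
    line does not have exactly two tab-separated fields."""
    parts = line.strip().split('\t')
    if len(parts) != 2:
        return None  # unpacking into two names would raise ValueError here
    key, cnt = parts
    return key, int(cnt)

# Hypothetical sample lines, not taken from real job output.
tab_line = "('a', 'b')\t1\n"    # pair and count separated by a tab
space_line = "('a', 'b') 1\n"   # pair and count separated by a space

print(parse_line(tab_line))
print(parse_line(space_line))
```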
Maybe you want to use something like:
for word in line.strip().split('\t'):
    mydict[word] = mydict.get(word, 0) + 1
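If the goal is to sum the counts per pair, one possible reducer is sketched below. It assumes each input line has the form key, tab, count, which the mapper in the question does not emit yet. The name `reduce_counts` and the sample input are hypothetical, and the syntax is Python 3.

```python
from collections import defaultdict

def reduce_counts(lines):
    """Sum the counts per key for lines shaped like 'key<TAB>count'."""
    totals = defaultdict(int)
    for line in lines:
        parts = line.rstrip('\n').split('\t')
        if len(parts) != 2:
            continue  # skip lines that are not 'key<TAB>count'
        key, cnt = parts
        totals[key] += int(cnt)  # add the emitted count, not a fixed 1
    return dict(totals)

# Hypothetical mapper output; the last line has no tab and is skipped.
sample = ["a b\t1\n", "b c\t1\n", "a b\t1\n", "bad line\n"]
for key, total in sorted(reduce_counts(sample).items()):
    print('%s\t%d' % (key, total))
```

In a streaming job the function would be given sys.stdin in place of the sample list.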
If you have problems with empty list elements, use list(filter(None, list_name)) to remove them.
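A small example of that filter call (Python 3 syntax; the input string is made up): consecutive or trailing tabs produce empty strings, and filter(None, ...) drops every falsy element.

```python
# Splitting on tabs leaves empty strings where tabs are adjacent or trailing.
fields = "a\t\tb\t".split('\t')
print(fields)   # ['a', '', 'b', '']

# filter(None, ...) keeps only truthy items, so empty strings are removed.
cleaned = list(filter(None, fields))
print(cleaned)  # ['a', 'b']
```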
Disclaimer: I didn't test the code. Also, this only refers to your second example (the reducer).
Source: https://stackoverflow.com/questions/26414623/combine-count-of-word-pairs-python