NLTK accuracy: “ValueError: too many values to unpack”

こ雲淡風輕ζ 提交于 2019-12-01 12:03:26

问题


I'm trying to do some sentiment analysis of a new movie from Twitter using the NLTK toolkit. I've followed the NLTK 'movie_reviews' example and I've built my own CategorizedPlaintextCorpusReader object. The problem arises when I call nltk.classify.util.accuracy(classifier, testfeats). Here is the code:

import os
import glob
import nltk.classify.util
from nltk.classify import NaiveBayesClassifier
from nltk.corpus import movie_reviews

def word_feats(words):
        return dict([(word, True) for word in words])

negids = movie_reviews.fileids('neg')
posids = movie_reviews.fileids('pos')

negfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'neg') for f in negids]
posfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'pos') for f in posids]

trainfeats = negfeats + posfeats

# Building a custom Corpus Reader
tweets = nltk.corpus.reader.CategorizedPlaintextCorpusReader('./tweets', r'.*\.txt', cat_pattern=r'(.*)\.txt')
tweetsids = tweets.fileids()
testfeats = [(word_feats(tweets.words(fileids=[f]))) for f in tweetsids]

print 'Training the classifier'
classifier = NaiveBayesClassifier.train(trainfeats)

for tweet in tweetsids:
        print tweet + ' : ' + classifier.classify(word_feats(tweets.words(tweetsids)))

classifier.show_most_informative_features()

print 'accuracy:', nltk.classify.util.accuracy(classifier, testfeats)

It all seems to work fine until it gets to the last line. That's when I get the error:

>>> nltk.classify.util.accuracy(classifier, testfeats)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/nltk/classify/util.py", line 87, in accuracy
    results = classifier.classify_many([fs for (fs,l) in gold])
ValueError: too many values to unpack

Does anybody see anything wrong within the code?

Thanks.


回答1:


The error message

File "/usr/lib/python2.7/dist-packages/nltk/classify/util.py", line 87, in accuracy
  results = classifier.classify_many([fs for (fs,l) in gold])
ValueError: too many values to unpack

arises because items in gold can not be unpacked into a 2-tuple, (fs,l):

[fs for (fs,l) in gold]  # <-- The ValueError is raised here

It is the same error you would get if gold equals [(1,2,3)], since the 3-tuple (1,2,3) can not be unpacked into a 2-tuple (fs,l):

In [74]: [fs for (fs,l) in [(1,2)]]
Out[74]: [1]
In [73]: [fs for (fs,l) in [(1,2,3)]]
ValueError: too many values to unpack

gold might be buried inside the implementation of nltk.classify.util.accuracy, but this hints that your inputs, classifier or testfeats are of the wrong "shape".

There is no problem with classifer, since calling accuracy(classifier, trainfeats) works:

In [61]: print 'accuracy:', nltk.classify.util.accuracy(classifier, trainfeats)
accuracy: 0.9675

The problem must be in testfeats.


Compare trainfeats with testfeats. trainfeats[0] is a 2-tuple containing a dict and a classification:

In [63]: trainfeats[0]
Out[63]: 
({u'!': True,
  u'"': True,
  u'&': True,
  ...
  u'years': True,
  u'you': True,
  u'your': True},
 'neg')           # <---  Notice the classification, 'neg'

but testfeats[0] is just a dict, word_feats(tweets.words(fileids=[f])):

testfeats = [(word_feats(tweets.words(fileids=[f]))) for f in tweetsids]

So to fix this you would need to define testfeats to look more like trainfeats -- each dict returned by word_feats must be paired with a classification.



来源:https://stackoverflow.com/questions/31920199/nltk-accuracy-valueerror-too-many-values-to-unpack

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!