Tagging a single word with the nltk pos tagger tags each letter instead of the word

I'm trying to tag a single word with the nltk pos tagger:

import nltk

word = "going"
pos = nltk.pos_tag(word)
print(pos)

But the output is this:

[('g', 'NN'), ('o', 'VBD'), ('i', 'PRP'), ('n', 'VBP'), ('g', 'JJ')]

It's tagging each letter rather than just the one word.

What can I do to make it tag the word?

nltk.tag.pos_tag accepts a list of tokens and tags each element separately. A string is itself a sequence of characters, so when you pass one, each letter is treated as a token. Put your word in a list instead:

>>> nltk.tag.pos_tag(['going'])
[('going', 'VBG')]
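
To see why the letters come out one by one, here is a minimal sketch that does not call nltk at all. iterate_like_a_tagger is a hypothetical stand-in, not part of nltk; it only loops over its argument, which is roughly what any function expecting a list of tokens will do.

```python
def iterate_like_a_tagger(tokens):
    # Hypothetical stand-in for a tagger: it simply loops over its input,
    # as a function that expects a list of tokens would.
    return [token for token in tokens]


word = "going"

# Iterating over a str yields its characters one at a time.
as_string = iterate_like_a_tagger(word)

# Iterating over a one-element list yields the whole word.
as_list = iterate_like_a_tagger([word])

print(as_string)
print(as_list)
```

The first call sees five tokens and the second sees one, which matches the difference between nltk.pos_tag(word) and nltk.pos_tag([word]).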

>>> word = 'going'
>>> word = nltk.word_tokenize(word)
>>> l1 = nltk.pos_tag(word)
>>> l1
[('going', 'VBG')]

The tagger works on a list of words. To turn the string into a list simply use something like

word_list = [word]

then use the pos tagger on the word_list. Note that if you have more than one word, you should run nltk.word_tokenize on the string first.
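
If the input may be either a string or an existing list, a small guard along these lines could normalize it before tagging. This is only a sketch: to_tokens and its tokenize parameter are names made up for illustration, and the default whitespace split is a crude stand-in so the example runs without nltk data. For real text with punctuation you would probably pass nltk.word_tokenize instead.

```python
def to_tokens(text_or_tokens, tokenize=str.split):
    """Return a list of tokens that is safe to hand to a tagger.

    tokenize defaults to plain whitespace splitting (an assumption made
    so the sketch runs without nltk); pass nltk.word_tokenize for real text.
    """
    if isinstance(text_or_tokens, str):
        # A bare string would otherwise be iterated letter by letter.
        return tokenize(text_or_tokens)
    # Anything else is assumed to already be an iterable of tokens.
    return list(text_or_tokens)


print(to_tokens("going"))
print(to_tokens("I am going home"))
print(to_tokens(["going"]))
```

The result can then be passed to nltk.pos_tag, whichever form the input had.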

As for how well the tagger does on a single word, look into the lookup tagger described in section 4.3 here. The pos_tag used by nltk is more complicated than a one-word lookup tagger, but it does use one as part of the process, so you should see reasonable results.
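
For a rough idea of what a lookup tagger does, here is a plain-Python sketch. It is not nltk's implementation: the tagged_words sample is hand-written for illustration only, and lookup_tag and the default tag "NN" are assumptions of this example.

```python
from collections import Counter, defaultdict

# Tiny hand-written sample, for illustration only (not real corpus data).
tagged_words = [
    ("going", "VBG"), ("going", "VBG"), ("going", "NN"),
    ("home", "NN"), ("the", "DT"),
]

# Count how often each tag was seen for each word.
counts = defaultdict(Counter)
for word, tag in tagged_words:
    counts[word][tag] += 1

# Map each word to its most frequent tag.
likely_tags = {word: c.most_common(1)[0][0] for word, c in counts.items()}


def lookup_tag(tokens, default="NN"):
    # Known words get their most frequent tag; unknown words get the default.
    return [(token, likely_tags.get(token, default)) for token in tokens]


print(lookup_tag(["going"]))
print(lookup_tag(["unseen"]))
```

Because such a tagger ignores context, it gives the same answer for a word whether it appears alone or in a sentence, which is why a single-word lookup can still be reasonable.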

Source: https://stackoverflow.com/questions/29397708/tagging-a-single-word-with-the-nltk-pos-tagger-tags-each-letter-instead-of-the-w

Tags: python, python-2.7, nlp, nltk, pos-tagger
