Determining whether a word is a noun or not

淺唱寂寞╮ submitted on 2019-12-09 04:57:12

Question


Given an input word, I want to determine whether it is a noun. When a word is ambiguous (for instance, "cook" can be a noun or a verb), it must be identified as a noun.

Currently I use the POS tagger from the Stanford Parser (I give it a single word as input and extract only the POS tag from the result). The results are quite good, but it takes a very long time.

Is there a way (in Python, please :) to perform this task more quickly than my current approach?


Answer 1:


If you simply want to check whether or not a single word can be used as a noun, the quickest way might be to build a set of all nouns and then just check the word for membership of that set.

For a list of all nouns you could use the WordNet corpus (which can be accessed through NLTK for example):

>>> from nltk.corpus import wordnet as wn
>>> nouns = {x.name().split('.', 1)[0] for x in wn.all_synsets('n')}
>>> "cook" in nouns
True
>>> "and" in nouns
False
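
Note that the set above keeps only the name of each synset, which comes from its first lemma, so a noun that only ever appears as a secondary lemma (for example "domestic_dog" under the dog synset) would be missed. A minimal sketch of a variant that collects every lemma name instead is shown here. The helper names `noun_lemmas`, `is_noun` and `load_wordnet_nouns` are made up for illustration, and lowercasing both sides is an assumption that may not suit proper nouns:

```python
def noun_lemmas(synsets):
    """Collect every lemma name from an iterable of synset-like objects.

    Each object only needs a lemma_names() method, as NLTK synsets have.
    """
    return {name.lower() for s in synsets for name in s.lemma_names()}


def is_noun(word, nouns):
    """Return True if `word` is in the prebuilt noun set."""
    # WordNet joins multi-word entries with underscores ("ice_cream"),
    # so normalise spaces the same way before the lookup.
    key = word.strip().lower().replace(" ", "_")
    return key in nouns


def load_wordnet_nouns():
    """Build the set from WordNet; needs NLTK and its 'wordnet' corpus."""
    # Imported lazily so the helpers above work without NLTK installed.
    from nltk.corpus import wordnet as wn
    return noun_lemmas(wn.all_synsets("n"))
```

Build the set once with `nouns = load_wordnet_nouns()` and reuse it; each later `is_noun(word, nouns)` call is then a single set lookup.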



Answer 2:


I can't speak for the Python wrapper, but if you use the Stanford POS tagger rather than the parser, it should be much quicker. There are wrappers for Stanford CoreNLP, which includes the tagger: https://pypi.python.org/pypi/corenlp-python. It also looks like NLTK has a Stanford tagger module: http://www.nltk.org/_modules/nltk/tag/stanford.html

You may also get better results if you embed the single word in a toy sentence. Something like "The X is a thing." Depending on the sentence, this can bias you towards or away from guessing words as nouns too.
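
The toy-sentence idea can be sketched roughly as follows. This is an illustration only: `tag_in_carrier` and `looks_like_noun` are hypothetical names, the tagger is passed in as any callable that maps a token list to `(token, tag)` pairs, the input is assumed to be a single word, and the `NN` prefix check assumes Penn Treebank style tags:

```python
def tag_in_carrier(word, tagger, template="The {} is a thing ."):
    """Tag `word` inside a carrier sentence and return only its tag.

    `template` is pre-tokenised: tokens are separated by spaces and
    "{}" marks the slot where the word goes.
    """
    slot = template.split().index("{}")      # position of the word
    tokens = template.format(word).split()   # assumes a single word
    tagged = tagger(tokens)                  # list of (token, tag) pairs
    return tagged[slot][1]


def looks_like_noun(word, tagger):
    """True if the tag is a Penn Treebank noun tag (NN, NNS, NNP, NNPS)."""
    return tag_in_carrier(word, tagger).startswith("NN")
```

With NLTK installed and its tagger model downloaded, `nltk.pos_tag` takes a token list and returns such pairs, so `looks_like_noun("cook", nltk.pos_tag)` would be one way to try it. Changing the template changes how strongly the tagger is pushed towards a noun reading.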




Answer 3:


I would second the use of WordNet if you are checking single words. I have also used the freely available TreeTagger: http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/. The binary runs really fast and supports multiple languages. If you need a pure-Python solution, check the NLTK implementation of the Brill tagger: http://www.nltk.org/_modules/nltk/tag/brill.html



Source: https://stackoverflow.com/questions/28033882/determining-whether-a-word-is-a-noun-or-not
