Determining whether a word is a noun or not

Question

Given an input word, I want to determine whether it is a noun or not. In case of ambiguity (for instance, "cook" can be a noun or a verb), the word must be identified as a noun.

At the moment I use the POS tagger from the Stanford Parser (I give it a single word as input and extract only the POS tag from the result). The results are quite good, but it takes a very long time.

Is there a way (in Python, please :) to perform this task more quickly than what I do now?

Answer 1:

If you simply want to check whether a single word can be used as a noun, the quickest way is probably to build a set of all nouns once and then check the word for membership in that set.

For a list of all nouns you could use the WordNet corpus (which can be accessed through NLTK for example):

>>> from nltk.corpus import wordnet as wn
>>> nouns = {x.name().split('.', 1)[0] for x in wn.all_synsets('n')}
>>> "cook" in nouns
True
>>> "and" in nouns
False
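One caveat with the snippet above: it keeps only the name of each synset, which is taken from the synset's first lemma, so a noun that appears only as a later lemma may be missed. Below is a minimal sketch of a variant that collects every lemma name instead. The helper names (`build_noun_set`, `load_wordnet_nouns`, `is_noun`) are made up for illustration, and lowercasing everything is an assumption about what you want for proper nouns.

```python
def build_noun_set(synsets):
    """Collect every lemma name from an iterable of synset-like objects.

    Each object only needs a lemma_names() method, as NLTK's Synset has.
    """
    nouns = set()
    for synset in synsets:
        for lemma in synset.lemma_names():
            # WordNet joins multiword lemmas with underscores, e.g. "ice_cream".
            # Lowercasing is an assumption: it makes the lookup case-insensitive.
            nouns.add(lemma.lower())
    return nouns


def load_wordnet_nouns():
    """Build the set from NLTK's WordNet corpus.

    Requires nltk and its 'wordnet' data to be installed.
    """
    # Imported lazily so build_noun_set() itself has no NLTK dependency.
    from nltk.corpus import wordnet as wn
    return build_noun_set(wn.all_synsets('n'))


def is_noun(word, nouns):
    """Return True if the word (or space-separated phrase) is in the noun set."""
    return word.lower().replace(' ', '_') in nouns
```

Build the set once with `nouns = load_wordnet_nouns()` and reuse it; each later `is_noun(word, nouns)` call is a single set lookup, which is where the speed-up over running a tagger per word comes from.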

Answer 2:

I can't speak for the Python wrapper, but if you use the Stanford POS tagger rather than the parser, it should be much quicker. There are wrappers for Stanford CoreNLP, which includes the tagger: https://pypi.python.org/pypi/corenlp-python. Alternatively, it looks like NLTK has a Stanford tagger module too: http://www.nltk.org/_modules/nltk/tag/stanford.html

You may also get better results if you embed the single word in a toy sentence, something like "The X is a thing." Be aware that, depending on the sentence, this can bias the tagger towards or away from guessing words as nouns.
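A rough sketch of the toy-sentence idea follows. The function name `is_noun_in_frame` and the `SLOT` marker are hypothetical, and the tagger is passed in as a plain callable so any tagger can be plugged in. It assumes the tagger takes a list of tokens, returns `(token, tag)` pairs, and uses Penn Treebank tags, where noun tags start with `NN`.

```python
SLOT = "{}"  # hypothetical marker for where the word goes in the frame


def is_noun_in_frame(word, tag_tokens,
                     frame=("The", SLOT, "is", "a", "thing", ".")):
    """Tag `word` inside a noun-friendly frame sentence.

    tag_tokens: callable taking a list of tokens and returning a list of
    (token, tag) pairs, one per input token (an assumption about the tagger).
    """
    # Put the word into the slot of the frame sentence.
    tokens = [word if tok == SLOT else tok for tok in frame]
    slot_index = frame.index(SLOT)

    # Tag the whole sentence, then read back only the tag of our word.
    tagged = tag_tokens(tokens)
    _, tag = tagged[slot_index]

    # Penn Treebank noun tags are NN, NNS, NNP and NNPS.
    return tag.startswith("NN")
```

With NLTK installed along with its tagger model, `is_noun_in_frame("cook", nltk.pos_tag)` would be one way to call it. Because the frame puts the word right after "The", it pushes ambiguous words towards a noun reading, which matches the requirement in the question but will also over-accept some non-nouns.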

Answer 3:

I would second the use of WordNet if you are checking single words. I have also used the freely available TreeTagger: http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/ The binary runs really fast and supports multiple languages. If you need a pure-Python solution, check the NLTK implementation of the Brill tagger: http://www.nltk.org/_modules/nltk/tag/brill.html

Source: https://stackoverflow.com/questions/28033882/determining-whether-a-word-is-a-noun-or-not

Tags

python

nlp

stanford-nlp