Question
Given an input word, I want to determine whether or not it is a noun. In case of ambiguity (for instance, "cook" can be a noun or a verb), the word must be identified as a noun.
Currently I use the POS tagger from the Stanford Parser: I give it a single word as input and extract only the POS tag from the result. The results are quite good, but it takes a very long time.
Is there a faster way to perform this task (in Python, please :)?
Answer 1:
If you simply want to check whether or not a single word can be used as a noun, the quickest way might be to build a set of all nouns and then check whether the word is in that set.
For a list of all nouns, you could use the WordNet corpus (which can be accessed through NLTK, for example):
>>> from nltk.corpus import wordnet as wn
>>> nouns = {x.name().split('.', 1)[0] for x in wn.all_synsets('n')}
>>> "cook" in nouns
True
>>> "and" in nouns
False
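The one-liner above keeps only the first part of each synset's name, so nouns that appear only as a secondary lemma of some synset are left out, and inflected forms such as "cooks" are never matched. Below is a minimal sketch of a fuller variant, assuming NLTK with the WordNet data installed (`nltk.download('wordnet')`): it collects every noun lemma via `wn.all_lemma_names(pos="n")` and can optionally fall back to a lemmatizer. The helper names `build_noun_set`, `load_wordnet_nouns` and `is_noun` are hypothetical, not part of NLTK.

```python
from typing import Callable, Iterable, Optional, Set


def build_noun_set(lemma_names: Iterable[str]) -> Set[str]:
    """Lowercase every lemma name and collect them into a set for fast lookups."""
    return {name.lower() for name in lemma_names}


def load_wordnet_nouns() -> Set[str]:
    """Collect every noun lemma in WordNet (needs NLTK and nltk.download('wordnet'))."""
    # Imported lazily so the other helpers work even without NLTK installed.
    from nltk.corpus import wordnet as wn
    return build_noun_set(wn.all_lemma_names(pos="n"))


def is_noun(word: str, nouns: Set[str],
            lemmatize: Optional[Callable[[str], Optional[str]]] = None) -> bool:
    """Return True if `word` (or its lemma, when a lemmatizer is given) is in `nouns`."""
    # WordNet joins multiword lemmas with underscores, e.g. "hot_dog".
    key = word.strip().lower().replace(" ", "_")
    if key in nouns:
        return True
    if lemmatize is not None:
        # The lemmatizer may return None when it finds no base form.
        lemma = lemmatize(key)
        return lemma is not None and lemma.lower() in nouns
    return False
```

A possible call, with `from nltk.corpus import wordnet as wn` and WordNet's own `wn.morphy` as the lemmatizer: `is_noun("cooks", load_wordnet_nouns(), lambda w: wn.morphy(w, wn.NOUN))`. Building the set once and reusing it keeps each lookup a constant-time set membership test.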
Answer 2:
I can't speak for the Python wrapper, but if you use the Stanford POS tagger rather than the parser, it should be much quicker. There are wrappers for Stanford CoreNLP, which includes the tagger: https://pypi.python.org/pypi/corenlp-python. Alternatively, NLTK appears to have a Stanford tagger module too: http://www.nltk.org/_modules/nltk/tag/stanford.html
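As far as I know, NLTK's `StanfordPOSTagger` runs the Java tagger as a separate process on every `tag()` call, so tagging words one at a time pays that start-up cost repeatedly. A minimal sketch of tagging a whole word list in one call, assuming you pass a function shaped like `StanfordPOSTagger.tag_sents` (a list of token lists in, one tagged sentence per input sentence out). `tag_words_in_batch` is a hypothetical helper, and the model and jar paths in the comment are placeholders.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Same shape as StanfordPOSTagger.tag_sents: token lists in, (token, tag) lists out.
TagSents = Callable[[Sequence[Sequence[str]]], List[List[Tuple[str, str]]]]


def tag_words_in_batch(words: Sequence[str], tag_sents: TagSents) -> Dict[str, str]:
    """Tag every word in a single tagger call, each word as its own one-token sentence."""
    sentences = [[w] for w in words]
    tagged = tag_sents(sentences)  # one call, so one tagger start-up for all words
    # Assumes the tagger returns exactly one tagged sentence per input sentence.
    return {w: sent[0][1] for w, sent in zip(words, tagged)}


# Possible use with NLTK (placeholder paths, requires Java):
# from nltk.tag import StanfordPOSTagger
# tagger = StanfordPOSTagger("path/to/english-bidirectional-distsim.tagger",
#                            path_to_jar="path/to/stanford-postagger.jar")
# tags = tag_words_in_batch(["cook", "and", "table"], tagger.tag_sents)
```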
You may also get better results if you embed the single word in a toy sentence, something like "The X is a thing." Be aware that the choice of sentence can also bias the tagger towards or away from tagging words as nouns.
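The toy-sentence trick can be sketched as follows, assuming any tagger that takes a list of tokens and returns `(token, tag)` pairs (both `nltk.pos_tag` and `StanfordPOSTagger.tag` have that shape) and Penn Treebank tags, where every noun tag starts with `NN`. The template and the helper names `tag_in_context` and `is_noun_tag` are hypothetical illustrations.

```python
from typing import Callable, List, Sequence, Tuple

# A tagger maps a list of tokens to (token, tag) pairs.
Tagger = Callable[[Sequence[str]], List[Tuple[str, str]]]

# Hypothetical toy template; "{}" marks where the word goes.
DEFAULT_TEMPLATE: Tuple[str, ...] = ("The", "{}", "is", "a", "thing", ".")


def tag_in_context(word: str, tagger: Tagger,
                   template: Sequence[str] = DEFAULT_TEMPLATE) -> str:
    """Tag `word` inside a toy sentence and return only the word's own tag."""
    slot = list(template).index("{}")  # position of the word in the sentence
    tokens = [word if tok == "{}" else tok for tok in template]
    tagged = tagger(tokens)
    return tagged[slot][1]


def is_noun_tag(tag: str) -> bool:
    """Penn Treebank noun tags are NN, NNS, NNP and NNPS."""
    return tag.startswith("NN")
```

With NLTK's built-in tagger (its model must be downloaded first), this could be called as `is_noun_tag(tag_in_context("cook", nltk.pos_tag))`. Trying a few different templates and comparing the tags is one way to see how much the sentence biases the result.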
Answer 3:
I would second the use of WordNet if you are checking single words. I have also used the freely available TreeTagger (http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/); the binary runs really fast and supports multiple languages. If you need a pure-Python solution, check out the NLTK implementation of the Brill tagger: http://www.nltk.org/_modules/nltk/tag/brill.html
Source: https://stackoverflow.com/questions/28033882/determining-whether-a-word-is-a-noun-or-not