How to find out whether a word exists in English using NLTK

Submitted by 自作多情 on 2019-12-18 17:04:58

Question


I am looking for a proper solution to this question. It has been asked many times before, but I didn't find a single answer that suited my case. I need to use a corpus in NLTK to detect whether a word is an English word.

I have tried:

wordnet.synsets(word)

This doesn't work for many common words. Using a list of English words and performing a lookup in a file is not an option. Using enchant is not an option either. If there is another library that can do the same, please show how to use its API. If not, please point to a corpus in NLTK that contains all the words in English.


Answer 1:


NLTK includes some corpora that are nothing more than wordlists. The Words Corpus is the /usr/share/dict/words file from Unix, used by some spell checkers. We can use it to find unusual or misspelled words in a text, as in the following function:

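import nltk

def unusual_words(text):
    # Lowercased alphabetic tokens that occur in the input text
    text_vocab = set(w.lower() for w in text.split() if w.isalpha())
    # Lowercased entries of the NLTK Words Corpus
    english_vocab = set(w.lower() for w in nltk.corpus.words.words())
    # Tokens from the text that are missing from the word list
    unusual = text_vocab - english_vocab
    return sorted(unusual)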

In your case, you can check whether a word is in english_vocab:

>>> import nltk
>>> english_vocab = set(w.lower() for w in nltk.corpus.words.words())
>>> 'a' in english_vocab
True
>>> 'this' in english_vocab
True
>>> 'nothing' in english_vocab
True
>>> 'nothingg' in english_vocab
False
>>> 'corpus' in english_vocab
True
>>> 'Terminology'.lower() in english_vocab
True
>>> 'sorted' in english_vocab
True
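Rebuilding the set on every lookup is slow, so it may help to wrap the check above in a small helper that builds the vocabulary once. The following is only a sketch: the names build_vocab, load_nltk_vocab and is_english_word are hypothetical, and load_nltk_vocab assumes that nltk is installed and that the Words Corpus has been fetched with nltk.download('words'). The demo at the end uses a tiny stand-in list so that it runs without NLTK.

```python
from functools import lru_cache


def build_vocab(words):
    """Return a lowercase frozenset built from any iterable of words."""
    return frozenset(w.lower() for w in words)


@lru_cache(maxsize=1)
def load_nltk_vocab():
    """Load the NLTK Words Corpus once and cache the result.

    Assumes nltk is installed and the 'words' data has been downloaded.
    """
    import nltk  # imported lazily so the other helpers work without nltk
    return build_vocab(nltk.corpus.words.words())


def is_english_word(word, vocab):
    """Case-insensitive membership test against a prebuilt vocabulary."""
    return word.lower() in vocab


# Tiny stand-in list for illustration; in practice pass load_nltk_vocab().
demo_vocab = build_vocab(["Nothing", "corpus", "terminology"])
print(is_english_word("Terminology", demo_vocab))  # True
print(is_english_word("nothingg", demo_vocab))     # False
```

With the real corpus the call would look like is_english_word('corpus', load_nltk_vocab()). The result depends entirely on what the word list contains.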



Answer 2:


I tried the above approach, but it failed for many words that should exist, so I tried WordNet instead. I think it has a more comprehensive vocabulary.

from nltk.corpus import wordnet

if wordnet.synsets(word):
    pass  # Do something
else:
    pass  # Do something else
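The two answers can also be combined. WordNet only covers nouns, verbs, adjectives and adverbs, so function words such as "the" have no synsets, while a plain word list may lack inflected forms such as plurals. Below is a minimal sketch of a checker that accepts a word if either source knows it. The name make_word_checker is hypothetical, and both lookups are passed in as arguments so that the example runs with stand-in data. With NLTK you would presumably pass a set built from nltk.corpus.words.words() and a callable such as lambda w: bool(wordnet.synsets(w)).

```python
def make_word_checker(vocab, has_synsets):
    """Build a checker that accepts a word if either source knows it.

    vocab: a set of lowercase words, e.g. built from a word list.
    has_synsets: a callable that returns a truthy value when the
                 second source (e.g. WordNet) knows the word.
    """
    def is_english_word(word):
        w = word.lower()
        return w in vocab or bool(has_synsets(w))
    return is_english_word


# Stand-in data for illustration only; swap in the real NLTK sources.
demo_vocab = {"the", "this", "nothing"}
demo_wordnet = {"dog", "dogs", "run"}
check = make_word_checker(demo_vocab, lambda w: w in demo_wordnet)

print(check("The"))       # True, found in the word list
print(check("dogs"))      # True, found by the WordNet stand-in
print(check("nothingg"))  # False, unknown to both
```

Whether this union is comprehensive enough depends on the words you need to cover. Neither source is a complete list of English words.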



Source: https://stackoverflow.com/questions/29099621/how-to-find-out-wether-a-word-exists-in-english-using-nltk
