General synonym and part of speech processing using nltk

血红的双手。 提交于 2019-12-05 04:40:28

Apparently nltk allows for the retrieval of all synsets associated with a word. Granted, there are usually a number of them reflecting different word senses. In order to functionally find synonyms (or if two words are synonyms) you must attempt to match the closest synonym set possible, which is possible through any of the similarity metrics mentioned above. I crafted up some basic code to do this, as shown below, how to find if two words are synonyms:

from nltk.corpus import wordnet
from nltk.stem.wordnet import WordNetLemmatizer
import itertools


def Synonym_Checker(word1, word2):
    """Checks if word1 and word2 and synonyms. Returns True if they are, otherwise False"""
    equivalence = WordNetLemmatizer()
    word1 = equivalence.lemmatize(word1)
    word2 = equivalence.lemmatize(word2)

    word1_synonyms = wordnet.synsets(word1)
    word2_synonyms = wordnet.synsets(word2)

    scores = [i.wup_similarity(j) for i, j in list(itertools.product(word1_synonyms, word2_synonyms))]
    max_index = scores.index(max(scores))
    best_match = (max_index/len(word1_synonyms), max_index % len(word1_synonyms)-1)

    word1_set = word1_synonyms[best_match[0]].lemma_names
    word2_set = word2_synonyms[best_match[1]].lemma_names
    match = False
    match = [match or word in word2_set for word in word1_set][0]

    return match

print Synonym_Checker("tomato", "Lycopersicon_esculentum")

I may try to implement progressively stronger stemming algorithms, but for the first few tests I did, this code actually worked for every word I could find. If anyone has ideas on how to improve this algorithm, or has anything to improve this answer in any way I would love to hear it.

Can you wrap your Word_Set = wordnet.synset(Call) with a try: and ignore the WordNetError exception? Looks like the error you have is that some words are not categorized correctly, but this exception would also occur for unrecognized words, so catching the exception just seems like a good idea to me.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!