How to find all the related keywords for a root word?

Submitted by 吃可爱长大的小学妹 on 2021-02-11 18:04:52

Question


I am trying to figure out a way to find all the keywords that come from the same root word (in some sense the opposite action of stemming). Currently, I am using R for coding, but I am open to switching to a different language if it helps.

For instance, I have the root word "rent" and I would like to be able to find "renting", "renter", "rental", "rents" and so on.


Answer 1:


Try this code in Python:

from pattern.en import lexeme
print(lexeme("rent"))

The output is the list of inflected forms of "rent" (shown as a screenshot in the original answer).

Installation:

pip install pattern
pip install nltk

Then open a terminal, start python, and run the code below:

import nltk
nltk.download(["wordnet","wordnet_ic","sentiwordnet"])

After the installation is done, run the pattern code again.




Answer 2:


You want the opposite of stemming, but stemming can be your way in.

Look at this example in Python:

from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()
words = ["renting", "renter", "rental", "rents", "apple"]
all_rents = {}
for word in words:
    stem = stemmer.stem(word)
    # Group each word under its stem, creating the list on first use.
    all_rents.setdefault(stem, []).append(word)
print(all_rents)

Result:

{'rent': ['renting', 'rents'], 'renter': ['renter'], 'rental': ['rental'], 'appl': ['apple']}

There are several other algorithms to use. However, keep in mind that stemmers are rule-based and are not "smart" enough to select all related words: in the result above, "renter" and "rental" are not reduced to "rent". You can even implement your own rules (extend the stemmer API from NLTK).
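As a rough illustration of what "your own rules" could look like, here is a minimal pure-Python sketch that strips a few suffixes and groups words by the remaining stem. The suffix list, the min_stem_len parameter, and the function names are assumptions made up for this example; they are not part of NLTK and will over- or under-stem many real words.

```python
# Hypothetical suffix list chosen for this example; not taken from NLTK.
SUFFIXES = ("ing", "er", "al", "s")


def strip_suffix(word, min_stem_len=3):
    """Remove the longest matching suffix, keeping at least min_stem_len letters."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


def group_by_stem(words):
    """Map each custom stem to the list of words that reduce to it."""
    groups = {}
    for word in words:
        groups.setdefault(strip_suffix(word), []).append(word)
    return groups


words = ["renting", "renter", "rental", "rents", "apple"]
groups = group_by_stem(words)
print(groups)
```

With these hand-picked rules all four "rent" words land in one group, but only because the suffixes were chosen to fit this word list; a general solution needs a much larger rule set or a dictionary.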

Read more about all available stemmers in NLTK (the module that was used in the above example) here: https://www.nltk.org/api/nltk.stem.html

You can implement your own algorithm as well. For example, you can implement Levenshtein distance (as proposed in @noski's comment) to measure how close each candidate word is to the root, or compare common prefixes. However, you will have to do your own research on this one, since it is a complex process.
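The Levenshtein idea can be sketched in plain Python as follows. This is only one possible reading of the suggestion: the related_words helper, its max_distance threshold of 3, and the optional prefix check are assumptions for illustration, not something the answer or the comment specifies.

```python
def levenshtein(a, b):
    """Dynamic-programming edit distance between strings a and b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def related_words(root, vocabulary, max_distance=3, require_prefix=False):
    """Keep words within max_distance edits of root (hypothetical helper)."""
    return [
        word for word in vocabulary
        if levenshtein(root, word) <= max_distance
        and (not require_prefix or word.startswith(root))
    ]


vocabulary = ["renting", "renter", "rental", "rents", "apple", "brent"]
by_distance = related_words("rent", vocabulary)
by_distance_and_prefix = related_words("rent", vocabulary, require_prefix=True)
print(by_distance)
print(by_distance_and_prefix)
```

Distance alone also admits "brent", which is one edit away from "rent" but unrelated; requiring the root as a prefix filters it out. Neither check knows anything about meaning, so treat the result as a candidate list to review.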




Answer 3:


For an R answer, you can try these functions as a starting point. d.b gives grepl as an example; here are a few more:

words <- c("renting", "renter", "rental", "rents", "apple", "brent")
grepl("rent", words) # TRUE TRUE TRUE TRUE FALSE TRUE
startsWith(words, "rent") # TRUE TRUE TRUE TRUE FALSE FALSE
endsWith(words, "rent") # FALSE FALSE FALSE FALSE FALSE TRUE
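Since the question mentions being open to other languages, the same three checks can be written in Python with built-in string operations. This is a direct translation of the R lines above using the same word list; the variable names are chosen for this example.

```python
words = ["renting", "renter", "rental", "rents", "apple", "brent"]

# Substring match anywhere in the word (like grepl with a fixed pattern).
contains = ["rent" in w for w in words]

# Match only at the start of the word (like startsWith).
starts = [w.startswith("rent") for w in words]

# Match only at the end of the word (like endsWith).
ends = [w.endswith("rent") for w in words]

print(contains)
print(starts)
print(ends)
```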


Source: https://stackoverflow.com/questions/58066049/how-to-find-all-the-related-keywords-for-a-root-word
