How could I make a search match similar words?

Submitted by 浪尽此生 on 2019-12-21 07:39:32

Question


I'm trying to automatically categorize short articles, and I need to figure out how to match similar words, e.g. "shelf" and "shelves", or "painting" and "repaint".

I'm using the Porter stemming algorithm, but it only helps in certain situations and only handles the end of the word (neither example above works with it).

Is there an algorithm or a related-word list that would help with something like this (short of building my own)?

(I'm working in PHP, so solutions in that language would be most helpful.)


Answer 1:


The Levenshtein Distance is what you are looking for.

For any two strings, it calculates the minimum number of insertions, substitutions and deletions needed to change one string into the other.

If the distance is low relative to the length of the words, then the two words are likely to be similar.
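As a rough sketch of how that could look in PHP, the snippet below uses the built-in levenshtein() function and scales the result by the longer word's length. The helper names normalizedDistance() and looksSimilar() and the 0.5 cutoff are assumptions made for illustration, not part of PHP; the cutoff would need tuning against real articles.

```php
<?php
// Hypothetical helper: edit distance scaled to the range 0..1
// by the length of the longer word (0 = identical).
function normalizedDistance(string $a, string $b): float
{
    $maxLen = max(strlen($a), strlen($b));
    if ($maxLen === 0) {
        return 0.0; // two empty strings are identical
    }
    return levenshtein($a, $b) / $maxLen;
}

// Hypothetical cutoff, chosen only for this example.
const MAX_NORMALIZED_DISTANCE = 0.5;

// Hypothetical helper: case-insensitive "close enough" check.
function looksSimilar(string $a, string $b): bool
{
    return normalizedDistance(strtolower($a), strtolower($b)) <= MAX_NORMALIZED_DISTANCE;
}

var_dump(levenshtein('shelf', 'shelves'));     // int(3)
var_dump(looksSimilar('shelf', 'shelves'));    // 3/7 is under the cutoff: bool(true)
var_dump(looksSimilar('painting', 'repaint')); // 5/8 is over the cutoff: bool(false)
```

Note that "painting" and "repaint" still come out as dissimilar here, because the shared part sits at opposite ends of the two words. Edit distance alone may not cover the prefix case from the question. levenshtein() and strlen() also work on bytes, so this sketch assumes plain ASCII words.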

You could also use the Soundex algorithm to determine if two words sound similar.
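A minimal sketch of the phonetic comparison, using PHP's built-in soundex() function. The helper name soundsAlike() is made up for this example.

```php
<?php
// Hypothetical helper: true when both words share the same Soundex key.
function soundsAlike(string $a, string $b): bool
{
    return soundex($a) === soundex($b);
}

var_dump(soundex('color'), soundex('colour'));  // "C460" for both
var_dump(soundsAlike('color', 'colour'));       // bool(true)
var_dump(soundex('shelf'), soundex('shelves')); // "S410" and "S412"
var_dump(soundsAlike('shelf', 'shelves'));      // bool(false)
```

Soundex catches spelling variants such as "color" and "colour", but the extra consonant in "shelves" changes its key, so it does not match "shelf". PHP also ships a metaphone() function that could be swapped in for a different phonetic key.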

See also:
PHP levenshtein function
PHP soundex function




Answer 2:


Well, there is the mother of all "related word lists", called WordNet: http://wordnet.princeton.edu/

It's available free of charge subject to a fairly generous license. There is a PHP interface in the "related projects" section.

The advantage of this over a word similarity algorithm is that it even knows synonyms that look nothing alike, such as "paint" and "colour". The downside is that you either have to know the right synsets (after all, one word can mean different things) or you can end up with a pretty wild list of synonyms.



Source: https://stackoverflow.com/questions/4064042/how-could-i-make-a-search-match-for-similar-words
