PHP: word proximity script?

Submitted by 独自空忆成欢 on 2019-12-02 09:50:25

I also thought of Hamming distance as Felix Kling commented. Maybe you can make some variant, where you encode your words into specific codewords and then check their distances through an array that holds your codewords.

So if you have the array [11, 02, 85, 37, 11], you can easily find that the two occurrences of 11 are 4 positions apart, which is the maximum distance for 11 in this array.
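A minimal sketch of that lookup, assuming the codewords are stored as strings (a bare 02 would be read as an octal literal in PHP) and that "distance" means the gap between array positions. The function name maxPositionDistance is hypothetical, not something from the question:

```php
<?php
// Hypothetical helper: for every codeword, the distance between its first
// and last position in the array (0 if it occurs only once).
function maxPositionDistance(array $codewords): array
{
    $first = [];
    $max = [];
    foreach (array_values($codewords) as $index => $code) {
        if (!isset($first[$code])) {
            // First time we see this codeword: remember where it was.
            $first[$code] = $index;
            $max[$code] = 0;
        } else {
            // Later occurrence: the gap to the first one can only grow.
            $max[$code] = $index - $first[$code];
        }
    }
    return $max;
}

$codewords = ['11', '02', '85', '37', '11'];
$distances = maxPositionDistance($codewords);
echo $distances['11']; // prints 4
```

How words are mapped to codewords is left open here, just as in the answer itself.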

I don't know if this would work for you, but I think I would do it in a similar manner.

If you are speaking about specific word comparisons, you will want to look at the SOUNDEX function of MySQL (I will assume you may be using MySQL). It lets you compare two words by how they sound:

SELECT `word` FROM `list_of_words` WHERE SOUNDEX(`word`) = SOUNDEX('{TEST_WORD}');

Then when you get your list of words (most likely you will get quite a few), you can check the distance between those words and your word to find the one that is CLOSEST (or the group of closest words, depending on how you write your code).

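$word = '{WORD TO CHECK}';
$threshold = 4; // the smaller the distance, the closer the word
$similar_word = null; // stays null if no word is closer than the threshold
// $word_results is the list of words returned by the SOUNDEX query
foreach ($word_results as $comparison_word) {
   $distance = levenshtein($comparison_word, $word);
   if ($distance < $threshold) {
      // closer than anything seen so far: remember it and tighten the threshold
      $threshold = $distance;
      $similar_word = $comparison_word;
   }
}
echo $similar_word;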
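If the word list is already in a PHP array, the SOUNDEX filter and the Levenshtein ranking described above can be tried without a database. This is only a sketch: PHP's built-in soundex() stands in for the MySQL query, and the two implementations may not produce identical codes. The function name closestSoundalike and the sample words are made up for illustration:

```php
<?php
// Hypothetical helper: among the candidates that sound like $word,
// return the one with the smallest Levenshtein distance, or null.
function closestSoundalike(string $word, array $candidates): ?string
{
    $best = null;
    $bestDistance = PHP_INT_MAX;
    $target = soundex($word);
    foreach ($candidates as $candidate) {
        if (soundex($candidate) !== $target) {
            continue; // does not sound alike, skip it
        }
        $distance = levenshtein($candidate, $word);
        if ($distance < $bestDistance) {
            $bestDistance = $distance;
            $best = $candidate;
        }
    }
    return $best;
}

echo closestSoundalike('Knuth', ['Kant', 'Knut', 'Euler']); // prints Knut
```

Here both "Kant" and "Knut" share the Soundex code of "Knuth", and "Knut" wins because it is only one edit away.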

Hope that helps you find the direction you are looking for.

Happy coding!

Your example searched for Word1 ... Word2; should Word2 ... Word1 also be matched? A simple solution is to use a regex:

i.e.:

  1. use regex: \bWord1\b(.*)\bWord2\b
  2. split the first match group on spaces (or whatever boundary you choose) into an array, and count its elements
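The two steps above could be sketched in PHP roughly as follows. Two choices here are assumptions, not part of the answer: the group is made lazy (.*?) so the match stops at the nearest Word2, and the split is on any whitespace, so punctuation stays attached to the words it touches. The function name wordsBetween is hypothetical:

```php
<?php
// Hypothetical helper: number of words between the first $first and the
// next $second, or null if $second never follows $first in the text.
function wordsBetween(string $text, string $first, string $second): ?int
{
    $pattern = '/\b' . preg_quote($first, '/') . '\b(.*?)\b'
             . preg_quote($second, '/') . '\b/is';
    if (preg_match($pattern, $text, $matches) !== 1) {
        return null;
    }
    // Step 2: split the captured gap on whitespace and count the pieces.
    $gap = preg_split('/\s+/', $matches[1], -1, PREG_SPLIT_NO_EMPTY) ?: [];
    return count($gap);
}

echo wordsBetween('Word1 alpha beta gamma Word2', 'Word1', 'Word2'); // prints 3
```

To match Word2 ... Word1 as well, the same function can simply be called a second time with the two words swapped.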

This is the most straightforward method, but definitely not the best one performance-wise. I think you need to clarify your needs if you want a more specific answer.

Update:

After the two questions were merged, I see other answers mentioning Soundex, Levenshtein and Hamming distance, etc. I would suggest that theclueless1 CLARIFY the requirements so that people can give useful help. If this is an application related to searching or document clustering, I also suggest you take a look at mature full-text indexing/searching solutions such as Sphinx or Lucene. I think either of them can be used with PHP.
