Question
I am using PHP's similar_text() to compare two strings, but the results are not good enough. For example, the best score I get is 80.95% for a match that I would like to see at 100%.
What other functions can I use to get the strings down to the core?
<!-- Overcast, Rain or Showers compared Overcast, Rain or Showers is 80.9523809524 -->
<!-- Overcast, Risk of Rain or Showers compared Overcast, Rain or Showers is 86.2068965517 -->
<!-- Overcast, Chance of Rain or Showers compared Overcast, Rain or Showers is 83.3333333333 -->
Answer 1:
Levenshtein distance: http://php.net/manual/en/function.levenshtein.php
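It works in the opposite direction to similar_text(): it returns the number of single-character edits between the strings rather than a percentage, so 0 means there is no difference.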
// <!-- Overcast, Rain or Showers compared Overcast, Rain or Showers is 0 -->
// <!-- Overcast, Risk of Rain or Showers compared Overcast, Rain or Showers is 11 -->
// <!-- Overcast, Chance of Rain or Showers compared Overcast, Rain or Showers is 13 -->
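To make the opposite scales concrete, here is a minimal sketch that runs both functions on the same pair. The strings are illustrative placeholders, not the ones from the question.

```php
<?php
// Illustrative strings only; they are not taken from the question.
$a = 'kitten';
$b = 'sitting';

// levenshtein() returns an edit count: lower means more alike.
$distSame = levenshtein($a, $a); // 0, identical strings
$distDiff = levenshtein($a, $b); // 3: k→s, e→i, append g

// similar_text() fills a percentage by reference: higher means more alike.
similar_text($a, $a, $pctSame);  // $pctSame is 100.0
similar_text($a, $b, $pctDiff);

printf("levenshtein:  same=%d, different=%d\n", $distSame, $distDiff);
printf("similar_text: same=%.2f%%, different=%.2f%%\n", $pctSame, $pctDiff);
```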
Answer 2:
The Levenshtein distance is a good way to compare strings. It's generally faster than similar_text(), and it lets you control its output by weighting the different parts of the algorithm: levenshtein() accepts optional insertion, replacement, and deletion costs.
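As a rough sketch of that weighting, the optional third to fifth arguments of levenshtein() set the insertion, replacement, and deletion costs. The strings and the weights (1, 3, 3) below are assumptions chosen only to show that the result becomes direction-dependent.

```php
<?php
// Hypothetical example strings: a short "core" phrase and a longer variant.
$core    = 'Rain or Showers';
$variant = 'Risk of Rain or Showers';

// Default costs: every insertion, replacement, and deletion counts as 1.
$plain = levenshtein($core, $variant); // 8 insertions ("Risk of ")

// Assumed weights for illustration: insertion 1, replacement 3, deletion 3.
// Turning $core into $variant only needs insertions, so the cost stays low...
$grow = levenshtein($core, $variant, 1, 3, 3);

// ...while the reverse direction needs deletions, which now cost 3 each.
$shrink = levenshtein($variant, $core, 1, 3, 3);

printf("plain=%d grow=%d shrink=%d\n", $plain, $grow, $shrink); // plain=8 grow=8 shrink=24
```

Cheap insertions like this would let extra words in one string count for less than missing words, if that fits the data being compared.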
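To turn the Levenshtein distance into a usable "match" fraction, you can express it as a fraction of the average length of the source strings. The distance can exceed the average length when one string is much longer than the other, so clamp the result at 0:
// Assume $src1 and $src2 are your source strings and at least one is non-empty
$avgLength = ( strlen( $src1 ) + strlen( $src2 ) ) / 2;
// max() clamps the value, because the raw result drops below 0 when the distance exceeds $avgLength
$matchFraction = max( 0, 1 - ( levenshtein( $src1, $src2 ) / $avgLength ) );
// $matchFraction is now between 0 and 1, with 1 being equal strings and 0 being totally different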
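One possible way to package this is sketched below. The function name matchPercent() and the clean-up steps (lower-casing, trimming, collapsing whitespace) are assumptions made for illustration, not part of the original answer; the score itself follows the average-length idea, clamped so it stays between 0 and 100.

```php
<?php
// Hypothetical helper: normalise both strings, then score them from 0 to 100.
function matchPercent(string $a, string $b): float
{
    // Assumed clean-up: lower-case, trim, and collapse runs of whitespace.
    $clean = static function (string $s): string {
        return (string) preg_replace('/\s+/', ' ', strtolower(trim($s)));
    };
    $a = $clean($a);
    $b = $clean($b);

    $avgLength = (strlen($a) + strlen($b)) / 2;
    if ($avgLength == 0) {
        return 100.0; // two empty strings are treated as identical
    }

    // Clamp at 0: the distance can exceed the average length.
    return max(0.0, 1 - levenshtein($a, $b) / $avgLength) * 100;
}

printf("%.1f\n", matchPercent("  Overcast,  Rain or SHOWERS ", "Overcast, Rain or Showers"));    // 100.0
printf("%.1f\n", matchPercent("Overcast, Risk of Rain or Showers", "Overcast, Rain or Showers")); // 72.4
```

Normalising first means differences in case or stray whitespace no longer lower the score, which may be part of what "getting the strings down to the core" needs.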
Source: https://stackoverflow.com/questions/10690854/how-to-improve-php-string-match-with-similar-text