How to sort an array by similarity in relation to an inputted word.

闹比i 2020-12-05 11:21

I have a PHP array, for example:

$arr = array("hello", "try", "hel", "hey hello");

Now I want to do rearrange of the array which w

5 answers

感动是毒 (original poster)

2020-12-05 11:39

While @yceruto's answer is correct and informative, I would like to offer some additional insights and demonstrate more modern syntax:

The three-way comparison operator (aka "spaceship operator") <=>, available from PHP 7+.
Arrow function syntax, available from PHP 7.4+, which lets variables from the parent scope be used inside the custom function without a use clause.

First, a few notes about the scores that the respective functions generate...

levenshtein() and similar_text() ARE case-sensitive, so an uppercase H is just as much of a mismatch for h as the number 6 is.
levenshtein() and similar_text() ARE NOT multi-byte aware, so an accented character like ê will not only be deemed a mismatch for e, it may also receive a heavier penalty because each of its bytes is counted as a separate mismatch.

If you want to make case-insensitive comparisons, you can simply convert both strings to uppercase/lowercase before executing.
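As a minimal sketch of that normalization step (the variable names and sample strings are only illustrative), lowercasing both sides removes the case penalty. Note that strtolower() works byte by byte, so for accented UTF-8 text mb_strtolower() is probably the better fit.

```php
<?php
// Illustrative inputs; not taken from the original answer.
$needle = 'hello';
$candidate = 'HELLO';

// Case-sensitive: every letter differs, so each one costs a substitution.
$sensitive = levenshtein($needle, $candidate);

// Case-insensitive: normalize both strings before comparing.
$insensitive = levenshtein(strtolower($needle), strtolower($candidate));

echo "$sensitive vs $insensitive\n"; // prints "5 vs 0"
```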

If your application requires multi-byte support, you should search for existing repositories that provide this functionality.
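To give an idea of what such a library does internally, here is a rough sketch of a character-based Levenshtein distance. The function name mb_levenshtein_sketch() is hypothetical (it is not a PHP built-in), and the sketch assumes PHP 7.4+, the mbstring extension, and valid UTF-8 input. It does not normalize composed versus decomposed accents, so treat it as an illustration and not a replacement for a maintained package.

```php
<?php
// Hypothetical helper: Levenshtein distance counted in characters, not bytes.
function mb_levenshtein_sketch(string $a, string $b): int
{
    $charsA = mb_str_split($a, 1, 'UTF-8');
    $charsB = mb_str_split($b, 1, 'UTF-8');
    $lenB = count($charsB);

    // $previous[$j] is the distance from the processed prefix of $a
    // to the first $j characters of $b (initially: empty prefix of $a).
    $previous = range(0, $lenB);

    foreach ($charsA as $i => $charA) {
        $current = [$i + 1]; // distance to the empty prefix of $b
        foreach ($charsB as $j => $charB) {
            $cost = ($charA === $charB) ? 0 : 1;
            $current[] = min(
                $previous[$j + 1] + 1, // deletion
                $current[$j] + 1,      // insertion
                $previous[$j] + $cost  // substitution (or match)
            );
        }
        $previous = $current;
    }

    return $previous[$lenB];
}

$accented = "h\u{EA}llo"; // "hêllo", written with an escape so the bytes are UTF-8

echo levenshtein('hello', $accented), "\n";           // 2 (byte-based)
echo mb_levenshtein_sketch('hello', $accented), "\n"; // 1 (character-based)
```

The sketch keeps only two rows of the usual dynamic-programming table, so memory use grows with the length of the second string only.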

Additional techniques for those willing to research more deeply include metaphone() and soundex(), but I will not delve into these topics in this answer.

Scores:

Test vs "hello" |  levenshtein   |  similar_text  |   similar_text's percent   |
----------------+----------------+----------------+----------------------------|
H3||0           |       5        |      0         |       0                    |
Hallo           |       2        |      3         |      60                    |
aloha           |       5        |      2         |      40                    |
h               |       4        |      1         |      33.333333333333       |
hallo           |       1        |      4         |      80                    |
hallå           |       3        |      3         |      54.545454545455       |
hel             |       2        |      3         |      75                    |
helicopter      |       6        |      4         |      53.333333333333       |
hellacious      |       5        |      5         |      66.666666666667       |
hello           |       0        |      5         |     100                    |
hello y'all     |       6        |      5         |      62.5                  |
hello yall      |       5        |      5         |      66.666666666667       |
helów           |       3        |      3         |      54.545454545455       |
hey hello       |       4        |      5         |      71.428571428571       |
hola            |       3        |      2         |      44.444444444444       |
hêllo           |       2        |      4         |      72.727272727273       |
mellow yellow   |       9        |      4         |      44.444444444444       |
try             |       5        |      0         |       0                    |
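For reference, rows like the ones above can be reproduced with a short loop. This is only a sketch: the $rows array and the subset of test strings are my own choices for illustration.

```php
<?php
$needle = 'hello';
$testStrings = ['hallo', 'hel', 'hey hello', 'try']; // a few rows from the table

$rows = [];
foreach ($testStrings as $candidate) {
    // similar_text() returns the count of matching bytes and writes the
    // percent value into its third (by-reference) argument.
    $matched = similar_text($needle, $candidate, $percent);
    $rows[$candidate] = [levenshtein($needle, $candidate), $matched, $percent];

    printf("%-15s | %3d | %3d | %s\n", $candidate, ...$rows[$candidate]);
}
```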

Sort by levenshtein() (PHP 7+):

usort($testStrings, function($a, $b) use ($needle) {
    return levenshtein($needle, $a) <=> levenshtein($needle, $b);
});

Sort by levenshtein() (PHP 7.4+):

usort($testStrings, fn($a, $b) => levenshtein($needle, $a) <=> levenshtein($needle, $b));
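The snippets in this answer assume that $needle and $testStrings already exist. A minimal runnable sketch, borrowing the array from the question as sample data:

```php
<?php
$needle = 'hello';
$testStrings = ['hello', 'try', 'hel', 'hey hello'];

// Ascending by edit distance: the smaller the distance, the closer the match.
usort($testStrings, function ($a, $b) use ($needle) {
    return levenshtein($needle, $a) <=> levenshtein($needle, $b);
});

echo implode(', ', $testStrings), "\n"; // hello, hel, hey hello, try
```

Keep in mind that usort() reindexes the array; if the original keys matter, uasort() accepts the same callback.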

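Notice that in the similar_text() sorts that follow, $a and $b have changed sides of the <=> evaluation for DESC ordering, because a higher score means a closer match. Also notice that hello is not assured to be positioned as the first element, because several other strings (such as hey hello) tie with it on a score of 5.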

Sort by similar_text() (PHP 7+):

usort($testStrings, function($a, $b) use ($needle) {
    return similar_text($needle, $b) <=> similar_text($needle, $a);
});

Sort by similar_text() (PHP 7.4+):

usort($testStrings, fn($a, $b) => similar_text($needle, $b) <=> similar_text($needle, $a));

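Notice the difference in scoring of hallå and helicopter via similar_text()'s return value versus similar_text()'s percent value: helicopter wins on the return value (4 versus 3), but hallå wins on percent (roughly 54.5 versus 53.3), so the two strings swap places depending on which score you sort by.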
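A small sketch that makes the difference visible by ranking just those two strings both ways (the variable names $byCount and $byPercent are mine):

```php
<?php
$needle = 'hello';
$candidates = ["hall\u{E5}", 'helicopter']; // "hallå" via escape so the bytes are UTF-8

// Rank by similar_text()'s return value (matching bytes), best first.
$byCount = $candidates;
usort($byCount, function ($a, $b) use ($needle) {
    return similar_text($needle, $b) <=> similar_text($needle, $a);
});

// Rank by similar_text()'s percent value, best first.
$byPercent = $candidates;
usort($byPercent, function ($a, $b) use ($needle) {
    similar_text($needle, $a, $percentA);
    similar_text($needle, $b, $percentB);
    return $percentB <=> $percentA;
});

echo $byCount[0], "\n";   // helicopter
echo $byPercent[0], "\n"; // hallå
```

The percent value divides by the combined length of both strings, which is why the longer helicopter drops behind despite matching more bytes.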

Sort by similar_text()'s percent (PHP 7+):

usort($testStrings, function($a, $b) use ($needle) {
    similar_text($needle, $a, $percentA);
    similar_text($needle, $b, $percentB);
    return $percentB <=> $percentA;
});

Sort by similar_text()'s percent (PHP 7.4+):

usort($testStrings, fn($a, $b) => 
    [is_int(similar_text($needle, $b, $percentB)), $percentB]
    <=>
    [is_int(similar_text($needle, $a, $percentA)), $percentA]
);

Notice that I am neutralizing the unwanted return value of similar_text() by wrapping it in is_int(), which always yields true, so the first elements of the two arrays always tie and the comparison falls through to the generated percent values. This allows the percent value to be generated without returning too soon, since an arrow function body must be a single expression.
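To see why this works, it may help to look at how <=> treats arrays of equal size: it compares them element by element, left to right. The sketch below (PHP 7.4+, with the question's array as sample data) shows that behaviour first and then the full sort.

```php
<?php
// First elements tie (true vs true), so the second elements decide.
var_dump([true, 80.0] <=> [true, 60.0]); // int(1)
var_dump([true, 60.0] <=> [true, 60.0]); // int(0)

$needle = 'hello';
$testStrings = ['try', 'hey hello', 'hel', 'hello'];

// Descending by percent: $b is evaluated on the left side of <=>.
usort($testStrings, fn($a, $b) =>
    [is_int(similar_text($needle, $b, $percentB)), $percentB]
    <=>
    [is_int(similar_text($needle, $a, $percentA)), $percentA]
);

echo implode(', ', $testStrings), "\n"; // hello, hel, hey hello, try
```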

Sort by levenshtein() then break ties with similar_text() (PHP 7+):

usort($testStrings, function($a, $b) use ($needle) {
    return [levenshtein($needle, $a), similar_text($needle, $b)]
           <=>
           [levenshtein($needle, $b), similar_text($needle, $a)];
});
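A runnable sketch of the tie-break, using a few strings from the score table. Both h and hey hello have a levenshtein() distance of 4, so their similar_text() scores (1 and 5) decide their order.

```php
<?php
$needle = 'hello';
$testStrings = ['h', 'hallo', 'hey hello', 'hello'];

usort($testStrings, function ($a, $b) use ($needle) {
    // First element: distance, ascending ($a on the left).
    // Second element: similar_text() score, descending ($b on the left).
    return [levenshtein($needle, $a), similar_text($needle, $b)]
           <=>
           [levenshtein($needle, $b), similar_text($needle, $a)];
});

echo implode(', ', $testStrings), "\n"; // hello, hallo, hey hello, h
```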

Sort by levenshtein() then break ties with similar_text()'s percent (PHP 7.4+):

usort($testStrings, fn($a, $b) =>
    [levenshtein($needle, $a), is_int(similar_text($needle, $b, $percentB)), $percentB]
    <=>
    [levenshtein($needle, $b), is_int(similar_text($needle, $a, $percentA)), $percentA]
);

Personally, I never use anything but levenshtein() in my projects because it consistently delivers the results that I'm looking for.
