How to amend substrings?


北海茫月 2021-01-17 02:25

Using collation xxx_german2_ci, which treats ü and ue as identical, is it possible to have all occurrences of München be hi

2 answers

天命终不由人 (original poster)

2021-01-17 02:43

In the end I decided to do it all in PHP, hence my question about which characters utf8_general_ci treats as equal.

Here is what I came up with, by example: a label is built from a text $description, with each occurrence of the substring $term highlighted and HTML special characters escaped. The accent substitution table is not complete, but it is probably sufficient for the actual use case.

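mb_internal_encoding("UTF-8");

// Map accented letters to their base letters. Converting to ISO-8859-1 gives
// one byte per character, so byte offsets equal character offsets afterwards.
function withoutAccents($s) {
    // utf8_decode() is deprecated as of PHP 8.2; mb_convert_encoding() replaces it.
    return strtr(mb_convert_encoding($s, 'ISO-8859-1', 'UTF-8'),
                 mb_convert_encoding('àáâãäçèéêëìíîïñòóôõöùúûüýÿß', 'ISO-8859-1', 'UTF-8'),
                 'aaaaaceeeeiiiinooooouuuuyys');
}

// Lowercase first (multibyte-aware, so 'Ü' becomes 'ü'), then strip accents.
function simplified($s) {
    return withoutAccents(mb_strtolower($s));
}

// Cut a character range out of the original text and escape it for HTML.
function encodedSubstr($s, $start, $length) {
    return htmlspecialchars(mb_substr($s, $start, $length));
}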

function labelFromDescription($description, $term) {
    $simpleTerm = simplified($term);
    $simpleDescription = simplified($description);

    $lastEndPos = $pos = 0;
    // Both simplified strings are single-byte, so strlen()/strpos() count characters.
    $termLen = strlen($simpleTerm);
    if ($termLen === 0) {
        // An empty needle would match at every offset and never advance.
        return htmlspecialchars($description);
    }
    $label = ''; // HTML
    while (($pos = strpos($simpleDescription,
                          $simpleTerm, $lastEndPos)) !== false) {
        $label .=
            encodedSubstr($description, $lastEndPos, $pos - $lastEndPos).
            '<b>'.
            encodedSubstr($description, $pos, $termLen).
            '</b>';
        $lastEndPos = $pos + $termLen;
    }
    // Remainder after the last match, measured in characters rather than bytes.
    $label .= encodedSubstr($description, $lastEndPos,
                            mb_strlen($description) - $lastEndPos);

    return $label;
}

echo labelFromDescription('São Paulo <SAO>', 'SAO')."\n";
echo labelFromDescription('München <MUC>', 'ünc');

Output:

<b>São</b> Paulo &lt;<b>SAO</b>&gt;
M<b>ünc</b>hen &lt;MUC&gt;
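
The table above maps every accented letter to exactly one ASCII letter, which is what keeps positions in the simplified string aligned with positions in the original. A german2-style collation also equates ü with ue, so one character expands to two and that alignment breaks. Below is a minimal sketch of one way around this, not part of the code above: fold character by character and remember, for each byte of the folded string, which original character produced it. The names FOLD, foldWithMap and highlightFolded are hypothetical, the folding table is a small illustrative assumption rather than the collation's actual rule set, and mb_str_split() requires PHP 7.4 or later.

```php
<?php
mb_internal_encoding('UTF-8');

// Illustrative expansions only (an assumption, not the full collation rules).
const FOLD = ['ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'ß' => 'ss'];

// Returns [folded string, map from folded byte offset to original char index].
function foldWithMap(string $s): array {
    $folded = '';
    $map = [];
    foreach (mb_str_split($s) as $i => $ch) {
        $lower = mb_strtolower($ch);
        $piece = FOLD[$lower] ?? $lower;
        $folded .= $piece;
        // One map entry per byte, so strpos() offsets can be looked up directly.
        for ($b = 0, $n = strlen($piece); $b < $n; $b++) {
            $map[] = $i;
        }
    }
    return [$folded, $map];
}

function highlightFolded(string $description, string $term): string {
    [$foldedDesc, $map] = foldWithMap($description);
    [$foldedTerm] = foldWithMap($term);
    $len = strlen($foldedTerm);
    if ($len === 0) {
        return htmlspecialchars($description);
    }
    $out = '';
    $next = 0;    // next original character index not yet emitted
    $offset = 0;  // byte offset in the folded string where the search resumes
    while (($pos = strpos($foldedDesc, $foldedTerm, $offset)) !== false) {
        $first = $map[$pos];
        $last = $map[$pos + $len - 1];
        if ($first < $next) {
            // Match starts inside a character that was already emitted; skip it.
            $offset = $pos + 1;
            continue;
        }
        $out .= htmlspecialchars(mb_substr($description, $next, $first - $next))
              . '<b>'
              . htmlspecialchars(mb_substr($description, $first, $last - $first + 1))
              . '</b>';
        $next = $last + 1;
        $offset = $pos + $len;
    }
    return $out . htmlspecialchars(mb_substr($description, $next));
}

echo highlightFolded('München <MUC>', 'muench'), "\n";
// <b>Münch</b>en &lt;MUC&gt;
echo highlightFolded('Strasse und Straße', 'straße'), "\n";
// <b>Strasse</b> und <b>Straße</b>
```

A match that ends in the middle of an expansion (for example the term mu against München) is widened to the whole original character, since half of an ü cannot be highlighted.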
