How to amend substrings?

北海茫月 2021-01-17 02:25

Using the collation xxx_german2_ci, which treats ü and ue as identical, is it possible to have all occurrences of München be hi

2 answers
  •  天命终不由人
    2021-01-17 02:43

    In the end I decided to do it all in PHP, hence my question about which characters compare as equal under utf8_general_ci.

    Below is what I came up with, by way of example: a label is built from a text $description, with every occurrence of the substring $term highlighted and HTML special characters escaped. The substitution table is not complete, but it is probably sufficient for the actual use case.

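    mb_internal_encoding("UTF-8");

    // Map accented Latin-1 letters to plain ASCII. The string is converted to
    // ISO-8859-1 first so that every character is exactly one byte
    // (utf8_decode() did the same but is deprecated since PHP 8.2).
    function withoutAccents($s) {
        return strtr(mb_convert_encoding($s, 'ISO-8859-1', 'UTF-8'),
                     mb_convert_encoding('àáâãäçèéêëìíîïñòóôõöùúûüýÿß', 'ISO-8859-1', 'UTF-8'),
                     'aaaaaceeeeiiiinooooouuuuyys');
    }

    // Lower-case with mb_strtolower() so that non-ASCII capitals such as Ü
    // are folded as well, then strip the accents.
    function simplified($s) {
        return withoutAccents(mb_strtolower($s));
    }

    // Cut a piece out of the original UTF-8 text and escape it for HTML.
    function encodedSubstr($s, $start, $length) {
        return htmlspecialchars(mb_substr($s, $start, $length));
    }

    function labelFromDescription($description, $term) {
        $simpleTerm = simplified($term);
        $simpleDescription = simplified($description);

        // The simplified strings are single-byte, so strlen()/strpos() count
        // characters and line up with mb_substr() on the original text.
        $termLen = strlen($simpleTerm);
        if ($termLen === 0) {
            return htmlspecialchars($description); // nothing to highlight
        }

        $lastEndPos = 0;
        $label = ''; // HTML
        while (($pos = strpos($simpleDescription,
                              $simpleTerm, $lastEndPos)) !== false) {
            $label .=
                encodedSubstr($description, $lastEndPos, $pos - $lastEndPos).
                '<b>'.   // highlight markup; the <b> tag is an assumption
                encodedSubstr($description, $pos, $termLen).
                '</b>';
            $lastEndPos = $pos + $termLen;
        }
        $label .= encodedSubstr($description, $lastEndPos,
                                mb_strlen($description) - $lastEndPos);

        return $label;
    }

    echo labelFromDescription('São Paulo <SAO>', 'SAO')."\n";
    echo labelFromDescription('München <MUC>', 'ünc');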
    

    Output (as rendered in a browser):

    São Paulo <SAO>
    München <MUC>
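
The collation in the question, xxx_german2_ci, also equates ü with ue, which a one-to-one character table cannot express. The following is only a rough sketch of one way to handle such expansions: each character is folded to a key that may be longer than one letter, and a side table remembers which original character produced each byte of the folded string. The function names (`foldChar`, `foldWithMap`, `highlightGerman2`), the `<b>` default tag, and the folding table itself are assumptions made for illustration; the table imitates the german2 rules (ä = ae, ö = oe, ü = ue, ß = ss) but is not the database's actual collation. It assumes PHP 7.4 or later with the mbstring extension.

```php
<?php
// Fold one character to a comparison key; umlauts expand to two letters.
function foldChar(string $ch): string {
    static $map = [
        'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'ß' => 'ss',
        'à' => 'a', 'á' => 'a', 'â' => 'a', 'ã' => 'a', 'ç' => 'c',
        'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e',
        'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i', 'ñ' => 'n',
        'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'õ' => 'o',
        'ù' => 'u', 'ú' => 'u', 'û' => 'u', 'ý' => 'y', 'ÿ' => 'y',
    ];
    $lower = mb_strtolower($ch, 'UTF-8');
    return $map[$lower] ?? $lower;
}

// Returns [folded string, owner table, original characters], where
// $owner[$i] is the index of the character that produced folded byte $i.
function foldWithMap(string $s): array {
    $chars = mb_str_split($s, 1, 'UTF-8');
    $folded = '';
    $owner = [];
    foreach ($chars as $i => $ch) {
        $key = foldChar($ch);
        $folded .= $key;
        for ($k = 0; $k < strlen($key); $k++) {
            $owner[] = $i;
        }
    }
    return [$folded, $owner, $chars];
}

function highlightGerman2(string $text, string $term,
                          string $open = '<b>', string $close = '</b>'): string {
    [$hay, $owner, $chars] = foldWithMap($text);
    [$needle] = foldWithMap($term);
    $len = strlen($needle);
    if ($len === 0) {
        return htmlspecialchars($text);
    }
    $out = '';
    $next = 0; // first original character not yet emitted
    $from = 0; // byte offset in the folded string to search from
    while (($pos = strpos($hay, $needle, $from)) !== false) {
        $end = $pos + $len;
        // Skip matches that begin or end inside an expanded character,
        // e.g. the "e" that only exists as the second half of "ü" => "ue".
        $startsClean = $pos === 0 || $owner[$pos - 1] !== $owner[$pos];
        $endsClean = $end === count($owner) || $owner[$end] !== $owner[$end - 1];
        if (!$startsClean || !$endsClean) {
            $from = $pos + 1;
            continue;
        }
        $first = $owner[$pos];
        $last = $owner[$end - 1];
        $out .= htmlspecialchars(implode('', array_slice($chars, $next, $first - $next)))
              . $open
              . htmlspecialchars(implode('', array_slice($chars, $first, $last - $first + 1)))
              . $close;
        $next = $last + 1;
        $from = $end;
    }
    return $out . htmlspecialchars(implode('', array_slice($chars, $next)));
}

echo highlightGerman2('München <MUC>', 'muench'), "\n"; // <b>Münch</b>en &lt;MUC&gt;
echo highlightGerman2('Muenchen', 'ünc'), "\n";         // M<b>uenc</b>hen
```

Because matching happens on the folded string while the output is cut from the original characters, the highlighted span keeps its original spelling whether the text says München or Muenchen.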
    
