multibyte strtr() -> mb_strtr()

悲哀的现实 2020-12-08 16:55

Has anyone written a multibyte variant of the strtr() function? I need one.

Edit 1 (example of desired usage):

Example:
$from = \'ľ         


        
3 answers
  • 2020-12-08 17:22
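    function mb_strtr($str,$map,$enc){
        // $map is an array of search => replacement pairs; keys may be several characters long.
        $out="";
        $strLn=mb_strlen($str,$enc);
        // Find the longest key so that longer matches can be tried first.
        $maxKeyLn=1;
        foreach($map as $key=>$val){
            $keyLn=mb_strlen($key,$enc);
            if($keyLn>$maxKeyLn){
                $maxKeyLn=$keyLn;
            }
        }
        for($offset=0; $offset<$strLn; ){
            // At each offset, try candidates from the longest key length down to one character.
            for($ln=$maxKeyLn; $ln>=1; $ln--){
                $cmp=mb_substr($str,$offset,$ln,$enc);
                if(isset($map[$cmp])){
                    $out.=$map[$cmp];
                    $offset+=$ln;
                    continue 2; // match consumed; resume the outer loop at the new offset
                }
            }
            // No key matches here: copy one character through unchanged.
            $out.=mb_substr($str,$offset,1,$enc);
            $offset++;
        }
        return $out;
    }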
    
  • 2020-12-08 17:29

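    The array form of strtr() (strtr($str, $pairs)) is safe for UTF-8 because it replaces whole substrings, but the three-argument form translates byte by byte and is not. Either way, since str_replace is multi-byte safe, you could wrap it: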

    function mb_strtr($str, $from, $to)
    {
      return str_replace(mb_str_split($from), mb_str_split($to), $str);
    }
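
    One caveat with a str_replace() wrapper is that str_replace() applies its pairs one after another, while strtr() with an array translates in a single pass. The sketch below illustrates the difference with a two-character swap; the sample strings are made up for illustration and the file is assumed to be saved as UTF-8.

```php
<?php
// Assumes this file is saved as UTF-8.

// Swap two characters: 'á' should become 'é' and 'é' should become 'á'.
$pairs = array('á' => 'é', 'é' => 'á');
$input = 'áé';

// str_replace() applies the pairs in sequence, so the second pair
// rewrites the output of the first one.
$sequential = str_replace(array_keys($pairs), array_values($pairs), $input);

// strtr() with an array scans the input once and never re-translates
// text it has already produced.
$singlePass = strtr($input, $pairs);

echo $sequential, "\n"; // áá
echo $singlePass, "\n"; // éá
```

    For mappings where no replacement value also appears as a search key (such as accented to unaccented letters), both calls give the same result.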
    

    PHP 7.4 and later ship a built-in mb_str_split(); on older versions there is no such function, so you need to write your own (using mb_substr and mb_strlen), or you could just use the PHP UTF-8 implementation (changed slightly):

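    if (!function_exists('mb_str_split')) { // built in since PHP 7.4
        function mb_str_split($str) {
            // An empty pattern with the u modifier splits between UTF-8 characters.
            return preg_split('~~u', $str, -1, PREG_SPLIT_NO_EMPTY);
        }
    }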
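
    The other option mentioned above, writing the splitter with mb_strlen and mb_substr, might look like the following minimal sketch. The name mb_str_split_sketch is hypothetical, chosen so it does not clash with the built-in on newer PHP versions, and the mbstring extension is assumed to be loaded.

```php
<?php
// Hypothetical helper name, chosen to avoid clashing with the built-in
// mb_str_split() available in PHP 7.4 and later.
// Assumes the mbstring extension is loaded.
function mb_str_split_sketch($str, $encoding = 'UTF-8')
{
    $chars = array();
    $length = mb_strlen($str, $encoding); // length in characters, not bytes

    for ($i = 0; $i < $length; $i++) {
        // Take exactly one character at character position $i.
        $chars[] = mb_substr($str, $i, 1, $encoding);
    }

    return $chars;
}

print_r(mb_str_split_sketch('ľšč'));
```

    Calling mb_substr once per character is simple but does more work than the preg_split version on long strings.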
    

    However if you're looking for a function to remove all (latin?) accentuations from a string you might find the following function useful:

    function Unaccent($string)
    {
        return preg_replace('~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml|caron);~i', '$1', htmlentities($string, ENT_QUOTES, 'UTF-8'));
    }
    
    echo Unaccent('ľľščťžýáíŕďňä'); // llsctzyairdna
    echo Unaccent('Iñtërnâtiônàlizætiøn'); // Internationalizaetion
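
    To see why the entity trick works, it can help to look at the intermediate string. The sketch below runs the same two steps separately on a short made-up input; it assumes the file is saved as UTF-8. Characters that htmlentities() does not turn into a named entity pass through unchanged.

```php
<?php
// Assumes this file is saved as UTF-8.
$input = 'é ñ æ';

// Step 1: each accented letter becomes a named entity whose name starts
// with the base letter(s), e.g. é -> &eacute;
$entities = htmlentities($input, ENT_QUOTES, 'UTF-8');
echo $entities, "\n"; // &eacute; &ntilde; &aelig;

// Step 2: keep only the leading base letter(s) of each entity and drop
// the accent name that follows.
$pattern = '~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml|caron);~i';
$plain = preg_replace($pattern, '$1', $entities);
echo $plain, "\n"; // e n ae
```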
    
  • 2020-12-08 17:34

    Probably using str_replace is a good solution. An alternative:

    <?php
    header('Content-Type: text/plain;charset=utf-8');
    
    function my_strtr($inputStr, $from, $to, $encoding = 'UTF-8') {
            $inputStrLength = mb_strlen($inputStr, $encoding);
    
            $translated = '';
    
            for($i = 0; $i < $inputStrLength; $i++) {
                    $currentChar = mb_substr($inputStr, $i, 1, $encoding);
    
                    $translatedCharPos = mb_strpos($from, $currentChar, 0, $encoding);
    
                    if($translatedCharPos === false) {
                            $translated .= $currentChar;
                    }
                    else {
                            $translated .= mb_substr($to, $translatedCharPos, 1, $encoding);
                    }
            }
    
            return $translated;
    }
    
    
    $from = 'ľľščťžýáíŕďňä'; // these chars are in UTF-8
    $to   = 'llsctzyairdna';
    
    // input - in UTF-8
    $str  = 'Kŕdeľ ďatľov učí koňa žrať kôru.';
    
    print 'Original: ';
    print chr(10);
    print $str;
    
    print chr(10);
    print chr(10);
    
    print 'Translated: ';
    print chr(10);
    print my_strtr( $str, $from, $to);
    

    Prints on my machine using PHP 5.2:

    Original: 
    Kŕdeľ ďatľov učí koňa žrať kôru.
    
    Translated: 
    Krdel datlov uci kona zrat kôru.
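
    As a possible shortcut for the same from/to style of call, the two strings can be split into characters, paired up, and handed to the array form of the built-in strtr(). This is only a sketch: the name strtr_utf8_sketch is hypothetical, and the input is assumed to be valid, precomposed UTF-8.

```php
<?php
// Hypothetical helper; assumes valid, precomposed UTF-8 input and that
// this file is saved as UTF-8.
function strtr_utf8_sketch($str, $from, $to)
{
    // Split both strings into arrays of single UTF-8 characters.
    $fromChars = preg_split('//u', $from, -1, PREG_SPLIT_NO_EMPTY);
    $toChars   = preg_split('//u', $to, -1, PREG_SPLIT_NO_EMPTY);

    // array_combine() needs equally long, non-empty lists.
    if (count($fromChars) === 0 || count($fromChars) !== count($toChars)) {
        throw new InvalidArgumentException('$from and $to must be non-empty and contain the same number of characters');
    }

    // Duplicate characters in $from collapse into one key (the last pair wins).
    $pairs = array_combine($fromChars, $toChars);

    // The array form of strtr() replaces whole substrings in one pass.
    return strtr($str, $pairs);
}

echo strtr_utf8_sketch('Kŕdeľ ďatľov učí koňa žrať kôru.', 'ľľščťžýáíŕďňä', 'llsctzyairdna'), "\n";
// Krdel datlov uci kona zrat kôru.
```

    Unlike the three-argument strtr(), which ignores the extra characters of the longer string, this sketch rejects from/to strings of different lengths.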
    