How to reverse a Unicode string

你的背包 2020-12-06 17:04

A comment on an answer to this question hinted that PHP cannot reverse Unicode strings.

As for Unicode, it works in PHP because most app

6 answers
  •  萌比男神i
    2020-12-06 17:48

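    Grapheme functions handle UTF-8 strings more correctly than the mbstring and PCRE functions. Mbstring and a code-point-level PCRE split both work on individual code points, so they can separate a base character from its combining marks. You can see the difference between them by executing the following code.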

    function str_to_array($string)
    {
        $length = grapheme_strlen($string);
        $ret = [];
    
        for ($i = 0; $i < $length; $i += 1) {
    
            $ret[] = grapheme_substr($string, $i, 1);
        }
    
        return $ret;
    }
    
    function str_to_array2($string)
    {
        $length = mb_strlen($string, "UTF-8");
        $ret = [];
    
        for ($i = 0; $i < $length; $i += 1) {
    
            $ret[] = mb_substr($string, $i, 1, "UTF-8");
        }
    
        return $ret;
    }
    
    function str_to_array3($string)
    {
        return preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
    }
    
    function utf8_strrev($string)
    {
        return implode(array_reverse(str_to_array($string)));
    }
    
    function utf8_strrev2($string)
    {
        return implode(array_reverse(str_to_array2($string)));
    }
    
    function utf8_strrev3($string)
    {
        return implode(array_reverse(str_to_array3($string)));
    }
    
    // http://www.php.net/manual/en/function.grapheme-strlen.php
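    // Both characters are in decomposed (NFD) form: a base letter plus a combining mark.
    $string = "a\xCC\x8A"  // 'a' + COMBINING RING ABOVE (U+0061 U+030A), the NFD form of U+00E5
             ."o\xCC\x88"; // 'o' + COMBINING DIAERESIS (U+006F U+0308), the NFD form of U+00F6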
    
    var_dump(array_map(function($elem) { return strtoupper(bin2hex($elem)); },
    [
      'should be' => "o\xCC\x88"."a\xCC\x8A",
      'grapheme' => utf8_strrev($string),
      'mbstring' => utf8_strrev2($string),
      'pcre' => utf8_strrev3($string)
    ]));
    

    The result is shown below. Only the grapheme version keeps each combining mark attached to its base letter.

    array(4) {
      ["should be"]=>
      string(12) "6FCC8861CC8A"
      ["grapheme"]=>
      string(12) "6FCC8861CC8A"
      ["mbstring"]=>
      string(12) "CC886FCC8A61"
      ["pcre"]=>
      string(12) "CC886FCC8A61"
    }
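
    PCRE is not limited to splitting on code points: its \X escape matches one extended grapheme cluster. The following is a minimal sketch under that assumption. The name utf8_strrev_x is hypothetical and not part of the answer above; the function does not need the intl extension.

```php
<?php
// Sketch: reverse a UTF-8 string by extended grapheme clusters using PCRE's \X.
// utf8_strrev_x is a hypothetical helper name used only for illustration.
function utf8_strrev_x(string $string): string
{
    // \X matches one extended grapheme cluster; the /u modifier makes PCRE
    // treat the pattern and the subject as UTF-8.
    $count = preg_match_all('/\X/u', $string, $matches);

    if ($count === false) {
        // preg_match_all() returns false on failure, for example when the
        // subject is not valid UTF-8.
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }

    // $matches[0] holds the clusters in order; reverse and join them.
    return implode('', array_reverse($matches[0]));
}

$string = "a\xCC\x8A" . "o\xCC\x88";
echo strtoupper(bin2hex(utf8_strrev_x($string))), "\n"; // prints 6FCC8861CC8A
```

    For the test string this gives the same bytes as the 'should be' entry, because each base letter and its combining mark are matched as a single unit.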
    

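    IntlBreakIterator can be used since PHP 5.5 (intl 3.0). Note that createCodePointInstance() sets boundaries at every code point, so the function below behaves like the mbstring version and does not keep combining sequences together; IntlBreakIterator::createCharacterInstance() sets boundaries at grapheme clusters instead.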

    function utf8_strrev($str)
    {
        $it = IntlBreakIterator::createCodePointInstance();
        $it->setText($str);
    
        $ret = '';
        $pos = 0;
        $prev = 0;
    
        foreach ($it as $pos) {
            $ret = substr($str, $prev, $pos - $prev) . $ret;
            $prev = $pos;
        }
    
        return $ret;  
    }
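
    A grapheme-aware variant of the same loop can be sketched by swapping in IntlBreakIterator::createCharacterInstance(). The name utf8_strrev_graphemes is hypothetical, and the sketch assumes the intl extension is loaded.

```php
<?php
// Sketch: reverse a UTF-8 string by grapheme clusters with IntlBreakIterator.
// utf8_strrev_graphemes is a hypothetical name; requires the intl extension.
function utf8_strrev_graphemes(string $str): string
{
    // Character instance: boundaries fall between grapheme clusters,
    // not between individual code points.
    $it = IntlBreakIterator::createCharacterInstance();
    $it->setText($str);

    $ret = '';
    $prev = 0;

    // Iterating yields boundary positions as byte offsets into $str.
    foreach ($it as $pos) {
        // Prepend the cluster that ends at this boundary.
        $ret = substr($str, $prev, $pos - $prev) . $ret;
        $prev = $pos;
    }

    return $ret;
}

$string = "a\xCC\x8A" . "o\xCC\x88";
echo strtoupper(bin2hex(utf8_strrev_graphemes($string))), "\n"; // prints 6FCC8861CC8A
```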
    
