What is the best way to split a string into an array of Unicode characters in PHP?

野的像风 2020-12-05 15:23

In PHP, what is the best way to split a string into an array of Unicode characters? What if the input is not necessarily UTF-8?

I want to know whether the set of Unicode

7 Answers
  •  不思量自难忘°
    2020-12-05 15:41

    You could use the 'u' modifier with a PCRE regex; see Pattern Modifiers (quoting):

    u (PCRE8)

    This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern strings are treated as UTF-8. This modifier is available from PHP 4.1.0 or greater on Unix and from PHP 4.2.3 on win32. UTF-8 validity of the pattern is checked since PHP 4.3.5.

    For instance, consider this code:

    header('Content-type: text/html; charset=UTF-8');  // So the browser doesn't make our lives harder
    $str = "abc 文字化け, efg";
    
    $results = array();
    preg_match_all('/./', $str, $results);
    var_dump($results[0]);
    

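    You'll get an unusable result, because without the modifier '.' matches one byte at a time, so each 3-byte character is split into three broken pieces: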

    array
      0 => string 'a' (length=1)
      1 => string 'b' (length=1)
      2 => string 'c' (length=1)
      3 => string ' ' (length=1)
      4 => string '�' (length=1)
      5 => string '�' (length=1)
      6 => string '�' (length=1)
      7 => string '�' (length=1)
      8 => string '�' (length=1)
      9 => string '�' (length=1)
      10 => string '�' (length=1)
      11 => string '�' (length=1)
      12 => string '�' (length=1)
      13 => string '�' (length=1)
      14 => string '�' (length=1)
      15 => string '�' (length=1)
      16 => string ',' (length=1)
      17 => string ' ' (length=1)
      18 => string 'e' (length=1)
      19 => string 'f' (length=1)
      20 => string 'g' (length=1)
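
    The twelve broken entries above are the four Japanese characters at three bytes each. A minimal sketch of that byte-versus-character difference, assuming PHP 5.4 or later (where the matches argument of preg_match_all() is optional); the character is written with byte escapes so the example does not depend on the file's own encoding:

    ```php
    <?php
    // "文" is U+6587, which UTF-8 encodes as the three bytes E6 96 87.
    $char = "\xE6\x96\x87";

    var_dump(strlen($char));                  // int(3): strlen() counts bytes
    var_dump(bin2hex($char));                 // string(6) "e69687"
    var_dump(preg_match_all('/./', $char));   // int(3): '.' matches one byte at a time
    var_dump(preg_match_all('/./u', $char));  // int(1): with 'u', '.' matches one code point
    ```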
    

    But with this code:

    header('Content-type: text/html; charset=UTF-8');  // So the browser doesn't make our lives harder
    $str = "abc 文字化け, efg";
    
    $results = array();
    preg_match_all('/./u', $str, $results);
    var_dump($results[0]);
    

    (Notice the 'u' at the end of the regex.)

    You get what you want:

    array
      0 => string 'a' (length=1)
      1 => string 'b' (length=1)
      2 => string 'c' (length=1)
      3 => string ' ' (length=1)
      4 => string '文' (length=3)
      5 => string '字' (length=3)
      6 => string '化' (length=3)
      7 => string 'け' (length=3)
      8 => string ',' (length=1)
      9 => string ' ' (length=1)
      10 => string 'e' (length=1)
      11 => string 'f' (length=1)
      12 => string 'g' (length=1)
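
    As for input that is not necessarily UTF-8: the 'u' modifier only understands UTF-8, so one option is to convert first and then split. The following is only a sketch under stated assumptions: split_code_points() is a hypothetical helper name (not part of PHP or of this answer), the mbstring extension is assumed to be loaded, PHP 7 or later is assumed for the type declarations, and the caller is assumed to know the source encoding.

    ```php
    <?php
    /**
     * Hypothetical helper: split a string into an array of code points.
     * Assumes mbstring is available and $fromEncoding names the real input encoding.
     */
    function split_code_points(string $str, string $fromEncoding = 'UTF-8'): array
    {
        if ($fromEncoding !== 'UTF-8') {
            // Convert first, because the 'u' modifier treats the subject as UTF-8.
            $str = mb_convert_encoding($str, 'UTF-8', $fromEncoding);
        }
        if (!mb_check_encoding($str, 'UTF-8')) {
            throw new InvalidArgumentException('Input is not valid UTF-8');
        }
        // An empty pattern with 'u' splits between code points;
        // PREG_SPLIT_NO_EMPTY drops the empty strings at both ends.
        return preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    }

    $utf8  = "abc \xE6\x96\x87\xE5\xAD\x97";   // "abc 文字" as UTF-8 bytes
    $parts = split_code_points($utf8);          // 6 elements: a, b, c, space, 文, 字

    $latin1 = "caf\xE9";                        // "café" in ISO-8859-1
    $parts2 = split_code_points($latin1, 'ISO-8859-1');  // 4 elements, last one is "é" in UTF-8
    ```

    On PHP 7.4 and later, mb_str_split() from the mbstring extension covers the same ground without a regex.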
    

    Hope this helps :-)
