Check if a string is multibyte in PHP

庸人自扰 2020-12-30 12:23

I want to check whether a string is multibyte in PHP. Any idea how to accomplish this?

Example:



        
3 Answers
  • 2020-12-30 12:57

    There are two interpretations. The first is that every character is multibyte; the second is that the string contains at least one multibyte character. Both functions below assume UTF-8. If you are interested in handling invalid byte sequences, see https://stackoverflow.com/a/13695364/531320 for details.

    function is_all_multibyte($string)
    {
        // reject strings that contain invalid UTF-8 byte sequences
        if (mb_check_encoding($string, 'UTF-8') === false) return false;

        $length = mb_strlen($string, 'UTF-8');

        for ($i = 0; $i < $length; $i += 1) {
            $char = mb_substr($string, $i, 1, 'UTF-8');

            // a character that is valid ASCII is single-byte,
            // so not every character is multibyte
            if (mb_check_encoding($char, 'ASCII')) {
                return false;
            }
        }

        return true;
    }
    
    function contains_any_multibyte($string)
    {
        // valid UTF-8 that is not pure ASCII must contain a multibyte character
        return !mb_check_encoding($string, 'ASCII') && mb_check_encoding($string, 'UTF-8');
    }
    
    $data = ['東京', 'Tokyo', '東京(Tokyo)'];
    
    var_dump(
        [true, false, false] ===
        array_map(function($v) {
            return is_all_multibyte($v);
        },
        $data),
        [true, false, true] ===
        array_map(function($v) {
            return contains_any_multibyte($v);
        },
        $data)
    );
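    The same two checks can also be written as regular expressions. This is a sketch, not part of the answer above; the function names `is_all_multibyte_re` and `contains_any_multibyte_re` are hypothetical. It relies on the `/u` modifier: with it, `preg_match()` treats the subject as UTF-8 and returns `false` for invalid byte sequences, so comparing against `1` also rejects malformed input. Any UTF-8 character outside the ASCII range (U+0000 to U+007F) takes two or more bytes, which is why "not ASCII" means "multibyte" here. One behavioral difference: the regex version treats the empty string as not all-multibyte, whereas `is_all_multibyte('')` returns `true`.

```php
<?php
// Hypothetical regex-based variants, assuming UTF-8 input.
// With /u, preg_match() returns false (not 0 or 1) on invalid UTF-8,
// so the === 1 comparisons below reject malformed strings too.

function is_all_multibyte_re(string $string): bool
{
    // \A ... \z anchors the whole string (unlike $, \z does not allow
    // a trailing newline); the empty string does not match.
    return preg_match('/\A[^\x00-\x7F]+\z/u', $string) === 1;
}

function contains_any_multibyte_re(string $string): bool
{
    // at least one code point outside the ASCII range
    return preg_match('/[^\x00-\x7F]/u', $string) === 1;
}

$data = ['東京', 'Tokyo', '東京(Tokyo)'];

var_dump(
    [true, false, false] === array_map('is_all_multibyte_re', $data),
    [true, false, true] === array_map('contains_any_multibyte_re', $data)
);
```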
    
  • 2020-12-30 12:59

    I'm not sure if there's a better way, but a quick way that comes to mind is:

    if (mb_strlen($str) != strlen($str)) {
        echo "yes";
    } else {
        echo "no";
    }
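    The comparison works because, for UTF-8 text, `strlen()` counts bytes while `mb_strlen()` counts characters, and the two differ only when some character needs more than one byte. A small illustration, assuming the file is saved as UTF-8 and the mbstring extension is enabled (the sample strings are illustrative; `"\u{E9}"` is used for é so the byte count does not depend on how an editor normalizes the accent):

```php
<?php
// strlen() returns bytes; mb_strlen(..., 'UTF-8') returns characters.
$samples = ['Tokyo', '東京', "caf\u{E9}"];

foreach ($samples as $str) {
    printf("%s: strlen=%d, mb_strlen=%d\n", $str, strlen($str), mb_strlen($str, 'UTF-8'));
}
// Tokyo: strlen=5, mb_strlen=5
// 東京: strlen=6, mb_strlen=2
// café: strlen=5, mb_strlen=4
```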
    
  • 2020-12-30 13:00

    To determine whether something is multibyte, you need to be specific about which character set you're using. If your character set is Latin1 (ISO-8859-1), for example, no string will be multibyte. If your character set is UTF-16, every character takes at least two bytes, so every non-empty string is multibyte.
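    To see the dependence on the character set concretely, here is a sketch that re-encodes the same four characters with `mb_convert_encoding()` and compares byte and character counts. It assumes mbstring is enabled; `UTF-16BE` is used rather than plain `UTF-16` to keep byte-order-mark handling out of the picture.

```php
<?php
// The same text ("café") re-encoded into three character sets.
$text = "caf\u{E9}"; // UTF-8 source string

$encoded = [];
foreach (['UTF-8', 'ISO-8859-1', 'UTF-16BE'] as $encoding) {
    // mb_convert_encoding(string, to_encoding, from_encoding)
    $encoded[$encoding] = mb_convert_encoding($text, $encoding, 'UTF-8');
    printf(
        "%s: %d bytes for %d characters\n",
        $encoding,
        strlen($encoded[$encoding]),
        mb_strlen($encoded[$encoding], $encoding)
    );
}
// UTF-8: 5 bytes for 4 characters
// ISO-8859-1: 4 bytes for 4 characters
// UTF-16BE: 8 bytes for 4 characters
```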

    That said, if you only care about a specific character set, say UTF-8, you can use an mb_strlen() < strlen() test, as long as you pass the encoding parameter explicitly.

    function is_multibyte($s) {
      return mb_strlen($s,'utf-8') < strlen($s);
    }
    