str_word_count() for non-latin words?

我寻月下人不归 2021-01-05 14:12

I'm trying to count the number of words in a variable written in a non-Latin language (Bulgarian), but str_word_count() does not seem to count non-Latin words. The encoding …
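A minimal sketch of the symptom, assuming PHP 8 (which starts with the "C" locale): str_word_count() decides byte by byte whether a character is alphabetic, and the bytes that make up UTF-8 Cyrillic letters are not alphabetic in that locale, so a Bulgarian phrase produces no words at all.

```php
<?php
// Compare str_word_count() on ASCII text and on UTF-8 Cyrillic text.
$latin    = "text in latin";
$cyrillic = "текст на кирилица";

var_dump(str_word_count($latin));    // int(3)
// Each Cyrillic letter is two UTF-8 bytes, neither of which is alphabetic
// in the default "C" locale, so nothing is counted as a word.
var_dump(str_word_count($cyrillic)); // int(0)
```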

4 Answers
  •  情深已故
    2021-01-05 14:30

    You can do it with a regular expression:

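    $str = "текст на кирилица";
    // PREG_SPLIT_NO_EMPTY drops the empty pieces that leading or trailing whitespace would produce
    echo 'Number of words: '.count(preg_split('/\s+/', $str, -1, PREG_SPLIT_NO_EMPTY));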
    

    Here I'm defining the word delimiter as any run of whitespace characters. If anything else should also act as a word delimiter, you'll need to add it to the regex.
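For example, a sketch that also treats common ASCII punctuation as a delimiter. The helper name countWords and the exact delimiter set are illustrative assumptions, not part of the answer; since every delimiter is ASCII, the pattern still works without /u.

```php
<?php
// Hypothetical helper: split on runs of whitespace or ASCII punctuation.
// The delimiter set [\s,.;:!?] is an assumption; extend it to suit your text.
function countWords(string $str): int
{
    $words = preg_split('/[\s,.;:!?]+/', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($words === false) {
        return 0; // preg_split() returns false on a regex error
    }
    return count($words);
}

echo countWords("Здравей, свят! Как си?"); // 4
```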

    Also note that since the regex itself contains no UTF-8 characters (only the string does), the /u modifier isn't required. But if you want some UTF-8 characters to act as delimiters, you'll need to add that modifier.
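A sketch of that case, assuming you want the no-break space (U+00A0) and the em dash (U+2014) to separate words. The helper name countWordsUtf8 and the choice of characters are hypothetical. Code points above U+00FF written as \x{...} are only accepted in UTF mode, so the /u modifier is required here.

```php
<?php
// Hypothetical helper: whitespace, no-break space and em dash all act as delimiters.
// /u makes PCRE treat both pattern and subject as UTF-8.
function countWordsUtf8(string $str): int
{
    $words = preg_split('/[\s\x{00A0}\x{2014}]+/u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($words === false) {
        return 0; // e.g. the subject is not valid UTF-8
    }
    return count($words);
}

// "\u{...}" escapes in double-quoted PHP strings produce UTF-8 bytes.
$str = "текст\u{00A0}на\u{2014}кирилица";
echo countWordsUtf8($str); // 3
```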

    Update:

    If you want only Cyrillic letters to count as word characters, you can use:

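    $str = "текст 
    на 12453
    кирилица";
    // PREG_SPLIT_NO_EMPTY drops empty pieces caused by non-Cyrillic characters at the start or end
    echo 'Number of words: '.count(preg_split('/[^А-Яа-яЁё]+/u', $str, -1, PREG_SPLIT_NO_EMPTY));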
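An equivalent approach, offered as a variation rather than the answer's own code, counts the runs of Cyrillic letters directly with preg_match_all(), which returns the number of matches. The helper name countCyrillicWords is hypothetical.

```php
<?php
// Hypothetical helper: count runs of Cyrillic letters instead of splitting on
// everything else, so non-Cyrillic text at either end never yields empty pieces.
function countCyrillicWords(string $str): int
{
    // preg_match_all() returns false on error (e.g. invalid UTF-8);
    // this sketch maps that to 0.
    return (int) preg_match_all('/[А-Яа-яЁё]+/u', $str);
}

echo countCyrillicWords("12 текст на 12453\nкирилица."); // 3
```

Swapping the character class for \p{L} (any Unicode letter, available with /u) would count words in Latin and Cyrillic alike.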
    
