explode() on Japanese string

Submitted by 女生的网名这么多〃 on 2019-12-01 21:38:42

That is for the simple reason that you do not have a regular space character here. You have an "IDEOGRAPHIC SPACE" character (U+3000), which is encoded in UTF-8 as the bytes "e3 80 80".

If you use that as your delimiter, it will work.
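A minimal sketch of that fix, assuming PHP 7.0 or later (for the "\u{3000}" escape) and UTF-8 text. The sample words and variable names are made up for illustration:

```php
<?php
// U+3000 IDEOGRAPHIC SPACE, written as an escape so it cannot be
// mistaken for an ordinary U+0020 space in the source file.
$ideographicSpace = "\u{3000}";

// Show the UTF-8 bytes of the delimiter.
echo bin2hex($ideographicSpace), PHP_EOL;

// Illustrative sample text: five words joined by ideographic spaces.
$string = implode($ideographicSpace, ['私', 'は', 'イタリア', '人', 'です']);

// Splitting on the matching delimiter yields the five words.
$words = explode($ideographicSpace, $string);
print_r($words);

// Splitting on an ASCII space finds no delimiter, so the whole
// string comes back as a single element.
$unsplit = explode(' ', $string);
print_r($unsplit);
```

The escape form is used only to make the delimiter visible; pasting the literal full-width space into the source works the same way if the file is saved as UTF-8.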

You're using the wrong space. The text uses full-width spaces (U+3000 IDEOGRAPHIC SPACE) and you're supplying a half-width space (U+0020 SPACE).
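If you would rather keep splitting on a half-width space, one option is to normalize the text first. The sketch below assumes the mbstring extension, UTF-8 input, and PHP 7.0 or later; the sample text is illustrative, and mb_convert_kana() is an alternative not mentioned in the answers above:

```php
<?php
// Illustrative sample text separated by full-width spaces (U+3000).
$string = "私\u{3000}は\u{3000}イタリア\u{3000}人\u{3000}です";

// The 's' option converts full-width (zenkaku) spaces to
// half-width (hankaku) spaces, i.e. U+3000 to U+0020.
$normalized = mb_convert_kana($string, 's', 'UTF-8');

// After normalization, an ordinary space works as the delimiter.
$parts = explode(' ', $normalized);
print_r($parts);
```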

There are two issues here.

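First of all, you don't say what your encoding is, but I suppose all Japanese encodings are multi-byte. The explode() function, like most regular PHP string functions, compares raw bytes and is not encoding-aware. That is safe with UTF-8 as long as the delimiter is the exact byte sequence used in the text, but it can mis-split in other multi-byte encodings. There's no exact multi-byte equivalent, but mb_split() can do the trick.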

Secondly, you are exploding by a regular space (U+0020), but your string contains a different character (U+3000).

To sum up (and assuming you are using UTF-8):

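<?php

mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');

// "\u{3000}" is U+3000 IDEOGRAPHIC SPACE (the escape needs PHP 7.0+)
$string = "私\u{3000}は\u{3000}イタリア\u{3000}人\u{3000}です";
print_r(mb_split("\u{3000}", $string));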

... or even better:

<?php

mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');

$string = "私\u{3000}は\u{3000}イタリア\u{3000}人\u{3000}です"; // separated by U+3000 (escape needs PHP 7.0+)
print_r(mb_split('[[:space:]]', $string));

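Convert your string to UTF-8 first using iconv(), then use explode() on the result. The empty first argument below relies on iconv's default source encoding, so pass the real encoding name if you know it.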

$string = explode(" ", iconv('', 'utf-8', $string));
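A hedged sketch of the same idea with an explicit source encoding. It assumes the input is Shift_JIS, which is only an example (substitute whatever your data actually uses), that the iconv extension is available, and PHP 7.0 or later. Note that the delimiter still has to be the ideographic space after conversion:

```php
<?php
// Build a Shift_JIS sample for the demonstration (illustrative only;
// in practice this would be the string you received).
$utf8Sample = "私\u{3000}は\u{3000}イタリア\u{3000}人\u{3000}です";
$sjisInput  = iconv('UTF-8', 'SHIFT_JIS', $utf8Sample);

// Convert to UTF-8, naming the source encoding explicitly.
$converted = iconv('SHIFT_JIS', 'UTF-8', $sjisInput);

// Split on the UTF-8 form of U+3000 IDEOGRAPHIC SPACE.
$pieces = explode("\u{3000}", $converted);
print_r($pieces);
```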

There are a number of characters other than simple ASCII space that can add whitespace between characters.

You could try preg_split() with \s (whitespace characters) or \b (word boundaries) as the pattern. Because Japanese text is almost certainly in a multi-byte encoding, the pattern needs the /u modifier (for UTF-8 input) so that it matches whole characters rather than individual bytes.
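A minimal preg_split() sketch, assuming UTF-8 input and PHP 7.0 or later. U+3000 is listed explicitly in the character class so the result does not depend on how \s is defined for non-ASCII characters; the sample text is illustrative:

```php
<?php
// Illustrative text mixing a full-width space, an ASCII space,
// a doubled full-width space, and a tab.
$string = "私\u{3000}は イタリア\u{3000}\u{3000}人\tです";

// /u makes PCRE treat the pattern and the subject as UTF-8.
// The class matches ASCII whitespace (\s) plus U+3000, and "+"
// collapses runs of separators into a single split point.
$tokens = preg_split('/[\s\x{3000}]+/u', $string, -1, PREG_SPLIT_NO_EMPTY);
print_r($tokens);
```

PREG_SPLIT_NO_EMPTY drops empty pieces that leading or trailing whitespace would otherwise produce.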
