I have to use the explode() function on Japanese text but it doesn't work.
Here is an example of what I have
$string = '私 は イタリア 人 です';
$string = explode(" ", $string);
print_r($string);
That prints
Array ( [0] => 私 は イタリア 人 です )
instead of
Array ( [0] => 私 [1] => は [2] => イタリア [3] => 人 [4] => です )
It seems that explode() can't recognize the spaces inside that text.
What's the reason? How could I make it work?
The simple reason is that your string does not contain an ordinary space character. It contains U+3000 IDEOGRAPHIC SPACE, which is encoded in UTF-8 as the bytes "e3 80 80".
If you use that as your delimiter, it will work.
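A minimal sketch of that fix, assuming the string is UTF-8 encoded. The delimiter is written as its three UTF-8 bytes so it cannot be mistaken for an ordinary space, and the variable names are only illustrative:

```php
<?php
// U+3000 IDEOGRAPHIC SPACE, spelled out as its UTF-8 byte sequence.
$ideographicSpace = "\xE3\x80\x80";

// Build the sample string explicitly so the separators are unambiguous.
$string = implode($ideographicSpace, ['私', 'は', 'イタリア', '人', 'です']);

// explode() compares bytes, so a multi-byte delimiter works as long as
// the delimiter and the string use the same encoding.
$parts = explode($ideographicSpace, $string);
print_r($parts);
```

Exploding the same string on an ordinary " " leaves it in one piece, which matches the behaviour described in the question.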
You're using the wrong space. The text uses full-width spaces (U+3000 IDEOGRAPHIC SPACE) and you're supplying a half-width space (U+0020 SPACE).
There are two issues here.
First of all, you don't say what your encoding is, but I suppose all Japanese encodings are multi-byte. On the other hand, explode() (like most regular PHP string functions) works on bytes rather than characters. That is harmless in UTF-8 as long as the delimiter is the right byte sequence, but it can produce false matches in other multi-byte encodings such as Shift_JIS. There's no exact multi-byte equivalent, but mb_split() could do the trick.
Secondly, you are exploding by a regular space (U+0020) but your string contains a different character (U+3000).
To sum up (and assuming you are using UTF-8):
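<?php
mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');
// The words are separated by U+3000 IDEOGRAPHIC SPACE, not U+0020.
$string = '私　は　イタリア　人　です';
// Split on U+3000, written as its UTF-8 bytes so the delimiter is visible.
print_r(mb_split("\xE3\x80\x80", $string));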
... or, even better, split on any whitespace character:
<?php
mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');
$string = '私 は イタリア 人 です';
print_r(mb_split('[[:space:]]', $string));
Convert your string to UTF-8 first using iconv(), and then pass it to explode():
$string = explode(" ", iconv('', 'utf-8', $string));
There are a number of characters other than simple ASCII space that can add whitespace between characters.
You could try preg_split() with \s (whitespace characters) or \b (word boundaries) as the pattern. However, this may not work as-is, because Japanese text is almost certainly stored in a multi-byte encoding; for UTF-8 input the pattern needs the /u modifier.
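A hedged sketch of the preg_split() approach, assuming the text is UTF-8. The /u modifier makes PCRE treat both the pattern and the subject as UTF-8, and U+3000 is listed explicitly next to \s so the result does not depend on how a given PCRE build interprets \s:

```php
<?php
// Sample text mixing an ordinary space with U+3000 IDEOGRAPHIC SPACE
// (written as its UTF-8 bytes "\xE3\x80\x80").
$string = "私 は\xE3\x80\x80イタリア\xE3\x80\x80人 です";

// Split on runs of whitespace, including U+3000, and drop empty pieces.
$parts = preg_split('/[\s\x{3000}]+/u', $string, -1, PREG_SPLIT_NO_EMPTY);
print_r($parts);
```

Note that this only splits on whitespace. It does not segment Japanese text that is written without separators.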
Source: https://stackoverflow.com/questions/17443605/explode-on-japanese-string