explode() on Japanese string

丶灬走出姿态 submitted on 2019-12-01 22:21:06

Question


I have to use the explode() function on Japanese text but it doesn't work.

Here is an example of what I have:

$string = '私　は　イタリア　人　です';
$string = explode(" ", $string);
print_r($string);

That prints

Array ( [0] => 私　は　イタリア　人　です )

in place of

Array ( [0] => 私 [1] => は [2] => イタリア [3] => 人 [4] => です )

It seems that explode() can't recognize the spaces inside that text.

What's the reason? How could I make it work?


Answer 1:


That is for the simple reason that you do not have an ordinary space character here. You have an "IDEOGRAPHIC SPACE" character (U+3000), which is encoded in UTF-8 as the bytes "e3 80 80".

If you use that as your delimiter, it will work.
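
A minimal sketch of that fix, assuming the string is UTF-8 and PHP 7.0 or later (needed for the "\u{3000}" escape; on older versions the raw bytes "\xE3\x80\x80" are equivalent). The variable names are illustrative only.

```php
<?php
// Assumption: everything here is UTF-8 encoded.
// "\u{3000}" is IDEOGRAPHIC SPACE; in UTF-8 it is the bytes E3 80 80.
$delimiter = "\u{3000}";

// Build the sample sentence with explicit full-width separators.
$string = implode($delimiter, ['私', 'は', 'イタリア', '人', 'です']);

// explode() compares raw bytes, so the delimiter must be the same
// byte sequence that actually occurs in the string.
$words = explode($delimiter, $string);
print_r($words);
```

Writing the delimiter as an escape sequence rather than a literal character keeps the difference from an ordinary space visible in the source code.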




Answer 2:


You're using the wrong space. The text uses full-width spaces (U+3000 IDEOGRAPHIC SPACE) and you're supplying a half-width space (U+0020 SPACE).
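
The two characters look alike on screen, so it can help to inspect the bytes. The following is a small diagnostic sketch, assuming UTF-8 and PHP 7.0 or later; the variable names are made up for illustration.

```php
<?php
// Assumption: UTF-8 encoding throughout.
$halfWidth = "\u{0020}"; // SPACE
$fullWidth = "\u{3000}"; // IDEOGRAPHIC SPACE

// Dump the underlying bytes of each character.
echo bin2hex($halfWidth), "\n"; // 20
echo bin2hex($fullWidth), "\n"; // e38080

// A string that uses the full-width space contains no 0x20 byte at all.
$string = "私" . $fullWidth . "は";
var_dump(strpos($string, $halfWidth) !== false); // bool(false)
var_dump(strpos($string, $fullWidth) !== false); // bool(true)
```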




Answer 3:


There are two issues here.

First of all, you don't say what your encoding is, but I suppose all Japanese encodings are multi-byte. The explode() function, like most regular PHP string functions, works on raw bytes rather than characters. It is binary-safe, so it only matches when the delimiter is exactly the same byte sequence as in the string. There's no exact multi-byte equivalent, but mb_split() could do the trick.

Secondly, you are exploding by regular space (U+0020) but your string contains another character (U+3000).

To sum up (and assuming you are using UTF-8):

<?php

mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');

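// The words are separated by U+3000 IDEOGRAPHIC SPACE, not U+0020.
$string = '私　は　イタリア　人　です';
// The pattern must be the same character; the escape requires PHP 7.0+.
print_r(mb_split("\u{3000}", $string));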

... or even better:

<?php

mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');

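// The words are separated by U+3000 IDEOGRAPHIC SPACE, not U+0020.
$string = '私　は　イタリア　人　です';
// [[:space:]] matches any whitespace character, including U+3000.
print_r(mb_split('[[:space:]]', $string));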



Answer 4:


Convert your string to UTF-8 first using iconv(), then pass the result to explode():

$string = explode(" ", iconv('', 'utf-8', $string));



Answer 5:


There are a number of characters other than simple ASCII space that can add whitespace between characters.

You could try preg_split() with \s (whitespace characters) or \b (word boundaries) as the pattern. However, this may not be ideal, because Japanese text is almost certainly stored in a multi-byte encoding.
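
One way to address the multi-byte concern is the /u pattern modifier, which makes PCRE treat both pattern and subject as UTF-8. The sketch below assumes valid UTF-8 input and PHP 7.0 or later; the sample string deliberately mixes several kinds of whitespace, and U+3000 is listed explicitly in the character class so the split does not rely on how \s is interpreted.

```php
<?php
// Assumption: $string is valid UTF-8. With the /u modifier, PCRE reads
// the pattern and the subject as UTF-8 instead of raw bytes.
$string = "私\u{3000}は イタリア\u{3000}\u{3000}人\tです";

// Split on runs of ASCII whitespace or U+3000 IDEOGRAPHIC SPACE.
// PREG_SPLIT_NO_EMPTY drops empty pieces from leading or trailing separators.
$words = preg_split('/[\s\x{3000}]+/u', $string, -1, PREG_SPLIT_NO_EMPTY);
print_r($words);
```

Unlike explode() with a single fixed delimiter, this handles text where half-width and full-width spaces are mixed or repeated.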



Source: https://stackoverflow.com/questions/17443605/explode-on-japanese-string
