How to check if the word is Japanese or English using PHP

爷,独闯天下 提交于 2019-12-02 15:59:49
Alix Axel

A quick solution that doesn't need the mb_string extension:

if (strlen($str) != strlen(utf8_decode($str))) {
    // $str uses multi-byte chars (isn't English)
}

else {
    // $str is ASCII (probably English)
}

Or a modification of the solution provided by @Alexander Konstantinov:

function isKanji($str) {
    return preg_match('/[\x{4E00}-\x{9FBF}]/u', $str) > 0;
}

function isHiragana($str) {
    return preg_match('/[\x{3040}-\x{309F}]/u', $str) > 0;
}

function isKatakana($str) {
    return preg_match('/[\x{30A0}-\x{30FF}]/u', $str) > 0;
}

function isJapanese($str) {
    return isKanji($str) || isHiragana($str) || isKatakana($str);
}

This function checks whether a word contains at least one Japanese letter (I found unicode range for Japanese letters in Wikipedia).

function isJapanese($word) {
    return preg_match('/[\x{4E00}-\x{9FBF}\x{3040}-\x{309F}\x{30A0}-\x{30FF}]/u', $word);
}
Alec

You could try Google's Translation API that has a detection function: http://code.google.com/apis/language/translate/v2/using_rest.html#detect-language

Try with mb_detect_encoding function, if encoding is EUC-JP or UTF-8 / UTF-16 it can be japanese, otherwise english. The better is if you can ensure which encoding each language, as UTF encodings can be used for many languages

English text usually consists only of ASCII characters (or better say, characters in ASCII range).

You can try to convert the charset and check if it succeeds.

Take a look at iconv: http://www.php.net/manual/en/function.iconv.php

If you can convert a string to ISO-8859-1 it might be english, if you can convert to iso-2022-jp it is propably japanese (I might be wrong for the exact charsets, you should google for them).

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!