Check the language of string based on glyphs in PHP

前提是你 提交于 2019-12-17 19:26:03

问题


I have a MySQL database with book titles in both English and Arabic and I'm using a PHP class that can automatically transliterate Arabic text into Latin script.

I'd like my output HTML to look something like this:

<h3>A book</h3>
<h3>كتاب <em>(kitaab)</em></h3>
<h3>Another book</h3>

Is there a way for PHP to determine the language of a string based on the Unicode characters and glyphs used in it? I'm trying to get something like this:

$Ar = new Arabic('EnTransliteration');
while ($item = mysql_fetch_array($results)) {
    ...
    if (some test to see if $item['item_title'] has Arabic glyphs in it) {
      echo "<h3>$item[item_title] <em>(" . $Ar->ar2en($item['item_title']) . ")</em></h3>";
    } else {
      echo "<h3>$item[item_title]</h3>";
    }
    ...
}

Fortunately the class doesn't choke when fed Latin characters, so in theory I could send every result through the transformation, but that seems like a waste of processing.

Thanks!

Edit: I still haven't found a way to check for glyphs or characters. I suppose I could put all the Arabic characters in an array and check if anything in the array matches a part of the string...

I did, however, figure out an interim solution that might work fine in the end. It puts every title through the transformation regardless of language, but only outputs the parenthetical transliteration if the string was changed:

while ($item = mysql_fetch_array($mysql_results)) {
    $transliterate = trim(strtolower($Ar->ar2en($item['item_title'])));
    $item_title = (strtolower($item['item_title']) == $transliterate) ? $item['item_title'] : $item['item_title'] . " <em>($transliterate)</em>";

    echo "<h3>$item_title</h3>";
}

回答1:


This should do it:

preg_match("/\p{Arabic}/u", $item['item_title'])

You could make that regular expression a bit more sophisticated if you want to, but I don't think you really need to.

The \p escape sequence lets you select characters based on their Unicode properties (when the u pattern modifier is used).

The PHP manual mentions: "Extended properties such as "Greek" or "InMusicalSymbols" are not supported by PCRE." But that's not entirely true anymore. PCRE release 6.5 added support for script names.




回答2:


Here's an PHP open source class for Arabic character set auto detection:

http://www.ar-php.com/php/arabic/index.html#ArCharsetD



来源:https://stackoverflow.com/questions/1011841/check-the-language-of-string-based-on-glyphs-in-php

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!