Remove Arabic Diacritic

*爱你&永不变心* submitted on 2019-12-03 09:06:22

The vowel diacritics in Arabic are combining characters, so a simple search for the marks themselves should suffice. There's no need for a replace rule covering every possible consonant with every possible vowel, which would be tedious.

Here's a working example that outputs what you need:

header('Content-Type: text/html; charset=utf-8', true);
$string = 'الْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ';

$remove = array('ِ', 'ُ', 'ٓ', 'ٰ', 'ْ', 'ٌ', 'ٍ', 'ً', 'ّ', 'َ');
$string = str_replace($remove, '', $string);

echo $string; // outputs الحمد لله رب العالمين

What's important here is the $remove array. It looks odd because each pair of ' quotes contains only a combining character, which renders attached to one of the quotes. The file may need to be saved in the same character encoding as your text (UTF-8 in this example).
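If the combining characters are hard to read or edit in source, one option is to write the same marks as code point escapes. This is a minimal sketch, assuming PHP 7.0 or later (for the "\u{...}" syntax) and assuming the array above holds the ten marks listed here; the names ARABIC_MARKS and stripArabicMarks are made up for illustration:

```php
<?php
// Hypothetical constant: the marks from the $remove array, as code point escapes.
const ARABIC_MARKS = [
    "\u{0650}", // kasra
    "\u{064F}", // damma
    "\u{0653}", // maddah above
    "\u{0670}", // superscript alef
    "\u{0652}", // sukun
    "\u{064C}", // dammatan
    "\u{064D}", // kasratan
    "\u{064B}", // fathatan
    "\u{0651}", // shadda
    "\u{064E}", // fatha
];

// Hypothetical helper: removes every listed mark from a UTF-8 string.
function stripArabicMarks(string $text): string
{
    // str_replace works byte-wise, which is safe here because each
    // mark is a complete UTF-8 sequence.
    return str_replace(ARABIC_MARKS, '', $text);
}
```

Because the escapes are plain ASCII, the source no longer depends on how an editor renders a lone combining character, although the input text still has to be UTF-8.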

Try this:

$string = 'الْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ';
$string = preg_replace("~[\x{064B}-\x{065B}]~u", "", $string);
echo $string; // outputs الحمد لله رب العالمين
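Note that the range U+064B to U+065B does not include the superscript alef (U+0670), which the str_replace list above does remove. Here is a hedged sketch of a slightly wider character class, covering U+064B to U+065F plus U+0670; the function name stripTashkeel is hypothetical:

```php
<?php
// Hypothetical helper: strips Arabic vowel marks with a single regex.
function stripTashkeel(string $text): string
{
    // U+064B-U+065F: fathatan through wavy hamza below (combining marks)
    // U+0670: superscript alef
    // Single quotes pass \x{...} to PCRE untouched; the u flag enables UTF-8 mode.
    $result = preg_replace('/[\x{064B}-\x{065F}\x{0670}]/u', '', $text);

    // preg_replace returns null on failure (for example, invalid UTF-8 input),
    // so fall back to the original text in that case.
    return $result ?? $text;
}
```

Calling stripTashkeel($string) on the example string should give the same unvocalized text as the snippet above, since that string only uses marks inside the narrower range.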

I don't speak Arabic, but I think you could build a character remap:

function remap($string) {
    $remap = [
        'ą' => 'a',
        'č' => 'c',
        /* ... Arabic alphabet remap */
    ];
    return str_replace(array_keys($remap), $remap, $string);
}

echo remap('ąčasdadfg'); // => acasdadfg

Try this code; it works fine:

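// Each pattern removes one slice of the Arabic block (U+0600-U+06FF).
$unicode = [
    "~[\x{0600}-\x{061F}]~u", // signs, punctuation (e.g. U+060C comma), honorific marks
    "~[\x{063B}-\x{063F}]~u", // additional letters
    "~[\x{064B}-\x{065E}]~u", // tashkil: fathatan, fatha, damma, kasra, shadda, sukun, ...
    // Caution: this range also removes extended letters such as U+067E (peh)
    // and the extended digits U+06F0-U+06F9, so use it only on plain Arabic text.
    "~[\x{066A}-\x{06FF}]~u",
];

$str = preg_replace($unicode, "", $str);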

Arabic unicode
