JavaScript: remove Arabic text diacritics dynamically

栀梦 2020-12-29 11:51

How can I remove Arabic diacritics dynamically? I'm designing an ebook ("chm") with multiple HTML pages that contain Arabic text, but sometimes the search engine wants to highlight so

5 answers
  •  無奈伤痛
    2020-12-29 12:51

    Try this

    Text : الْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ
    converted to : الحمد لله رب العالمين 
    

    http://www.suhailkaleem.com/2009/08/26/remove-diacritics-from-arabic-text-quran/

    The code is C#, not JavaScript, though. Still trying to figure out how to achieve this in JavaScript.

    EDIT: Apparently it's very easy in JavaScript. The diacritics are stored as separate "letters" (each has its own character code), so they can be removed quite easily.

    var CHARCODE_SHADDA = 1617;
    var CHARCODE_SUKOON = 1618;
    var CHARCODE_SUPERSCRIPT_ALIF = 1648;
    var CHARCODE_TATWEEL = 1600;
    var CHARCODE_ALIF = 1575;
    
    function isCharTashkeel(letter)
    {
        if (typeof(letter) == "undefined" || letter == null)
            return false;
    
        var code = letter.charCodeAt(0);
        //1648 - superscript alif
        //1619 - madd: ~
        //1611..1631 - tashkeel marks; the range starts at 1611 (fathatan), not 1612
        return (code == CHARCODE_TATWEEL || code == CHARCODE_SUPERSCRIPT_ALIF || (code >= 1611 && code <= 1631)); //tashkeel
    }
    
    function stripTashkeel(input)
    {
      var output = "";
      //todo consider using a stringbuilder to improve performance
      for (var i = 0; i < input.length; i++)
      {
        var letter = input.charAt(i);
        if (!isCharTashkeel(letter)) //tashkeel
          output += letter;
      }

      return output;
    }
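
    Since each diacritic is its own character, the same stripping can also be sketched with a single regular expression. This is only a sketch, not part of the original answer: the names TASHKEEL_PATTERN, stripTashkeelRegex and containsIgnoringTashkeel are made up for illustration, and the character class assumes the set to drop is tatweel (U+0640), the marks U+064B..U+065F, and superscript alif (U+0670).

```javascript
// Assumed set of characters to drop: tatweel (U+0640),
// tashkeel marks (U+064B..U+065F), superscript alif (U+0670).
var TASHKEEL_PATTERN = /[\u0640\u064B-\u065F\u0670]/g;

// Hypothetical helper: returns the input with the characters above removed.
function stripTashkeelRegex(input) {
  if (typeof input !== "string") return "";
  return input.replace(TASHKEEL_PATTERN, "");
}

// Hypothetical helper: checks whether `term` occurs in `text`
// once diacritics are ignored on both sides.
function containsIgnoringTashkeel(text, term) {
  return stripTashkeelRegex(text).indexOf(stripTashkeelRegex(term)) !== -1;
}
```

    For a search feature, both the page text and the search term would probably need to go through the same stripping before they are compared, which is what containsIgnoringTashkeel does.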
    

    Edit: Here is another way to do it using BuckData http://qurandev.github.com/

    Advantages:
    - Buck uses less bandwidth.
    - In JavaScript, you can search through the entire Buck Quran text in one shot, which is more intuitive than searching the Arabic text.
    - Converting Buck to Arabic and Arabic to Buck is a simple JS call. Play with a live sample here: http://jsfiddle.net/BrxJP/
    - You can strip all vowels from Buck text in a few milliseconds. Why do this? You can search in JavaScript while ignoring the tashkeel differences (Fathah, Dammah, Kasrah), which leads to more hits.
    - Regex plus Buck text can lead to awesome optimizations, and all the searches can run locally: http://qurandev.appspot.com
    - How is the data generated? It is just a one-to-one mapping using http://corpus.quran.com/java/buckwalter.jsp
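
    The one-to-one mapping idea can be sketched as follows. This is not the BuckData API: toBuckwalter and stripBuckVowels are hypothetical names, and the two lookup strings are a partial Buckwalter table written from memory, so check them against the corpus.quran.com table before relying on them.

```javascript
// Partial Buckwalter table (an assumption, see the note above), laid out so that
// a character's offset inside its Unicode range is its index in the string.
var BUCK_RANGE_A = "'|>&<}AbptvjHxd*rzs$SDTZEg"; // U+0621..U+063A
var BUCK_RANGE_B = "_fqklmnhwYyFNKaui~o";        // U+0640..U+0652

// Hypothetical helper: Arabic text -> Buckwalter (ASCII) text.
function toBuckwalter(arabic) {
  var out = "";
  for (var i = 0; i < arabic.length; i++) {
    var code = arabic.charCodeAt(i);
    if (code >= 0x0621 && code <= 0x063A) out += BUCK_RANGE_A.charAt(code - 0x0621);
    else if (code >= 0x0640 && code <= 0x0652) out += BUCK_RANGE_B.charAt(code - 0x0640);
    else if (code === 0x0670) out += "`"; // superscript alif
    else if (code === 0x0671) out += "{"; // alif wasla
    else out += arabic.charAt(i);         // spaces, digits, unmapped characters
  }
  return out;
}

// In Buckwalter text the short vowels, tanween, shadda, sukoon and
// superscript alif are plain ASCII, so one character class removes them.
function stripBuckVowels(buck) {
  return buck.replace(/[aiuoFNK~`]/g, "");
}
```

    With this mapping, a vocalized word and its bare form should reduce to the same ASCII string after stripBuckVowels, which is what makes the "ignore tashkeel differences" search possible.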
