How to determine whether a string is English or Persian?

南笙 2021-01-11 17:45

I have an EditText in a form. When the user types text into it, I want my program to detect which language was entered.

Is there a

5 answers
  •  我在风中等你
    2021-01-11 18:33

    Character ranges are not a reliable way to tell apart languages whose ranges overlap, e.g. Arabic, Persian and Urdu. But if you insist on this approach, my suggestion is to look for characters that are language-specific. For example, گ and پ are used in Persian but not in Arabic. On the other hand, ئ and ة may be more common in Arabic text than in Persian. By counting these specific characters you can distinguish between Arabic, Persian and Urdu.
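The counting idea above could be sketched roughly as follows. This is only an illustration in Python, not the answerer's code: the function name `guess_language`, the three marker sets and the tie-breaking rule are all assumptions. Only گ, پ, ئ and ة come from the answer; the extra Persian letters and the whole Urdu set were added for the example, and the Persian letters listed here also occur in Urdu, so real text needs tuned sets.

```python
# Hypothetical marker sets for illustration; tune them on real text.
# Only gaf, peh, yeh-with-hamza and teh marbuta come from the answer above;
# the remaining letters are assumptions.
PERSIAN_MARKERS = set("\u06af\u067e\u0686\u0698")  # gaf, peh, tcheh, jeh
ARABIC_MARKERS = set("\u0626\u0629")  # yeh with hamza above, teh marbuta
URDU_MARKERS = set("\u0679\u0688\u0691\u06ba\u06c1\u06d2")  # tteh, ddal, rreh, noon ghunna, heh goal, yeh barree


def guess_language(text):
    """Return 'english', 'persian', 'arabic', 'urdu', 'arabic-script' or 'unknown'."""
    # Step 1: decide the script by counting ASCII letters vs. the Arabic block.
    latin = sum(1 for ch in text if ch.isascii() and ch.isalpha())
    arabic_script = sum(1 for ch in text if "\u0600" <= ch <= "\u06ff")
    if latin == 0 and arabic_script == 0:
        return "unknown"
    if latin >= arabic_script:
        return "english"  # strictly speaking: mostly ASCII Latin letters

    # Step 2: within Arabic script, count the language-specific markers.
    scores = {
        "persian": sum(1 for ch in text if ch in PERSIAN_MARKERS),
        "arabic": sum(1 for ch in text if ch in ARABIC_MARKERS),
        "urdu": sum(1 for ch in text if ch in URDU_MARKERS),
    }
    top = max(scores.values())
    winners = [lang for lang, score in scores.items() if score == top]
    if top == 0 or len(winners) > 1:
        return "arabic-script"  # no marker seen, or a tie: cannot decide
    return winners[0]
```

For example, `guess_language("Hello world")` returns `"english"`, while a short Arabic-script word with none of the markers comes back as `"arabic-script"`, which shows why short inputs are hard for this method.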

    Although I've had good results with the method above, detecting a language with n-grams is more popular and more dependable. Many libraries do language detection this way.
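To give a feel for the n-gram approach, here is a toy sketch in Python that compares character trigram counts by cosine similarity. It is not how any particular library works; the names `char_ngrams`, `similarity`, `train`, `detect` and the tiny `SAMPLES` texts are made up for the example, and a real detector would be trained on far larger corpora.

```python
import math
from collections import Counter

# Tiny, hypothetical training samples; real profiles need much more text.
SAMPLES = {
    "english": "this is a short english sentence about the weather and the news",
    # Persian for "this is a Persian text", written with escapes
    "persian": "\u0627\u06cc\u0646 \u06cc\u06a9 \u0645\u062a\u0646 "
               "\u0641\u0627\u0631\u0633\u06cc \u0627\u0633\u062a",
}


def char_ngrams(text, n=3):
    """Count overlapping character n-grams, padding both ends with a space."""
    padded = " " + text.lower() + " "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def similarity(a, b):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def train(samples):
    """Build one n-gram profile per language from sample text."""
    return {lang: char_ngrams(text) for lang, text in samples.items()}


def detect(text, profiles):
    """Return the language whose profile is most similar, or 'unknown'."""
    grams = char_ngrams(text)
    scores = {lang: similarity(grams, profile) for lang, profile in profiles.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"
```

With `profiles = train(SAMPLES)`, calling `detect("the news is short", profiles)` picks `"english"`. Adding Arabic and Urdu samples to `SAMPLES` would extend the same comparison to languages that share a script, which is where n-grams are expected to help more than single-character counts.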
