Combining Devanagari characters

≯℡__Kan透↙ submitted on 2019-11-27 05:39:37
Question: I have something like

a = "बिक्रम मेरो नाम हो"

and I want to get

a[0] = बि
a[1] = क्र
a[2] = म

But since म takes 4 bytes while बि takes 8 bytes, I cannot simply index into the string. How can this be done in Python?

Answer 1: The algorithm for splitting text into grapheme clusters is given in Unicode Standard Annex #29, section 3.1. I'm not going to implement the full algorithm for you here, but I'll show you roughly how to handle the case of Devanagari, and then you can read
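As a rough illustration of the Devanagari case described in the answer, here is a minimal sketch in Python 3. It is not the full Annex #29 algorithm and not necessarily the answerer's own code. The function name split_clusters and the two joining rules are assumptions made for this example: a combining mark (Unicode category M*) attaches to the current cluster, and a letter that directly follows a virama (U+094D) attaches as well, so that conjuncts such as क्र stay together.

```python
import unicodedata

# U+094D: suppresses the inherent vowel and joins consonants into a conjunct.
VIRAMA = "\N{DEVANAGARI SIGN VIRAMA}"


def split_clusters(text):
    """Yield approximate grapheme clusters of Devanagari text.

    Simplified sketch, not the full Unicode text segmentation algorithm.
    """
    cluster = ""
    prev = None
    for ch in text:
        category = unicodedata.category(ch)
        # A character extends the current cluster if it is a combining mark,
        # or a letter that immediately follows a virama.
        joins = category.startswith("M") or (
            category.startswith("L") and prev == VIRAMA
        )
        if cluster and joins:
            cluster += ch
        else:
            if cluster:
                yield cluster
            cluster = ch
        prev = ch
    if cluster:
        yield cluster


a = "बिक्रम मेरो नाम हो"
clusters = list(split_clusters(a))
print(clusters[:3])  # ['बि', 'क्र', 'म']
```

Collecting the clusters into a list gives the indexing the question asks for: clusters[0], clusters[1] and clusters[2] are the three clusters of the first word. Spaces come out as clusters of their own, so filter them out if only the syllables are wanted.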