Remove diacritical marks (ń ǹ ň ñ ṅ ņ ṇ ṋ ṉ ̈ ɲ ƞ ᶇ ɳ ȵ) from Unicode chars

故里飘歌 2020-11-22 11:42

I am looking for an algorithm that can map characters with diacritics (tilde, circumflex, caret, umlaut, caron) to their "simple" base character.

For example:

ń ǹ ň ñ ṅ ņ ṇ ṋ ṉ ̈ ɲ ƞ ᶇ ɳ ȵ --> n
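One common approach is to apply Unicode canonical decomposition (NFD), which splits a character such as ñ into a base letter plus a combining mark ("n" + U+0303), and then drop the nonspacing marks (Unicode category `Mn`). Below is a minimal sketch in Python using only the standard-library `unicodedata` module. The choice of Python is an assumption, since the question names no language, and the function name `strip_diacritics` is hypothetical. This technique only covers letters that actually decompose. Of the characters in the title, ń ǹ ň ñ ṅ ņ ṇ ṋ ṉ reduce to n, but ɲ ƞ ᶇ ɳ ȵ have no decomposition and pass through unchanged.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove nonspacing combining marks after canonical decomposition.

    Letters with a built-in hook, leg, or curl (e.g. 'ɲ', 'ɳ', 'ȵ') have no
    decomposition, so this function returns them unchanged.
    """
    # NFD splits precomposed letters into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep everything except nonspacing marks (general category "Mn").
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Recompose whatever remains (e.g. Hangul jamo) into canonical form.
    return unicodedata.normalize("NFC", kept)


print(strip_diacritics("ń ǹ ň ñ ṅ ņ ṇ ṋ ṉ ɲ"))  # n n n n n n n n n ɲ
```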

12 answers
  •  野趣味 (OP)
    2020-11-22 11:49

    In German, you usually don't want to simply strip the diacritics from umlauts (ä, ö, ü). Instead, they are replaced by two-letter combinations (ae, oe, ue). For instance, Björn should be written as Bjoern (not Bjorn) to preserve the correct pronunciation.

    For that, I would rather use a hard-coded mapping, where you can define the replacement rule individually for each group of special characters.
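    Here is a minimal sketch of such a hard-coded mapping. It assumes Python 3.9+ and only the standard-library `unicodedata` module, since the thread does not name a language. The table name `REPLACEMENT_RULES` and the function `transliterate` are hypothetical. The table rules are applied first. A generic NFD-based stripping step then handles every character the table does not list.

```python
import unicodedata

# Hypothetical replacement table: one rule per special character group.
# Entries here take precedence over the generic stripping step below.
REPLACEMENT_RULES: dict[str, str] = {
    # German umlauts become two-letter combinations (Björn -> Bjoern).
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    # n-variants with a hook, leg, or curl have no Unicode decomposition,
    # so the generic step cannot reduce them; map them explicitly.
    "ɲ": "n", "ƞ": "n", "ᶇ": "n", "ɳ": "n", "ȵ": "n",
}


def transliterate(text: str, rules: dict[str, str] = REPLACEMENT_RULES) -> str:
    """Apply the hard-coded rules first, then strip any remaining diacritics."""
    # Compose first so a decomposed 'o' + U+0308 still matches the 'ö' key.
    composed = unicodedata.normalize("NFC", text)
    mapped = composed.translate(str.maketrans(rules))
    # Fallback: decompose and drop nonspacing marks (category "Mn").
    decomposed = unicodedata.normalize("NFD", mapped)
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", stripped)


print(transliterate("Björn"))  # Bjoern
```

    The table is plain data, so other languages can swap in their own rules. For example, a non-German context might map "ä" to "a". One limitation of this sketch is capitalization: an all-caps word such as "ÜBER" becomes "UeBER", so case handling may need its own rule.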
