What is the difference between ‘combining characters’ and ‘grapheme extenders’ in Unicode?

旧时难觅i 2020-12-15 09:24

What is the difference between ‘combining characters’ and ‘grapheme extenders’ in Unicode?

They seem to do the same thing, as far as I can tell – although the set of

3 answers
  •  抹茶落季
    2020-12-15 10:04

    May I quote from Yannis Haralambous' Fonts and Encodings, page 116f.:

    The idea is that a script or a system of notation is sometimes too finely divided into characters. And when we have cut constructs up into characters, there is no way to put them back together again to rebuild larger characters. For example, Catalan has the ligature ‘ŀl’. This ligature is encoded as two Unicode characters: an ‘ŀ’ 0x0140 latin small letter l with middle dot and an ordinary ‘l’. But this division may not always be what we want.
    Suppose that we wish to place a circumflex accent over this ligature, as we might well wish to do with the ligatures ‘œ’ and ‘æ’. How can this be done in Unicode? To allow users to build up characters in constructs that play the rôle of new characters, Unicode introduced three new properties (grapheme base, grapheme extension, grapheme link) and one new character: 0x034F combining grapheme joiner.

    So the way I see it, this means that grapheme extenders are used to apply (for example) accents on characters that are themselves composed of several characters.
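
To make the quoted example concrete, here is a minimal Python sketch that inspects the code points involved. It relies only on the standard `unicodedata` module, which as far as I know reports the general category and canonical combining class but does not expose the Grapheme_Extend property itself, so this shows the "combining character" side only. The helper name `describe` is my own, not something from the book.

```python
import unicodedata


def describe(text):
    """Return (code point, name, general category, combining class) per character."""
    return [
        (
            f"U+{ord(ch):04X}",
            unicodedata.name(ch),
            unicodedata.category(ch),
            unicodedata.combining(ch),
        )
        for ch in text
    ]


# The Catalan pair from the quote, followed by a combining circumflex accent.
sample = "\u0140l\u0302"
for row in describe(sample):
    print(row)

# The combining grapheme joiner mentioned in the quote.
cgj = describe("\u034F")[0]
print(cgj)
```

The base letters report a letter category (`Ll`), while the circumflex and the grapheme joiner both report `Mn` (nonspacing mark). Checking membership in Grapheme_Extend would need the Unicode Character Database file DerivedCoreProperties.txt or a third-party library.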
