grapheme

Why is Swift counting this Grapheme Cluster as two characters instead of one?

此生再无相见时 submitted on 2019-12-23 13:08:37
Question: Generally, Swift is really smart about counting grapheme clusters as a single character. If I want to make a Lebanese flag, for example, I can combine the two Unicode characters U+1F1F1 REGIONAL INDICATOR SYMBOL LETTER L and U+1F1E7 REGIONAL INDICATOR SYMBOL LETTER B, and as expected this is one character in Swift: let s = "\u{1f1f1}\u{1f1e7}"; assert(s.characters.count == 1); assert(s.utf16.count == 4); assert(s.utf8.count == 8) However, let's say I want to make a Bicyclist emoji of Fitzpatrick Type
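The UTF-16 and UTF-8 counts asserted in the question can be checked independently of Swift. The following is a minimal Python sketch of that arithmetic only. The variable names are illustrative, and Python's len() counts code points, not grapheme clusters, so it reports 2 where Swift's character count reports 1.

```python
# The flag from the question: two regional indicator code points.
flag = "\U0001F1F1\U0001F1E7"

# Python strings are sequences of code points, so len() gives 2 here.
code_points = len(flag)

# Each of these code points lies outside the BMP, so each one needs a
# surrogate pair (2 UTF-16 code units) and 4 UTF-8 bytes.
utf16_units = len(flag.encode("utf-16-le")) // 2
utf8_bytes = len(flag.encode("utf-8"))

print(code_points, utf16_units, utf8_bytes)  # 2 4 8
```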

How to count grapheme clusters or “perceived” emoji characters in Java

只谈情不闲聊 submitted on 2019-12-22 05:25:21
Question: I'm looking to count the number of perceived emoji characters in a provided Java string. I'm currently using the emoji4j library, but it doesn't work for grapheme clusters like this one: 👩‍👩‍👦‍👦 Calling EmojiUtil.getLength("👩‍👩‍👦‍👦") returns 4 instead of 1, and similarly calling EmojiUtil.getLength("👻👩‍👩‍👦‍👦") returns 5 instead of 2. Are there any APIs or methods on String in Java that make it easy to count grapheme clusters? I've been hunting around but understandably the codePoints()

Get grapheme character count in javascript strings?

只谈情不闲聊 submitted on 2019-12-07 11:30:22
Question: I'm trying to get the length of a JavaScript string in user-visible graphemes, i.e., ignoring combining characters (and surrogate pairs?). Is this possible, and if so, how would I go about it? We're using the Dojo Toolkit on our project, but any general JavaScript solution would be great. Answer 1: For the combining characters, look at the Derived Combining Class that lists all combining characters (among others). Since you're just interested in counting, you could just nuke them out -- leaves you

Get grapheme character count in javascript strings?

牧云@^-^@ submitted on 2019-12-05 12:32:06
I'm trying to get the length of a JavaScript string in user-visible graphemes, i.e., ignoring combining characters (and surrogate pairs?). Is this possible, and if so, how would I go about it? We're using the Dojo Toolkit on our project, but any general JavaScript solution would be great. For the combining characters, look at the Derived Combining Class that lists all combining characters (among others). Since you're just interested in counting, you could just nuke them out -- leaves you with a slightly closer estimation. In the post linked to by Angus, JavaScript strings outside of the BMP
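The "nuke them out" idea from the answer can be sketched as follows. This is a rough Python illustration, not the answerer's code: it drops every code point whose General Category is a Mark (Mn, Mc, Me), which is a swapped-in test rather than the Derived Combining Class lookup the answer mentions. The function names visible_length and utf16_length are hypothetical. As the answer says, the result is only a closer estimation, not a true grapheme count.

```python
import unicodedata


def visible_length(text):
    """Count code points, skipping General Category M (combining marks)."""
    return sum(
        1 for ch in text
        if not unicodedata.category(ch).startswith("M")
    )


def utf16_length(text):
    """Number of UTF-16 code units, i.e. what JavaScript's .length reports."""
    return len(text.encode("utf-16-le")) // 2


accented = "e\u0301"    # 'e' followed by COMBINING ACUTE ACCENT
clef = "\U0001D11E"     # MUSICAL SYMBOL G CLEF, outside the BMP

print(len(accented), visible_length(accented))  # 2 1
print(utf16_length(clef), visible_length(clef))  # 2 1
```

Python iterates over code points, so the surrogate-pair question does not arise here. In JavaScript the same string would first have to be walked by code point rather than by UTF-16 unit.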

regular expression to match name initials - PCRE

懵懂的女人 submitted on 2019-12-02 12:24:40
Question: I have a regular expression to get the initials of a name, like below: /\b\p{L}\./gu It works fine with English and other languages until graphemes and combined characters occur. For example, क in Hindi and ಕ in Kannada are matched, but के in Hindi and ಕೆ in Kannada are not matched by this regex. I am trying to get the initials from a name like J.P.Morgan, etc. Any help would be greatly appreciated. Answer 1: You need to match diacritic marks after base letters using \p{M}*

regular expression to match name initials - PCRE

感情迁移 submitted on 2019-12-02 06:33:53
I have a regular expression to get the initials of a name, like below: /\b\p{L}\./gu It works fine with English and other languages until graphemes and combined characters occur. For example, क in Hindi and ಕ in Kannada are matched, but के in Hindi and ಕೆ in Kannada are not matched by this regex. I am trying to get the initials from a name like J.P.Morgan, etc. Any help would be greatly appreciated. You need to match diacritic marks after base letters using \p{M}*: '~\b(?<!\p{M})\p{L}\p{M}*\.~u' The pattern matches \b - a word boundary (?<!\p{M}) - the char before the
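To make the "letter, then any marks, then a dot" logic concrete, here is a hand-rolled Python sketch. Python's built-in re module has no \p{L} or \p{M}, so this is a plain scanner built on unicodedata, not the PCRE pattern itself. The function name find_initials and its word-start test (previous character is not a letter, mark, digit, or underscore) are assumptions that only approximate \b and the lookbehind.

```python
import unicodedata


def is_letter(ch):
    return unicodedata.category(ch).startswith("L")


def is_mark(ch):
    return unicodedata.category(ch).startswith("M")


def find_initials(text):
    """Return substrings shaped like: base letter, optional marks, '.'."""
    found = []
    i, n = 0, len(text)
    while i < n:
        prev = text[i - 1] if i > 0 else ""
        # Approximate the word boundary: nothing word-like directly before.
        at_word_start = not prev or not (
            is_letter(prev) or is_mark(prev) or prev.isdigit() or prev == "_"
        )
        if at_word_start and is_letter(text[i]):
            j = i + 1
            # Consume diacritic marks attached to the base letter.
            while j < n and is_mark(text[j]):
                j += 1
            if j < n and text[j] == ".":
                found.append(text[i:j + 1])
                i = j + 1
                continue
        i += 1
    return found


print(find_initials("J.P.Morgan"))       # ['J.', 'P.']
print(find_initials("\u0915\u0947."))    # the Hindi letter plus vowel sign, then '.'
```

Without the mark-consuming loop, the second input yields nothing, because the dot follows the vowel sign and not the base letter. That is the failure described in the question.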

How to count grapheme clusters or “perceived” emoji characters in Java

人盡茶涼 submitted on 2019-12-01 05:28:44
I'm looking to count the number of perceived emoji characters in a provided Java string. I'm currently using the emoji4j library, but it doesn't work for grapheme clusters like this one: 👩‍👩‍👦‍👦 Calling EmojiUtil.getLength("👩‍👩‍👦‍👦") returns 4 instead of 1, and similarly calling EmojiUtil.getLength("👻👩‍👩‍👦‍👦") returns 5 instead of 2. Are there any APIs or methods on String in Java that make it easy to count grapheme clusters? I've been hunting around but understandably the codePoints() method on a String includes not only the visible emojis, but also the zero width joiners. I also attempted
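The numbers in the question follow from how the family emoji is built: four emoji code points glued together by three ZERO WIDTH JOINER (U+200D) code points. The Python sketch below shows that breakdown, plus a deliberately simplified counter that merges ZWJ-joined code points into one unit. The name count_zwj_units is hypothetical. The heuristic is not the Unicode grapheme cluster algorithm: it ignores skin-tone modifiers, variation selectors, flags, and combining marks.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER

# WOMAN, ZWJ, WOMAN, ZWJ, BOY, ZWJ, BOY
family = "\U0001F469\u200D\U0001F469\u200D\U0001F466\u200D\U0001F466"
ghost = "\U0001F47B"


def count_zwj_units(text):
    """Count code points, treating anything joined by a ZWJ as one unit."""
    count = 0
    joined = False  # True when the previous code point was a ZWJ
    for ch in text:
        if ch == ZWJ:
            joined = True
            continue
        # Start a new unit unless this code point is glued to the last one.
        if not joined or count == 0:
            count += 1
        joined = False
    return count


print(len(family))                      # 7 code points in total
print(len(family.replace(ZWJ, "")))     # 4 once the joiners are removed
print(count_zwj_units(family))          # 1
print(count_zwj_units(ghost + family))  # 2
```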

What is the difference between ‘combining characters’ and ‘grapheme extenders’ in Unicode?

冷暖自知 submitted on 2019-11-29 07:11:32
Question: What is the difference between ‘combining characters’ and ‘grapheme extenders’ in Unicode? They seem to do the same thing, as far as I can tell – although the set of grapheme extenders is larger than the set of combining characters. I’m clearly missing something here. Why the distinction? The Unicode Standard, Chapter 3, D52 Combining character: A character with the General Category of Combining Mark (M). Combining characters consist of all characters with the General Category values of
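The General Category side of D52 can be checked with Python's unicodedata module. The sketch below tests for category M, which comprises the values Mn, Mc and Me. The helper name is_combining_character is hypothetical. The standard library does not expose the Grapheme_Extend property, so that side of the comparison appears only in a hedged comment.

```python
import unicodedata


def is_combining_character(ch):
    """True when the General Category is a Combining Mark (Mn, Mc or Me)."""
    return unicodedata.category(ch) in {"Mn", "Mc", "Me"}


samples = {
    "\u0301": "COMBINING ACUTE ACCENT",
    "\u0903": "DEVANAGARI SIGN VISARGA",
    "\u20dd": "COMBINING ENCLOSING CIRCLE",
    # To my understanding U+200C carries the Grapheme_Extend property even
    # though its General Category is Cf; Python's stdlib cannot confirm that.
    "\u200c": "ZERO WIDTH NON-JOINER",
}

for ch, name in samples.items():
    print(f"U+{ord(ch):04X} {name}: "
          f"{unicodedata.category(ch)} {is_combining_character(ch)}")
```

A character such as U+200C shows why the two sets are defined separately: membership in one is decided by General Category alone, while the other is a derived property that can include characters from other categories.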