Swift countElements() returns incorrect value when counting flag emoji

Submitted by 梦想与她 on 2019-11-27 01:37:12

Update for Swift 4 (Xcode 9)

As of Swift 4 (tested with Xcode 9 beta) grapheme clusters break after every second regional indicator symbol, as mandated by the Unicode 9 standard:

let str1 = "🇩🇪🇩🇪🇩🇪🇩🇪🇩🇪"
print(str1.count) // 5
print(Array(str1)) // ["🇩🇪", "🇩🇪", "🇩🇪", "🇩🇪", "🇩🇪"]

Also, String is (again) a collection of its characters, so one can obtain the character count directly with str1.count.
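Because String conforms to Collection again, all the usual collection APIs apply directly to its characters; a quick sketch (outputs assume a Swift 4+ toolchain):

```swift
let flags = "🇩🇪🇩🇪🇩🇪🇩🇪🇩🇪"

// Collection APIs operate on Characters (extended grapheme clusters):
print(flags.count)      // 5
print(flags.first!)     // 🇩🇪
print(flags.prefix(2))  // 🇩🇪🇩🇪

// Conversion to [Character] also yields one element per flag:
let chars = Array(flags)
print(chars.count)      // 5
```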


(Old answer for Swift 3 and older:)

From "3 Grapheme Cluster Boundaries" in Unicode Standard Annex #29, "Unicode Text Segmentation" (emphasis added):

A legacy grapheme cluster is defined as a base (such as A or カ) followed by zero or more continuing characters. One way to think of this is as a sequence of characters that form a “stack”.

The base can be single characters, or be any sequence of Hangul Jamo characters that form a Hangul Syllable, as defined by D133 in The Unicode Standard, or be any sequence of Regional_Indicator (RI) characters. The RI characters are used in pairs to denote Emoji national flag symbols corresponding to ISO country codes. Sequences of more than two RI characters should be separated by other characters, such as U+200B ZWSP.

(Thanks to @rintaro for the link).

A Swift Character represents an extended grapheme cluster, so it is (according to this reference) correct that any sequence of regional indicator symbols is counted as a single character.
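One way to see why the legacy rule merged adjacent flags: each flag is not a single code point but a pair of regional indicator scalars, so a run of flags is an unbroken run of RI scalars with no cluster boundary under the old rule. A quick sketch inspecting the underlying scalars (the character counts shown assume a Swift 4+ toolchain):

```swift
let flag = "🇩🇪"
print(flag.count)                 // 1  (one Character)
print(flag.unicodeScalars.count)  // 2  (two regional indicator scalars: D, E)

// A run of flags is just a run of RI scalars; before Swift 4 /
// Unicode 9 the whole run formed a single grapheme cluster.
let run = "🇩🇪🇩🇪🇩🇪"
print(run.unicodeScalars.count)   // 6
for scalar in flag.unicodeScalars {
    print(String(scalar.value, radix: 16))  // 1f1e9, 1f1ea
}
```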

You can separate the "flags" by a ZERO WIDTH NON-JOINER:

let str1 = "🇩🇪\u{200C}🇩🇪"
print(str1.characters.count) // 2

or insert a ZERO WIDTH SPACE:

let str2 = "🇩🇪\u{200B}🇩🇪"
print(str2.characters.count) // 3

This also resolves possible ambiguities, e.g. should "🇫​🇷​🇺​🇸" be "🇫​🇷🇺​🇸" or "🇫🇷​🇺🇸"?
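Under the current Swift 4 / Unicode 9 rule this ambiguity is instead resolved by always pairing regional indicators from the left; a quick check (assuming a Swift 4+ toolchain):

```swift
// Four regional indicator scalars F, R, U, S in a row:
let s = "\u{1F1EB}\u{1F1F7}\u{1F1FA}\u{1F1F8}"

// Pairing from the left gives FR + US, i.e. 🇫🇷 followed by 🇺🇸:
print(s.count)     // 2
print(Array(s))    // ["🇫🇷", "🇺🇸"]
```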

See also How to know if two emojis will be displayed as one emoji? about a possible method to count the number of "composed characters" in a Swift string, which would return 5 for your let str1 = "🇩🇪🇩🇪🇩🇪🇩🇪🇩🇪".

Here's how I solved that problem, for Swift 3:

import Foundation

let str = "🇩🇪🇩🇪🇩🇪🇩🇪🇩🇪" // or whatever the string of emojis is
let range = str.startIndex..<str.endIndex
var length = 0
str.enumerateSubstrings(in: range, options: .byComposedCharacterSequences) { (substring, substringRange, enclosingRange, stop) in
    length += 1
}
print("Character Count: \(length)")

This fixes all the problems with character count and emojis, and is the simplest method I have found.
