Swift - Replacing emojis in a string with whitespace

巧了我就是萌 submitted on 2019-12-04 12:32:49

You can use pattern matching (for emoji patterns) to filter out emoji characters from your String.

extension String {

    var emojilessStringWithSubstitution: String {
        // Code point ranges to strip, compared against UnicodeScalar.value.
        // UInt32 ranges are used because the integer initializers of
        // UnicodeScalar are failable in current Swift.
        let emojiPatterns: [ClosedRange<UInt32>] = [0x1F601...0x1F64F,
                                                    0x2702...0x27B0]
        return self.unicodeScalars
            .filter { ucScalar in !(emojiPatterns.contains { $0 ~= ucScalar.value }) }
            .reduce("") { $0 + String($1) }
    }
}

/* example usage */
let str = "I'm gonna do this callenge as soon as I can swing again 😂😂😂\n http://youtu.be/SW_d3fGz1hk"
print(str.emojilessStringWithSubstitution)

/* I'm gonna do this callenge as soon as I can swing again
   http://youtu.be/SW_d3fGz1hk */

Note that the above only uses the emoji intervals presented in your question and is in no way representative of all emojis, but the method is general and can swiftly be extended by adding further emoji intervals to the emojiPatterns array.


Reading your question again, I realize that you'd prefer substituting emojis with whitespace characters rather than removing them (which is what the filtering solution above does). We can achieve this by replacing the .filter operation above with a conditional .map operation, much like in your question:

extension String {

    var emojilessStringWithSubstitution: String {
        // Code point ranges treated as emoji, compared against UnicodeScalar.value.
        let emojiPatterns: [ClosedRange<UInt32>] = [0x1F600...0x1F64F,
                                                    0x1F300...0x1F5FF,
                                                    0x1F680...0x1F6FF,
                                                    0x2600...0x26FF,
                                                    0x2700...0x27BF,
                                                    0xFE00...0xFE0F]
        let space: UnicodeScalar = " "

        return self.unicodeScalars
            .map { ucScalar in
                emojiPatterns.contains { $0 ~= ucScalar.value } ? space : ucScalar }
            .reduce("") { $0 + String($1) }
    }
}

In the above, the existing emoji intervals have been extended, as per your comment to this post (listing these intervals), such that the emoji check is now possibly exhaustive.
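One consequence of working on unicode scalars is worth spelling out: every matching scalar becomes its own space, so an emoji made of several scalars (for example a symbol followed by a variation selector) turns into several spaces. The following is a minimal, self-contained sketch of the same mapping that makes this visible. The names emojiRanges and replacingEmojiScalars(in:with:) are hypothetical and used only for illustration; the ranges are the ones listed above.

```swift
// Hypothetical names for illustration; the ranges are copied from the answer above.
let emojiRanges: [ClosedRange<UInt32>] = [
    0x1F600...0x1F64F,
    0x1F300...0x1F5FF,
    0x1F680...0x1F6FF,
    0x2600...0x26FF,
    0x2700...0x27BF,
    0xFE00...0xFE0F
]

/// Replaces every unicode scalar that falls in `emojiRanges` with `replacement`.
func replacingEmojiScalars(in text: String, with replacement: Unicode.Scalar = " ") -> String {
    var result = String.UnicodeScalarView()
    for scalar in text.unicodeScalars {
        let isEmoji = emojiRanges.contains { $0.contains(scalar.value) }
        result.append(isEmoji ? replacement : scalar)
    }
    return String(result)
}

// Three single-scalar emojis become three spaces.
print(replacingEmojiScalars(in: "again \u{1F602}\u{1F602}\u{1F602}!"))
// U+2600 followed by the variation selector U+FE0F becomes two spaces.
print(replacingEmojiScalars(in: "a\u{2600}\u{FE0F}b"))
```

If exactly one space per visible emoji is needed, the substitution would have to operate on Character values (grapheme clusters) rather than on individual scalars.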

Swift 4:

extension String {
  func stringByRemovingEmoji() -> String {
    return String(self.filter { !$0.isEmoji() })
  }
}

extension Character {
  fileprivate func isEmoji() -> Bool {
    return Character(UnicodeScalar(UInt32(0x1d000))!) <= self && self <= Character(UnicodeScalar(UInt32(0x1f77f))!)
      || Character(UnicodeScalar(UInt32(0x2100))!) <= self && self <= Character(UnicodeScalar(UInt32(0x26ff))!)
  }
}

Emojis are classified as symbols by Unicode, and character sets are typically used in searching operations. So we will use the symbols property of CharacterSet.

var emojiString =  "Hey there 🖐, welcome"
emojiString = emojiString.components(separatedBy: CharacterSet.symbols).joined()       
print(emojiString)

Output is

Hey there , welcome

Note that the emoji is removed rather than replaced. If the emoji had a space on each side, two consecutive spaces are left behind, and we can collapse them in the following way

emojiString.replacingOccurrences(of: "  ", with: " ") 

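The call above replaces every occurrence of the of: argument (two consecutive spaces) with the with: argument (a single space). It returns a new string, so the result has to be assigned back to emojiString.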
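A literal two-space replacement runs in a single pass, so a run of three spaces still leaves two behind. As a hedged alternative, assuming the goal is to collapse any run of spaces to one, a regular expression can be used instead. The helper name collapsingSpaces(_:) is hypothetical; replacingOccurrences(of:with:options:) with .regularExpression is the Foundation call doing the work.

```swift
import Foundation

/// Hypothetical helper: collapses every run of two or more spaces into one space.
func collapsingSpaces(_ text: String) -> String {
    return text.replacingOccurrences(of: " {2,}",
                                     with: " ",
                                     options: .regularExpression)
}

// Single-pass literal replacement: three spaces shrink to two, not one.
print("a   b".replacingOccurrences(of: "  ", with: " "))

// Regex replacement: any run of spaces shrinks to one.
print(collapsingSpaces("Hey   there  ,   welcome"))
```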

I found that the solutions given above did not work for certain characters such as 🏋️🏻‍♂️ and 🧰.

To find the emoji ranges, using regex I converted the full list of emoji characters to a file with just hex values. Then I converted them to decimal format and sorted them. Finally, I wrote a script to find the ranges.

Here is the final Swift extension for isEmoji().

extension Character {

    func isEmoji() -> Bool {
        let emojiRanges = [
            (8205, 11093),
            (12336, 12953),
            (65039, 65039),
            (126980, 129685)
        ]
        let codePoint = self.unicodeScalars[self.unicodeScalars.startIndex].value
        for emojiRange in emojiRanges {
            if codePoint >= emojiRange.0 && codePoint <= emojiRange.1 {
                return true
            }
        }
        return false
    }

}
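As an alternative to hand-maintained ranges, Swift 5 and later expose Unicode's own emoji properties on each scalar. The sketch below is not the method used in this answer; it is a heuristic under stated assumptions. The names looksLikeEmoji and replacingEmojiWithSpaces are hypothetical, and the rule "a text-default emoji scalar counts only when it starts a multi-scalar character" is an assumption chosen so that plain digits and "#" (which carry the isEmoji property) are left alone.

```swift
extension Character {
    /// Heuristic emoji check (an assumption of this sketch, not an official definition).
    var looksLikeEmoji: Bool {
        guard let first = unicodeScalars.first else { return false }
        // Scalars that are displayed as emoji by default.
        if first.properties.isEmojiPresentation { return true }
        // Text-default scalars (digits, "#", U+2600, ...) count only when
        // combined with further scalars such as a variation selector.
        return first.properties.isEmoji && unicodeScalars.count > 1
    }
}

extension String {
    /// Replaces each emoji-like character (grapheme cluster) with one space.
    var replacingEmojiWithSpaces: String {
        return String(map { $0.looksLikeEmoji ? " " : $0 })
    }
}

print("tool \u{1F9F0} box".replacingEmojiWithSpaces)
print("call 1 now".replacingEmojiWithSpaces)
```

Because this works on Character values, a multi-scalar emoji is replaced by a single space. The results depend on the Unicode version shipped with the Swift runtime, so very recent emojis may not be recognized on older systems.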

For reference, here are the Python scripts I wrote to parse the hex strings to integers and then find the ranges.

convert-hex-to-decimal.py

decimals = []
with open('hex.txt') as hexfile:
    for line in hexfile:
        num = int(line, 16)
        if num < 256:
            continue
        decimals.append(num)

decimals = list(set(decimals))
decimals.sort()

with open('decimal.txt', 'w') as decimalfile:
    for decimal in decimals:
        decimalfile.write(str(decimal) + "\n")

make-ranges.py

first_line = True
range_start = 0
prev = 0
with open('decimal.txt') as hexfile:
    for line in hexfile:
        if first_line: 
            prev = int(line)
            range_start = prev
            first_line = False
            continue

        curr = int(line)
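        if prev + 1000 < curr:  # 1000 is an arbitrary gap threshold to reduce the number of ranges
            print("(" + str(range_start) + ", " + str(prev) + ")")
            range_start = curr
        prev = curr

# Emit the final range, which the loop alone never prints.
if not first_line:
    print("(" + str(range_start) + ", " + str(prev) + ")")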