Using NSRegularExpression produces incorrect ranges when emoji are present [duplicate]

Submitted by ≯℡__Kan透↙ on 2021-02-17 02:30:13

Question


I'm trying to parse "@mentions" out of a user-provided string. The regular expression itself finds them, but the range it reports is incorrect when emoji are present.

let text = "😂😘🙂 @joe "
let tagExpr = try? NSRegularExpression(pattern: "@\\S+")
tagExpr?.enumerateMatches(in: text, range: NSRange(location: 0, length: text.characters.count)) { tag, flags, pointer in
    guard let tag = tag?.range else { return }

    if let newRange = Range(tag, in: text) {
        let replaced = text.replacingCharacters(in: newRange, with: "[email]")
        print(replaced)
    }
}

When I run this, tag is (location: 7, length: 2)

and it prints 😂😘🙂 [email]oe

The expected result is 😂😘🙂 [email]


Answer 1:


NSRegularExpression (and anything involving NSRange) operates on UTF-16 counts and indexes. For that matter, NSString.length is the UTF-16 count as well.

But in your code, you're telling NSRegularExpression to use a length of text.characters.count. This is the number of composed characters, not the UTF-16 count. Your string "😂😘🙂 @joe " has 9 composed characters but 12 UTF-16 code units, because each of the three emoji takes two code units. So you're actually telling NSRegularExpression to look only at the first 9 UTF-16 code units, which means it ignores the trailing "oe " and matches just "@j".
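To make the mismatch concrete, here is a minimal sketch that prints the different counts for the string from the question. It assumes Swift 4 or later, where text.count replaces the older text.characters.count; the variable names are only illustrative.

```swift
import Foundation

let text = "😂😘🙂 @joe "

// Number of Characters (composed characters / grapheme clusters).
// This is the value text.characters.count returned in older Swift.
let characterCount = text.count

// Number of UTF-16 code units, which is what NSRange and
// NSRegularExpression use for locations and lengths.
let utf16Count = text.utf16.count

// NSString.length reports the same UTF-16 count.
let nsLength = (text as NSString).length

print(characterCount, utf16Count, nsLength) // prints "9 12 12"
```

Each emoji here is one Character but two UTF-16 code units, which accounts for the difference of 3.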

The fix is to pass length: text.utf16.count.

import Foundation

let text = "😂😘🙂 @joe "
let tagExpr = try? NSRegularExpression(pattern: "@\\S+")
// The search range is measured in UTF-16 code units, so use utf16.count.
tagExpr?.enumerateMatches(in: text, range: NSRange(location: 0, length: text.utf16.count)) { match, _, _ in
    guard let tagRange = match?.range else { return }

    // Convert the UTF-16 based NSRange back into a Range<String.Index>.
    if let newRange = Range(tagRange, in: text) {
        let replaced = text.replacingCharacters(in: newRange, with: "[email]")
        print(replaced)
    }
}
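As an alternative sketch (not part of the original answer), the search range can be built from String indices with NSRange(_:in:), which performs the UTF-16 conversion itself, so no count has to be chosen by hand. This version also uses stringByReplacingMatches to replace every match in one call; it assumes Swift 4 or later, and fullRange and replaced are illustrative names.

```swift
import Foundation

let text = "😂😘🙂 @joe "

// NSRange(_:in:) converts a Range<String.Index> into UTF-16 offsets,
// so the range always covers the whole string.
let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)

// try! is acceptable here only because the pattern is a known-valid literal.
let tagExpr = try! NSRegularExpression(pattern: "@\\S+")

// Replace every mention in a single pass instead of enumerating matches.
let replaced = tagExpr.stringByReplacingMatches(in: text,
                                                options: [],
                                                range: fullRange,
                                                withTemplate: "[email]")
print(replaced) // prints "😂😘🙂 [email] "
```

If the replacement text could contain "$" or "\", it would need to be passed through NSRegularExpression.escapedTemplate(for:) first, since those characters are special in templates.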


Source: https://stackoverflow.com/questions/46495365/using-nsregularexpression-produces-incorrect-ranges-when-emoji-are-present
