Converting Unicode code point format

Submitted by 北战南征 on 2019-12-24 21:10:46

Question


Let's say that I have a character string containing the textual code point notation for an emoji (here, U+1F600):

string <- "This is a test. U+1F600"

How can I transform it into

string <- "This is a test. \U0001F600"

So that I can render it as

utf8_print("This is a test \U0001F600")
[1] "This is a test 😀"

Answer 1:


This is something of a hack, since it rewrites each string as an R string literal and then evaluates it, but it works for your case:

string <- c("This is a test. U+1F600", "Another test")

# change U+XXXXYYYY to \UXXXXYYYY, quote and encode special characters
expr <- gsub("U[+]([0-9A-Fa-f]{1,8})", "\\\\U\\1",
             encodeString(string, quote = '"'))

# evaluate the string as an R expression
vapply(parse(text = expr, keep.source = FALSE), eval, "")
#> [1] "This is a test. \U0001f600" "Another test"
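
For comparison, the same conversion could probably be done without parse() and eval(), by turning each hex code point into a character directly. The following is only a sketch under a few assumptions: decode_codepoints is a hypothetical helper name (not part of the answer above), the session uses a UTF-8 locale, and tokens are limited to 4 to 6 hex digits, which covers the Unicode range up to U+10FFFF. Invalid values such as surrogates are not validated.

```r
# Hypothetical helper: replace every "U+XXXX" token with the character it names.
decode_codepoints <- function(x) {
  # Assumption: 4 to 6 hex digits after "U+"
  m <- gregexpr("U[+][0-9A-Fa-f]{4,6}", x)
  regmatches(x, m) <- lapply(regmatches(x, m), function(tokens) {
    # Drop the "U+" prefix and parse the remainder as a base-16 integer
    codes <- strtoi(substring(tokens, 3), base = 16L)
    # Convert each code point to a one-character UTF-8 string
    vapply(codes, intToUtf8, "")
  })
  x
}

string <- c("This is a test. U+1F600", "Another test")
decode_codepoints(string)
```

Strings without a U+ token pass through unchanged, because the replacement list holds an empty vector for them. The trade-off against the parse()/eval() approach is that no R code is generated from the input, at the cost of handling only the U+ notation rather than every escape R understands.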


Source: https://stackoverflow.com/questions/48105103/coverting-unicode-codeepoint-format
