Working with emoji in R

Posted by 怎甘沉沦 on 2019-12-12 13:52:55

Question


I have a csv file that contains a lot of emoji:

Person, Message,
A, 😉,
A, How are you?,
B, 🙍 Alright!,
A, 💃💃

How can I read this file into R with read.csv() so that the emoji don't turn into black question marks?

(I want to track emoji usage over time 👽)


Answer 1:


My console uses a font that can display these characters:

txt <- "Person, Message,
A, 😉,
A, How are you?,
B, 🙍 Alright!,
A, 💃💃"

Encoding(txt)
#[1] "UTF-8"
dput(txt)
#"Person, Message,\nA, \U0001f609,\nA, How are you?,\nB, \U0001f64d Alright!,\nA, \U0001f483\U0001f483"

> tvec <- scan(text=txt, what="")
Read 13 items
> dput(tvec)
c("Person,", "Message,", "A,", "\U0001f609,", "A,", "How", "are", 
"you?,", "B,", "\U0001f64d", "Alright!,", "A,", "\U0001f483\U0001f483"
)

> which(tvec == '\U0001f609,')
[1] 4

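When I instead used scan to read that text with a comma as the separator, each field kept its leading space, so the equality test against the bare emoji failed. It succeeded when I compared against the two-character version (a space followed by the emoji):

> tvec <- scan(text=txt, what="", sep=",")
Read 14 items
> which(tvec == '\U0001f609')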
integer(0)
> dput(tvec)
c("Person", " Message", "", "A", " \U0001f609", "", "A", " How are you?", 
"", "B", " \U0001f64d Alright!", "", "A", " \U0001f483\U0001f483"
)
> which(tvec == " 😉")
[1] 5
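
The leading-space mismatch can also be avoided by trimming each field before comparing. The following is a minimal sketch rather than part of the original answer; it assumes R 3.2.0 or later (for trimws) and a session that handles UTF-8 strings. Passing strip.white = TRUE to scan should have a similar effect.

```r
# Same sample text as above, written with \U escapes so the source stays ASCII.
txt <- "Person, Message,\nA, \U0001f609,\nA, How are you?,\nB, \U0001f64d Alright!,\nA, \U0001f483\U0001f483"

# Split on commas; each field keeps its leading space at this point.
tvec <- scan(text = txt, what = "", sep = ",", quiet = TRUE)

# Remove leading and trailing whitespace from every field.
trimmed <- trimws(tvec)

# The bare emoji now matches without having to include the space.
which(trimmed == "\U0001f609")
```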

This is with Courier New as the console/editor font on a Mac. For an explanation of the \U escape notation used for Unicode characters, see ?Quotes in the base package.
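
To connect this back to the original read.csv() question, here is a hedged sketch of reading such a file and counting emoji. It is not from the original answer. It assumes the file is saved as UTF-8, uses a temporary file with made-up sample data, and treats "emoji" as code points in the range U+1F300 to U+1FAFF, which is a simplification that misses some symbols and multi-code-point sequences. The helper name extract_emoji is hypothetical.

```r
# Hypothetical sample data written to a temporary UTF-8 file.
csv_lines <- c(
  "Person, Message,",
  "A, \U0001f609,",
  "A, How are you?,",
  "B, \U0001f64d Alright!,",
  "A, \U0001f483\U0001f483"
)
path <- tempfile(fileext = ".csv")
writeLines(enc2utf8(csv_lines), path, useBytes = TRUE)

# encoding = "UTF-8" marks the strings as UTF-8; strip.white = TRUE drops
# the space after each comma. The trailing commas yield an extra empty column.
msgs <- read.csv(path, encoding = "UTF-8", strip.white = TRUE,
                 stringsAsFactors = FALSE)

# Assumption: keep only code points in U+1F300..U+1FAFF.
extract_emoji <- function(x) {
  cp <- utf8ToInt(x)
  intToUtf8(cp[cp >= 0x1F300 & cp <= 0x1FAFF], multiple = TRUE)
}

# One vector holding every emoji found across all messages, then a tally.
all_emoji <- unlist(lapply(msgs$Message, extract_emoji))
emoji_counts <- table(all_emoji)
print(emoji_counts)
```

If the file also had a timestamp column, the same extraction could be applied per time period (for example with split) to track usage over time.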



Source: https://stackoverflow.com/questions/35328416/working-with-emoji-in-r
