R JSON UTF-8 parsing

社会主义新天地 submitted on 2019-12-30 13:56:46

Question


I have an issue when trying to parse a JSON file containing Russian (Cyrillic) text in R. The file looks like this:

[{"text": "Валера!", "type": "status"}, {"text": "когда выйдет", "type": "status"}, {"text": "КАК ДЕЛА?!)", "type": "status"}]

and it is saved in UTF-8 encoding. I tried the rjson, RJSONIO and jsonlite libraries to parse it, but none of them works:

library(jsonlite)
allFiles <- fromJSON(txt="ru_json_example_short.txt")

gives me this error:

Error in feed_push_parser(buf) : 
  lexical error: invalid char in json text.
                                       [{"text": "Валера!", "
                     (right here) ------^

When I save the file in ANSI encoding, it parses OK, but the Russian characters turn into question marks, so the output is unusable. Does anyone know how to parse such a JSON file in R, please?

Edit: The above applies to a UTF-8 file saved in Windows Notepad. When I save it in PSPad and then parse it, the result looks like this:

                                                                                       text   type
1                                         <U+0412><U+0430><U+043B><U+0435><U+0440><U+0430>! status
2 <U+043A><U+043E><U+0433><U+0434><U+0430> <U+0432><U+044B><U+0439><U+0434><U+0435><U+0442> status
3                              <U+041A><U+0410><U+041A> <U+0414><U+0415><U+041B><U+0410>?!) status

Answer 1:


Try the following:

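library(jsonlite)
# Read the file as text, join its lines with commas and wrap them in [ ]
# before parsing, instead of passing the file name to fromJSON() directly
dat <- fromJSON(sprintf("[%s]",
                paste(readLines("./ru_json_example_short.txt"),
                collapse=",")))
dat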
[[1]]
       text   type
1      Валера! status
2 когда выйдет status
3  КАК ДЕЛА?!) status
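
If the printed data frame shows <U+0412>-style escapes instead of Cyrillic letters, as in the PSPad output in the question, it may be worth checking whether the strings themselves are intact before re-reading the file. The following is only a sketch under the assumption that the escapes come from how the data frame is printed in a non-UTF-8 locale, which the question does not confirm. The example data frame is built from escape sequences for illustration and is not read from the original file.

```r
# Example data built from \u escapes so the script does not depend on
# the encoding of the source file it is saved in.
dat <- data.frame(
  text = c("\u0412\u0430\u043b\u0435\u0440\u0430!",
           "\u043a\u043e\u0433\u0434\u0430"),
  type = "status",
  stringsAsFactors = FALSE
)

# How R has marked each string ("UTF-8", "latin1" or "unknown").
Encoding(dat$text)

# Unicode code points of the first string; 0x0412 is CYRILLIC CAPITAL LETTER VE.
code_points <- utf8ToInt(dat$text[1])
first_cp <- code_points[1]
sprintf("U+%04X", first_cp)

# Printing the character vector directly, rather than the whole data frame,
# is one way to see whether only the data frame display is affected.
print(dat$text)
```

If the code points are the expected Cyrillic ones, the data was parsed correctly and only the display differs.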

ref: Error parsing JSON file with the jsonlite package
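
A possible explanation for the "invalid char" error with the Notepad file is a byte order mark (the bytes EF BB BF) at the start of the file. This is an assumption based on the question's edit, not something the source confirms. As a minimal sketch, the file can be read as raw bytes, the mark dropped if present, and the remaining text passed to fromJSON. The helper read_utf8_no_bom and the temporary demo file are hypothetical names introduced here; they are not part of jsonlite.

```r
library(jsonlite)

# Hypothetical helper (not part of jsonlite): read a file as UTF-8 text,
# dropping a leading byte order mark (EF BB BF) if one is present.
read_utf8_no_bom <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  bom <- as.raw(c(0xEF, 0xBB, 0xBF))
  if (length(bytes) >= 3 && identical(bytes[1:3], bom)) {
    bytes <- bytes[-(1:3)]
  }
  txt <- rawToChar(bytes)
  Encoding(txt) <- "UTF-8"  # mark the string as UTF-8 without converting it
  txt
}

# Self-contained demo: write a small file that starts with a BOM.
# The contents are a shortened, made-up version of the file in the question.
json_text <- "[{\"text\": \"\u0412\u0430\u043b\u0435\u0440\u0430!\", \"type\": \"status\"}]"
demo_path <- tempfile(fileext = ".txt")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw(enc2utf8(json_text))),
         demo_path)

# Parse the cleaned text rather than passing the file name directly.
dat <- fromJSON(read_utf8_no_bom(demo_path))
dat
```

For the file in the question, the call would be fromJSON(read_utf8_no_bom("ru_json_example_short.txt")). Working on raw bytes avoids converting the text to the native encoding on the way in, which matters on systems whose locale cannot represent Cyrillic.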



Source: https://stackoverflow.com/questions/30172601/r-json-utf-8-parsing
