Convert unicode to readable characters in R

灰色年华 2020-12-10 20:10

I have a .csv file whose text returns a mix of "unknown" and "UTF-8" when I call Encoding(data). The text looks like this:

1 Answer
  • 2020-12-10 20:14

    You could do something like this:

    library(stringi)
    
    # Sample text in which each character is escaped as <U+XXXX>
    string <- "<U+1042><U+1040><U+1042><U+1040> <U+1019><U+103D><U+102C>\n\n<U+1010><U+102D><U+102F><U+1004><U+1039><U+1038><U+103B><U+1015><U+100A><U+1039><U+1000><U+102D><U+102F><U+101C><U+1032> <U+1000><U+102C><U+1000><U+103C>"
    
    # Rewrite each <U+XXXX> (four hex digits) as \uXXXX, then decode the escapes
    cat(stri_unescape_unicode(gsub("<U\\+([0-9A-Fa-f]{4})>", "\\\\u\\1", string)))
    

    Which results in:

    ၂၀၂၀ မွာ

    တိုင္းျပည္ကိုလဲ ကာကြ
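
If you would rather not depend on stringi, the same idea can be sketched in base R. This is only an illustration, not part of the answer above: `decode_u_escapes` is a hypothetical helper name, and the sketch assumes every escape has the form `<U+XXXX>` with 4 to 6 hex digits (so it would also cover code points outside the Basic Multilingual Plane). How the decoded text prints still depends on your locale and font.

```r
# Hypothetical helper (an assumption, not from the original answer):
# replace every "<U+XXXX>" escape with the character it names, using base R only.
decode_u_escapes <- function(x) {
  pattern <- "<U\\+([0-9A-Fa-f]{4,6})>"
  m <- gregexpr(pattern, x)
  regmatches(x, m) <- lapply(regmatches(x, m), function(tokens) {
    # Keep only the hex digits, e.g. "<U+1042>" -> "1042"
    hex <- sub(pattern, "\\1", tokens)
    # Convert each hex code point to a one-character UTF-8 string
    vapply(strtoi(hex, base = 16L), intToUtf8, character(1))
  })
  x
}

# Works on a whole character vector, e.g. a column read from the .csv
cat(decode_u_escapes("<U+1042><U+1040><U+1042><U+1040> <U+1019><U+103D><U+102C>"), "\n")
```

Because the function is vectorised over `x`, it should be usable directly on a data frame column; elements without any escapes are returned unchanged.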
