R on Windows: character encoding hell

旧巷少年郎 2020-11-29 23:32

I am trying to import a CSV encoded as OEM-866 (Cyrillic charset) into R on Windows. I also have a copy that has been converted into UTF-8 w/o BOM. Both of these files are r

5 answers
  •  一生所求
    2020-11-30 00:03

    There are already great answers here, along with a lot of duplicates. I will try to contribute a more complete problem description and show how I used the solutions above.

    My situation: writing results from the Google Translate API to a file in R

    For my particular purpose I was sending text to the Google API:

       # load library
       library(translateR)
    
       # return the Chinese translation (api_key must hold your Google API key)
       result_chinese <- translate(content.vec = "This is my Text",
                                   google.api.key = api_key,
                                   source.lang = "en",
                                   target.lang = "zh-CN")
    

    In the R Environment the result was not displayed as readable Chinese text.

    However, if I print the variable in the console, I see nicely formatted (I hope) text:

    > print(result_chinese)
    [1] "这是我的文字"
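
Printing the expected characters suggests the string itself is intact and only its display differs between tools. A minimal sketch of how one might confirm that, independent of any pane or console rendering, is to inspect the encoding mark and the code points. The variable `x` is a stand-in for `result_chinese` and is an assumption made for illustration, so that no API call is needed:

```r
# The same six characters as in the printed output, written with \u escapes so
# that this snippet stays plain ASCII regardless of the editor's encoding.
x <- "\u8fd9\u662f\u6211\u7684\u6587\u5b57"

Encoding(x)               # declared encoding mark of the string
nchar(x, type = "chars")  # number of characters
nchar(x, type = "bytes")  # number of bytes used to store them
utf8ToInt(enc2utf8(x))    # Unicode code points, independent of the locale
```

If the code points are the expected ones, the data is fine and only the display or the file output needs attention.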
    

    In my situation I had to write the file to disk using the R function write.table(), but whatever I wrote did not come out as readable Chinese text.

    My solution, taken from the answers above:

    I decided to use the function Sys.setlocale() like this:

    Sys.setlocale(locale = "Chinese") # set locale to Chinese
    
    > Sys.setlocale(locale = "Chinese") # set locale to Chinese
    [1] "LC_COLLATE=Chinese (Simplified)_People's Republic of China.936;LC_CTYPE=Chinese (Simplified)_People's Republic of China.936;LC_MONETARY=Chinese (Simplified)_People's Republic of China.936;LC_NUMERIC=C;LC_TIME=Chinese (Simplified)_People's Republic of China.936"
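
To check which character set is actually in effect after such a switch, one option is to query the character-type category on its own and look at what R reports about the session. This is only a sketch: the `codepage` element is reported on Windows and may be absent elsewhere, so the code guards for that.

```r
# Query only the category that governs character handling.
ctype <- Sys.getlocale("LC_CTYPE")

# l10n_info() summarises the localisation of the current session.
info <- l10n_info()

cat("LC_CTYPE :", ctype, "\n")
cat("multibyte:", info$MBCS, "\n")
cat("UTF-8    :", info[["UTF-8"]], "\n")

# The code page is only reported on Windows, so check before printing it.
if (!is.null(info$codepage)) {
  cat("code page:", info$codepage, "\n")
}
```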
    

    After that my translation was displayed properly in the R Environment:

    # return the Chinese translation with the new locale
    result_chinese <- translate(content.vec = "This is my Text",
                                google.api.key = api_key, 
                                source.lang = "en",
                                target.lang = "zh-CN")
    

    The result in the R Environment was now shown as Chinese text.

    After that I could write my file and finally see the Chinese text:

    # write the translation to a file
    write.table(result_chinese, "translation.txt")
    

    Finally, in my translation function I return to my original settings with:

    Sys.setlocale() # reset the locale to the system default
    
    > Sys.setlocale() # reset the locale to the system default
    [1] "LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252"
    

    My conclusion:

    Before dealing with a specific language in R:

    1. Set the locale to the one for that language: Sys.setlocale(locale = "Chinese") # set locale to Chinese
    2. Perform all data manipulations
    3. Return to your original settings: Sys.setlocale() # restore the system default
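
The three steps above can be wrapped in a small helper so that the locale is restored even if the work in the middle fails. This is a sketch under assumptions, not part of the original answer: the name `with_temp_locale` is hypothetical, and it switches only `LC_CTYPE` by default (the answer switches all categories) because a single category can be queried and restored reliably.

```r
# Hypothetical helper: evaluate `expr` under a temporary locale, then restore
# the previous setting, even when `expr` raises an error.
with_temp_locale <- function(locale, expr, category = "LC_CTYPE") {
  old <- Sys.getlocale(category)                      # remember current setting
  on.exit(Sys.setlocale(category, old), add = TRUE)   # step 3: always restore

  new <- suppressWarnings(Sys.setlocale(category, locale))  # step 1: switch
  if (identical(new, "")) {
    stop("locale '", locale, "' is not available on this system")
  }

  force(expr)                                         # step 2: do the work
}
```

On Windows, as in the answer, usage might look like `with_temp_locale("Chinese", write.table(result_chinese, "translation.txt"))`. Because R evaluates arguments lazily, the `write.table()` call runs only at `force(expr)`, after the locale has been switched.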
