Unicode Variable Names in R

≯℡__Kan透↙ submitted on 2021-02-19 06:35:45

Question


I was working on a toy project and tried using some Unicode variable names to match the notation of a paper I was implementing.

The following code works fine on R 3.4.3 on Windows (RStudio version 1.1.456) and R 3.5.1 on OSX:

> µ <- function(ß, n) ß * n
> µ(2, 3)
[1] 6

This code gives the following error, with α typed as ALT+224:

> α <- 2
Error: unexpected input in "\"

The file was saved as UTF-8, so this is surprising to me.

make.names is consistent with the results above:

> make.names('µ')
[1] "µ"
> make.names('α')
[1] "a"

What is the rule for non-ASCII letters? Why are mu (µ) and scharfes S (ß) OK but alpha (α) isn't?

Edit: Output of sessionInfo()

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_3.4.3 tools_3.4.3    yaml_2.2.0 

Edit2: It seems like Sys.setlocale should be the answer, but here is what happens when I try this:

> Sys.setlocale("LC_ALL", 'en_US.UTF-8')
[1] ""
Warning message:
In Sys.setlocale("LC_ALL", "en_US.UTF-8") :
  OS reports request to set locale to "en_US.UTF-8" cannot be honored

Answer 1:


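Working with Ben Bolker, we determined that the issue was the character encoding of the current session: Windows-1252 (see the `.1252` suffix in the locale output in the question). That code page covers only a small set of non-ASCII characters. It includes µ and ß, which is why those names work, but it has no Greek letters, so α cannot be represented and gets mangled (make.names falls back to "a"). This happens even though RStudio saved the file as UTF-8.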
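The encoding explanation can be checked without changing the session locale. The snippet below is a minimal sketch, not part of the original investigation: it uses base R's `iconv` to try converting each character to CP1252, and the variable names (`labels`, `chars`, `converted`, `representable`) are made up for illustration.

```r
# Characters from the question, written as Unicode escapes so the
# snippet does not depend on the encoding of this file.
labels <- c("mu", "eszett", "alpha")
chars <- enc2utf8(c("\u00b5", "\u00df", "\u03b1"))

# iconv() returns NA for an input that cannot be converted
# to the target encoding (the default when sub = NA).
converted <- iconv(chars, from = "UTF-8", to = "CP1252")

# TRUE means the character has a CP1252 equivalent.
representable <- !is.na(converted)
names(representable) <- labels
print(representable)
```

If the explanation above is right, µ and ß should come out as representable and α should not, which matches what the question observed with variable names.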

Changing the locale of a running R session to a UTF-8 one does not seem to be possible, at least on Windows: Sys.setlocale only produces a warning (see the question).
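Before trying to change anything, it can help to confirm what the session is actually using. This is a small sketch using base R's `Sys.getlocale` and `l10n_info`; the variable names `ctype` and `info` are just for illustration.

```r
# Character-type locale of the running session,
# e.g. a name ending in ".1252" on the Windows setup from the question.
ctype <- Sys.getlocale("LC_CTYPE")

# l10n_info() reports, among other things, whether R treats the
# current locale as UTF-8.
info <- l10n_info()

cat("LC_CTYPE:", ctype, "\n")
cat("UTF-8 session:", info[["UTF-8"]], "\n")
```

If the second line reports FALSE, identifiers outside the session's code page are likely to hit the same problem as α in the question.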

I have a partial solution. If you are given a file like this and want to run it and have interactive access to the results, the following will mostly work (variable names will be translated to Windows-1252):

> source('utf-8-file.r', encoding='UTF-8')

I would be very excited to see a better solution: one that allows editing and running the file, and entering such snippets into the RStudio console, on Windows.



Source: https://stackoverflow.com/questions/52020256/unicode-variable-names-in-r
