Question
I'm trying to get the day of the week, and have it work consistently in any locale. In locales with Latin alphabets, everything is fine.
Sys.getlocale()
## [1] "LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252"
weekdays(Sys.Date())
## [1] "Tuesday"
I have two related problems with other locales.
If I set
Sys.setlocale("LC_ALL", "Arabic_Qatar")
## [1] "LC_COLLATE=Arabic_Qatar.1256;LC_CTYPE=Arabic_Qatar.1256;LC_MONETARY=Arabic_Qatar.1256;LC_NUMERIC=C;LC_TIME=Arabic_Qatar.1256"
then I sometimes (correctly) get
weekdays(Sys.Date())
## [1] "الثلاثاء"
and sometimes get
weekdays(Sys.Date())
## [1] "ÇáËáÇËÇÁ"
depending upon my setup. The problem is, I can't figure out what is causing the difference.
I thought it might be something to do with getOption("encoding"), but I've tried explicitly setting options(encoding = "native.enc") and options(encoding = "UTF-8"), and it makes no difference.
I've tried several recent versions of R, and the problem is consistent across all of them.
At the moment, the string displays correctly in R GUI, but incorrectly when I use an IDE (Architect and RStudio tested).
What should I set to ensure that weekdays always displays correctly?
It may be helpful to know that weekdays(Sys.Date()) is equivalent to format(as.POSIXlt(Sys.Date()), "%A"), which calls an internal format.POSIXlt method.
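A quick way to check that equivalence in the current session is sketched below. The variable names are only illustrative, and the comparison assumes that both calls go through the same formatting code, so it says nothing about whether the result displays correctly.

```r
# Compare weekdays() with the explicit format() call for the same date.
today <- Sys.Date()
via_weekdays <- weekdays(today)
via_format <- format(as.POSIXlt(today), "%A")

# Both should give the same day name in the current LC_TIME locale.
same <- identical(via_weekdays, via_format)
print(same)
```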
Secondly, it seems overkill to change the whole locale. I thought I should just be able to set the time category (LC_TIME). However, if I set individual components of the locale, weekdays returns a string of question marks.
for(category in c("LC_TIME", "LC_CTYPE", "LC_COLLATE", "LC_MONETARY"))
{
  Sys.setlocale(category, "Arabic_Qatar")
  print(Sys.getlocale())
  print(weekdays(Sys.Date()))
}
## [1] "LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=Arabic_Qatar.1256"
## [1] "????????"
## [1] "LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=Arabic_Qatar.1256;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=Arabic_Qatar.1256"
## [1] "????????"
## [1] "LC_COLLATE=Arabic_Qatar.1256;LC_CTYPE=Arabic_Qatar.1256;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=Arabic_Qatar.1256"
## [1] "????????"
## [1] "LC_COLLATE=Arabic_Qatar.1256;LC_CTYPE=Arabic_Qatar.1256;LC_MONETARY=Arabic_Qatar.1256;LC_NUMERIC=C;LC_TIME=Arabic_Qatar.1256"
## [1] "????????"
What parts of the locale affect how the weekdays are printed?
Update: The problem seems to be Windows-related. When I run the code on a Linux box with locale "ar_QA.UTF8", the weekdays are correctly displayed.
Further update: As agstudy mentioned in his answer, setting locales under Windows is odd, since you can't just use ISO codes like "en-GB". For Windows 7/Vista/Server 2003/XP you can set a locale using setlocale language strings or National Language Support values. For Qatari Arabic, there is no setlocale language string, so we must use an NLS value. We have several choices:
Sys.setlocale("LC_TIME", "ARQ") # the language abbreviation name
Sys.setlocale("LC_TIME", "Arabic_Qatar") # corresponding to the language/country pair "Arabic (Qatar)"
Sys.setlocale("LC_TIME", "Arabic_Qatar.1256") # explicitly including the ANSI codepage
Sys.setlocale("LC_TIME", "Arabic") # would sometimes be a possibility too, but it defaults to Saudi Arabic
So the problem isn't that R cannot support Arabic locales under Windows (though I'm not entirely convinced of the robustness of Sys.setlocale).
Desperate last-ditch attempt: Trying to magically fix things by using the Windows Management Instrumentation Command-line (wmic) to change the OS locale doesn't work, since R doesn't appear to recognise the changes.
system("wmic os set locale=MS_4001")
## Updating property(s) of '\\PC402729\ROOT\CIMV2:Win32_OperatingSystem=@'
## Property(s) update successful.
system("wmic os get locale") # same as before
Answer 1:
The system of naming locales is OS-specific. I recommend reading the section on locales in the R Installation and Administration manual for a complete explanation.
Under Windows:
The supported languages are listed on the MSDN Language Strings page, and surprisingly Arabic is not among them. The "Language string" column contains the legal input for setting the locale in R, and even the list of country/region strings contains no Arabic-speaking country.
Of course, you can change your locale through the global settings (Control Panel --> Region --> ...), but this changes it system-wide, and there is no guarantee that the output will be free of encoding problems.
Under Linux (Ubuntu in my case):
Arabic is generally not installed by default, but it is easy to add using the locale tools.
locale -a ## list all locales that are already installed
sudo locale-gen ar_QA.UTF-8 ## generate it if it does not exist yet
Now, in RStudio:
> Sys.setlocale('LC_TIME','ar_QA.UTF-8')
[1] "ar_QA.UTF-8"
> format(Sys.Date(),'%A')
[1] "الثلاثاء"
Note also that in the plain R console the output does not look as good as in RStudio, because it is written from left to right rather than from right to left.
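Since an uninstalled locale is the usual reason the call above fails, it can help to test from R whether a locale name is accepted before relying on it. The helper below is a minimal sketch, not part of the original answer: can_set_locale is a hypothetical name, and it relies on Sys.setlocale returning an empty string (with a warning) when the OS cannot honour the request.

```r
# Hypothetical helper: TRUE if `locale` can be set for `category` on this system.
can_set_locale <- function(locale, category = "LC_TIME")
{
  # Remember the current setting and restore it when the function exits.
  old <- Sys.getlocale(category)
  on.exit(Sys.setlocale(category, old), add = TRUE)
  # Sys.setlocale() returns "" when the request cannot be honoured.
  result <- suppressWarnings(Sys.setlocale(category, locale))
  nzchar(result)
}

# The "C" locale should always be available.
print(can_set_locale("C"))
```

For example, can_set_locale("ar_QA.UTF-8") could be called before Sys.setlocale to decide whether locale-gen needs to be run first.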
Answer 2:
The RStudio/Architect problem
This can be solved, slightly messily, by explicitly changing the encoding of the weekdays string to UTF-8.
current_codepage <- as.character(l10n_info()$codepage)
iconv(weekdays(Sys.Date()), from = current_codepage, to = "utf8")
Note that codepages only exist on Windows; l10n_info()$codepage is NULL on Linux.
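To see why re-encoding helps, the garbled string from the question can be reproduced by taking the codepage 1256 bytes of the Arabic day name and reading them as if they were codepage 1252. This is only an illustrative sketch: the variable names are made up, and it assumes the platform's iconv knows the names "CP1256" and "CP1252".

```r
# The Arabic word for Tuesday, written with escapes so the source stays ASCII.
arabic <- "\u0627\u0644\u062b\u0644\u0627\u062b\u0627\u0621"

# Encode it as Windows codepage 1256 (Arabic) and keep the raw bytes.
bytes_1256 <- iconv(arabic, from = "UTF-8", to = "CP1256", toRaw = TRUE)[[1]]

# Misread those bytes as codepage 1252 (Western European): this gives the garbled text.
garbled <- iconv(rawToChar(bytes_1256), from = "CP1252", to = "UTF-8")

# Reading the same bytes with the right codepage recovers the original string.
recovered <- iconv(rawToChar(bytes_1256), from = "CP1256", to = "UTF-8")

print(garbled)
print(recovered == arabic)
```

The bytes themselves are fine in both cases; only the codepage used to interpret them differs, which is why telling iconv the real source codepage fixes the display.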
The LC_TIME problem
It turns out that under Windows you have to set both the LC_CTYPE and LC_TIME locale categories, and you have to set LC_CTYPE before LC_TIME, or it won't work.
In the end, we need different implementations for different OSes.
Windows version:
get_today_windows <- function(locale = NULL)
{
  if(!is.null(locale))
  {
    # Remember the current settings and restore them when the function exits
    lc_ctype <- Sys.getlocale("LC_CTYPE")
    lc_time <- Sys.getlocale("LC_TIME")
    on.exit(Sys.setlocale("LC_CTYPE", lc_ctype))
    on.exit(Sys.setlocale("LC_TIME", lc_time), add = TRUE)
    # LC_CTYPE must be set before LC_TIME
    Sys.setlocale("LC_CTYPE", locale)
    Sys.setlocale("LC_TIME", locale)
  }
  today <- weekdays(Sys.Date())
  # Convert from the locale's ANSI codepage to UTF-8
  current_codepage <- as.character(l10n_info()$codepage)
  iconv(today, from = current_codepage, to = "utf8")
}
get_today_windows()
## [1] "Tuesday"
get_today_windows("French_France")
## [1] "mardi"
get_today_windows("Arabic_Qatar")
## [1] "الثلاثاء"
get_today_windows("Serbian (Cyrillic)")
## [1] "уторак"
get_today_windows("Chinese (Traditional)_Taiwan")
## [1] "星期二"
Linux version:
get_today_linux <- function(locale = NULL)
{
  if(!is.null(locale))
  {
    # Remember the current LC_TIME and restore it when the function exits
    lc_time <- Sys.getlocale("LC_TIME")
    on.exit(Sys.setlocale("LC_TIME", lc_time), add = TRUE)
    Sys.setlocale("LC_TIME", locale)
  }
  weekdays(Sys.Date())
}
get_today_linux()
## [1] "Tuesday"
get_today_linux("fr_FR.utf8")
## [1] "mardi"
get_today_linux("ar_QA.utf8")
## [1] "الثلاثاء"
get_today_linux("sr_RS.utf8")
## [1] "уторак"
get_today_linux("zh_TW.utf8")
## [1] "週二"
Enforcing the .utf8 encoding in the locale seems important: get_today_linux("zh_TW") doesn't display properly.
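If a single entry point is preferred, the two versions can be folded into one function that branches on the OS. The following is a rough sketch rather than a tested solution: get_today is a hypothetical name, the Windows branch is assumed to behave like get_today_windows, and the extra checks on l10n_info() (skipping the conversion when the session is already UTF-8 or no codepage is reported) are my own assumptions. Locale names still have to be given in the form the OS expects.

```r
# Hypothetical wrapper combining the Windows and Linux approaches.
get_today <- function(locale = NULL)
{
  is_windows <- .Platform$OS.type == "windows"
  if(!is.null(locale))
  {
    # Remember the current settings and restore them when the function exits.
    old_ctype <- Sys.getlocale("LC_CTYPE")
    old_time <- Sys.getlocale("LC_TIME")
    on.exit(
      {
        Sys.setlocale("LC_CTYPE", old_ctype)
        Sys.setlocale("LC_TIME", old_time)
      },
      add = TRUE
    )
    # On Windows, LC_CTYPE has to be set before LC_TIME.
    if(is_windows) Sys.setlocale("LC_CTYPE", locale)
    Sys.setlocale("LC_TIME", locale)
  }
  today <- weekdays(Sys.Date())
  # Only convert when running on Windows with a non-UTF-8 codepage (assumption).
  info <- l10n_info()
  needs_conversion <- is_windows && !isTRUE(info[["UTF-8"]]) &&
    !is.null(info$codepage) && info$codepage > 0
  if(needs_conversion)
  {
    today <- iconv(today, from = as.character(info$codepage), to = "UTF-8")
  }
  today
}

# With no argument, the current locale is used.
print(get_today())
```

Callers would still pass an OS-specific name, for example get_today("Arabic_Qatar") on Windows and get_today("ar_QA.utf8") on Linux.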
Source: https://stackoverflow.com/questions/26603564/using-weekdays-with-any-locale-under-windows