Are there packages for transliterating Cyrillic text to Latin in R? I need to convert data frames to Latin so that I can use factors. Cyrillic factors are somewhat messy to work with in R.
I have found the package at last.
> library(stringi)
> stri_trans_general("женщина", "cyrillic-latin")
[1] "ženŝina"
> stri_trans_general("женщина", "russian-latin/bgn")
[1] "zhenshchina"
After that, the only issue remaining is the "ё" letter.
> stri_trans_general("Ёж", "russian-latin/bgn")
[1] "Yëzh"
I had to remove the non-ASCII "ë" that the "ё" letters turn into:
> iconv(stri_trans_general("ёж", "russian-latin/bgn"), from = "UTF-8", to = "ASCII", sub = "")
[1] "yzh"
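Dropping the letter entirely loses a vowel ("yzh"). A possible alternative, sketched here under the assumption that folding "ë" to "e" is acceptable for your data, is to chain a second stringi transform, "latin-ascii", after the BGN one. The helper name `to_ascii_latin` is made up for this illustration and is not part of stringi.

```r
library(stringi)

# Hypothetical helper (not part of stringi): transliterate Russian text,
# then fold remaining accented Latin letters to their plain ASCII base letters.
to_ascii_latin <- function(x) {
  latin <- stri_trans_general(x, "russian-latin/bgn")
  stri_trans_general(latin, "latin-ascii")
}

res <- to_ascii_latin(c("Ёж", "женщина"))
res

# Check whether anything non-ASCII is left before relying on the result.
stri_enc_isascii(res)
```

I have not checked every Cyrillic letter this way, so it is worth running `stri_enc_isascii()` on your own data after the conversion.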
If one afterwards uses base R to filter the data in Cyrillic, one gets all NAs, but if dplyr is used then everything is fine.
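For the original goal of converting a whole data frame, something along these lines might work. It is only a sketch: `translit_df` is a hypothetical helper name, the sample data is invented, and it assumes that only character columns and factor levels need converting (column names are left alone).

```r
library(stringi)

# Hypothetical helper: transliterate character columns and factor levels,
# leaving every other column type untouched.
translit_df <- function(df, id = "russian-latin/bgn") {
  df[] <- lapply(df, function(col) {
    if (is.character(col)) {
      stri_trans_general(col, id)
    } else if (is.factor(col)) {
      # Renaming the levels keeps the factor structure intact.
      levels(col) <- stri_trans_general(levels(col), id)
      col
    } else {
      col
    }
  })
  df
}

# Invented example data.
df <- data.frame(
  sex = factor(c("женщина", "мужчина", "женщина")),
  age = c(31, 45, 27)
)

out <- translit_df(df)
levels(out$sex)
```

Converting the levels rather than the values means the factor codes stay the same, so anything already computed on the factor still lines up after transliteration.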
It is possible to do it with the stringi package as shown above, but with a different transform identifier for Serbian Latin:
`stri_trans_general("жшчћђ", "Serbian-Latin/BGN")`
All characters should be transliterated correctly to Serbian Latin.
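To see which transform identifiers your installation actually offers, you can list them with `stri_trans_list()` and filter by name. The search pattern below is just my guess at useful keywords, and the exact set of identifiers depends on the ICU version stringi was built against.

```r
library(stringi)

# All transform identifiers known to the ICU library behind stringi.
ids <- stri_trans_list()

# Keep only the ones that look relevant to Cyrillic scripts
# (the pattern is an illustrative guess, adjust as needed).
cyr_ids <- grep("cyrillic|russian|serbian", ids,
                ignore.case = TRUE, value = TRUE)
cyr_ids
```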
Source: https://stackoverflow.com/questions/48575399/cyrillic-transliteration-in-r