I am simply trying to create a corpus from Russian, UTF-8 encoded text. The problem is, the Corpus method from the tm package is not encoding the strings corre
I had the same problem with German UTF-8 text when importing it. For me, the following one-liner helped:
Sys.setlocale("LC_ALL", "de_DE.UTF-8")
Try running the same with the Russian locale:
Sys.setlocale("LC_ALL", "ru_RU.UTF-8")
Of course, that goes after library(tm) and before creating a corpus.
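To make the ordering concrete, here is a minimal sketch of the whole sequence. It assumes the ru_RU.UTF-8 locale is installed on your system (the locale name differs between platforms, notably on Windows), and the sample strings and the variable names loc, texts and corpus are made up for illustration:

```r
library(tm)

# Switch to a Russian UTF-8 locale. The name "ru_RU.UTF-8" is an assumption:
# it is the usual form on Linux/macOS and must be installed on the system.
loc <- suppressWarnings(Sys.setlocale("LC_ALL", "ru_RU.UTF-8"))
if (identical(loc, "")) {
  # Sys.setlocale() returns "" when the request cannot be honoured
  warning("Locale ru_RU.UTF-8 is not available; encoding may still be wrong")
}

# Hypothetical sample text, explicitly marked as UTF-8
texts <- enc2utf8(c("Привет, мир", "Это простой пример"))

# Build the corpus only after the locale has been set
corpus <- VCorpus(VectorSource(texts), readerControl = list(language = "ru"))

# Inspect the first document to check that the Cyrillic characters survived
print(content(corpus[[1]]))
```

If the printed text still shows escaped or garbled characters, check the return value of Sys.setlocale() first, since an empty string means the locale was not actually changed.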