R Corpus Is Messing Up My UTF-8 Encoded Text

青春惊慌失措 2020-12-17 05:26

I am simply trying to create a corpus from Russian, UTF-8 encoded text. The problem is, the Corpus method from the tm package is not encoding the strings corre

3 Answers
  •  我在风中等你
    2020-12-17 06:24

    I had a problem with German UTF-8 encoding when importing texts. For me, the following one-liner helped:

    Sys.setlocale("LC_ALL", "de_DE.UTF-8")

    Try running the same with the Russian locale:

    Sys.setlocale("LC_ALL", "ru_RU.UTF-8")

    Of course, that goes after library(tm) and before creating a corpus.
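
    The ordering described above can be sketched roughly as follows. This is only an illustration under a few assumptions: the tm package is installed, the ru_RU.UTF-8 locale exists on the system, and the sample string and the variable names (loc, docs, corpus) are made up for the example.

```r
library(tm)

# Assumption: the ru_RU.UTF-8 locale is installed on this machine.
# Sys.setlocale() returns "" (with a warning) when the request cannot be honored.
loc <- suppressWarnings(Sys.setlocale("LC_ALL", "ru_RU.UTF-8"))
if (identical(loc, "")) {
  message("Locale ru_RU.UTF-8 is not available on this system.")
}

# Hypothetical sample text. The \u escapes spell a Russian word and make R
# mark the string as UTF-8 regardless of the encoding of this script file.
docs <- "\u041f\u0440\u0438\u0432\u0435\u0442"

# Create the corpus only after the locale has been set.
corpus <- VCorpus(VectorSource(docs))

# Inspect the first document to check that the Cyrillic text survived.
print(as.character(corpus[[1]]))
```

    If the message about the unavailable locale appears, the locale name probably differs on that platform (Windows uses different locale names than Linux or macOS), so the string passed to Sys.setlocale would need adjusting.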
