R Corpus Is Messing Up My UTF-8 Encoded Text

青春惊慌失措 2020-12-17 05:26

I am simply trying to create a corpus from Russian, UTF-8 encoded text. The problem is, the Corpus method from the tm package is not encoding the strings corre

3 Answers

我在风中等你 (original poster)

2020-12-17 06:24

I had a problem with German UTF-8 encoding when importing texts. For me, the following one-liner helped:

Sys.setlocale("LC_ALL", "de_DE.UTF-8")

Could you try the same with the Russian locale?

Sys.setlocale("LC_ALL", "ru_RU.UTF-8")

Of course, that goes after library(tm) and before creating a corpus.
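
To make the ordering concrete, here is a minimal sketch of the whole sequence. It is an illustration rather than a verified fix for the question: the helper name set_utf8_locale and the sample strings are hypothetical, the locale name "ru_RU.UTF-8" is the Linux/macOS spelling and must be installed on the system (Windows uses different locale names), and the tm step only runs if the package is available.

```r
# Hypothetical helper: try to switch to a UTF-8 locale and report whether it worked.
# Sys.setlocale() returns "" (and warns) when the OS cannot honour the request.
set_utf8_locale <- function(locale = "ru_RU.UTF-8") {
  result <- suppressWarnings(Sys.setlocale("LC_ALL", locale))
  if (identical(result, "")) {
    message("Locale '", locale, "' is not available; LC_CTYPE stays at: ",
            Sys.getlocale("LC_CTYPE"))
    return(FALSE)
  }
  TRUE
}

# Hypothetical sample data, written with \u escapes so the script itself
# does not depend on the file encoding. These strings are marked as UTF-8.
texts <- c("\u041f\u0440\u0438\u0432\u0435\u0442, \u043c\u0438\u0440",
           "\u042d\u0442\u043e \u0442\u0435\u0441\u0442")

# Set the locale first, as the answer suggests, then build the corpus.
locale_ok <- set_utf8_locale()

if (requireNamespace("tm", quietly = TRUE)) {
  corpus <- tm::VCorpus(tm::VectorSource(texts))
  # Inspect the first document to check that the Cyrillic text survived.
  print(as.character(corpus[[1]]))
}
```

If locale_ok comes back FALSE, the requested locale is probably not installed, and the call in the answer would have had no effect either. On Linux, running locale -a in a shell lists the names that are available.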
