R tm package invalid input in 'utf8towcs'

前端 未结 14 1388
逝去的感伤
逝去的感伤 2020-11-29 01:47

I\'m trying to use the tm package in R to perform some text analysis. I tied the following:

require(tm)
dataSet <- Corpus(DirSource(\'tmp/\'))
dataSet <         


        
14条回答
  •  独厮守ぢ
    2020-11-29 01:56

    I have often run into this issue and this Stack Overflow post is always what comes up first. I have used the top solution before, but it can strip out characters and replace them with garbage (like converting it’s to it’s).

    I have found that there is actually a much better solution for this! If you install the stringi package, you can replace tolower() with stri_trans_tolower() and then everything should work fine.

提交回复
热议问题