R tm package invalid input in 'utf8towcs'

逝去的感伤 2020-11-29 01:47

I'm trying to use the tm package in R to perform some text analysis. I tried the following:

require(tm)
dataSet <- Corpus(DirSource('tmp/'))
dataSet <         


        
14 Answers
  •  谎友^ (original poster)
    2020-11-29 02:16

    This is from the tm FAQ:

    tm_map(yourCorpus, function(x) iconv(enc2utf8(x), sub = "byte"))

    It will replace non-convertible bytes in yourCorpus with strings showing their hex codes.

    I hope this helps; for me it does.
    

    http://tm.r-forge.r-project.org/faq.html
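
To see what sub = "byte" does without building a corpus, here is a minimal base-R sketch. The sample string and the variable names bad and clean are made up for illustration, and the explicit from/to arguments are an assumption of this sketch (the FAQ call relies on iconv's defaults instead).

```r
# Hypothetical sample: "caf" followed by the single byte 0xE9 ("é" in Latin-1),
# which is not a valid UTF-8 sequence on its own.
bad <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))

# validUTF8() (base R >= 3.3.0) reports whether each string is valid UTF-8.
print(validUTF8(bad))

# Re-encode UTF-8 to UTF-8; with sub = "byte", bytes that cannot be converted
# are replaced by their hex code written as <xx>.
clean <- iconv(bad, from = "UTF-8", to = "UTF-8", sub = "byte")

print(clean)
print(validUTF8(clean))

# String functions such as tolower() can now process the cleaned text.
print(tolower(clean))
```

The same idea applies per document when the function is passed to tm_map as in the FAQ snippet: the unreadable bytes stay visible as hex codes instead of stopping the processing.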
