R tm package invalid input in 'utf8towcs'

逝去的感伤 2020-11-29 01:47

I'm trying to use the tm package in R to perform some text analysis. I tried the following:

require(tm)
dataSet <- Corpus(DirSource('tmp/'))
dataSet <         

14 answers
  •  情书的邮戳
    2020-11-29 02:07

    I have been running this on a Mac and, to my frustration, I had to track down the offending record (these were tweets) to resolve the error. Since there is no guarantee that the same record will be at fault next time, I used the following function instead:

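    # content_transformer() keeps each document a TextDocument (required in tm >= 0.6); 'UTF-8-MAC' is a macOS iconv name
    yourCorpus <- tm_map(yourCorpus, content_transformer(function(x) iconv(x, to = 'UTF-8-MAC', sub = 'byte')))
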

    as suggested above.

    It worked like a charm
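
    A minimal base-R sketch of both steps described in this answer, locating the offending records and then re-encoding them, assuming the texts are already in a character vector (the sample `texts` vector below is hypothetical). It uses plain "UTF-8" instead of the macOS-only 'UTF-8-MAC' so it runs on any platform, and it needs no packages; `validUTF8()` requires R 3.3.0 or later.

```r
# Hypothetical sample texts: the second element contains a stray Latin-1
# byte (0xE9, "e acute" in Latin-1), which is not valid UTF-8 on its own.
texts <- c("plain ascii tweet", "caf\xe9 au lait", "another tweet")

# Locate the offending records instead of hunting for them by hand.
bad_idx <- which(!validUTF8(texts))
print(bad_idx)

# Re-encode: with sub = "byte", each invalid byte is replaced by its
# hex code in the form "<xx>", so the result is valid UTF-8.
clean <- iconv(texts, from = "UTF-8", to = "UTF-8", sub = "byte")
print(clean[bad_idx])
```

    In a tm workflow, the same `iconv()` call would go inside `content_transformer()` and be mapped over the corpus before functions such as `tolower` are applied.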
