R tm package invalid input in 'utf8towcs'

逝去的感伤 2020-11-29 01:47

I'm trying to use the tm package in R to perform some text analysis. I tried the following:

require(tm)
dataSet <- Corpus(DirSource('tmp/'))
dataSet <         

14 answers
  •  慢半拍i (OP)
    2020-11-29 02:03

    None of the above answers worked for me. The only way I could work around this problem was to replace every non-graphical character with a space (http://stat.ethz.ch/R-manual/R-patched/library/base/html/regex.html).

    The code is this simple:

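    library(stringr)  # provides str_replace_all()
    usableText <- str_replace_all(tweets$text, "[^[:graph:]]", " ")  # replace each non-graphical character with a space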
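The same idea can be sketched in base R with gsub(), which avoids the stringr dependency. This is a minimal sketch, not the answerer's code: the helper name clean_text() and the sample strings are hypothetical, and the extra step that collapses repeated spaces and trims the ends goes beyond the original answer. The sample sticks to ASCII because which non-ASCII characters count as [:graph:] can depend on the locale and regex engine.

```r
# Hypothetical helper: replace every character outside the POSIX [:graph:]
# class (spaces, tabs, newlines, carriage returns, other control characters)
# with a space, as in the answer above.
clean_text <- function(x) {
  x <- gsub("[^[:graph:]]", " ", x)
  # Extra step (assumption, not in the original answer):
  # collapse runs of spaces into one and trim leading/trailing spaces.
  trimws(gsub(" +", " ", x))
}

# Made-up sample input containing a tab, double space, newline and CRLF
tweets_text <- c("RT\t@user  hello", "line1\nline2\r\n")
print(clean_text(tweets_text))
# [1] "RT @user hello" "line1 line2"
```

Collapsing the spaces is optional; without it, each removed character simply leaves one space behind, which is harmless for tokenizing but makes the text harder to read.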