R tm package invalid input in 'utf8towcs'

逝去的感伤 2020-11-29 01:47

I'm trying to use the tm package in R to perform some text analysis. I tried the following:

require(tm)
dataSet <- Corpus(DirSource('tmp/'))
dataSet <         


        
14 answers
  •  迷失自我
    2020-11-29 02:07

    I had the same problem on my Mac and solved it as follows.

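    library(tm)

    # Read the CSV and mark the text as UTF-8
    raw_data <- read.csv(file.choose(), stringsAsFactors = FALSE, encoding = "UTF-8")

    # Replace every byte that cannot be converted with its hex code, e.g. <e9>
    raw_data$textCol <- iconv(raw_data$textCol, "ASCII", "UTF-8", sub = "byte")

    data_corpus <- VCorpus(VectorSource(raw_data$textCol))

    # "UTF-8-MAC" is a macOS-specific encoding name.
    # content_transformer() keeps the result a tm document instead of a bare character vector.
    corpus_clean <- tm_map(data_corpus, content_transformer(function(x) iconv(x, to = "UTF-8-MAC", sub = "byte")))

    # Continue from corpus_clean (not data_corpus) so the iconv step is not discarded
    corpus_clean <- tm_map(corpus_clean, content_transformer(tolower))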
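
The idea behind the fix can be seen without tm at all. The following is a minimal base-R sketch, not part of the original answer: the sample string `bad` is a made-up illustration containing one byte (0xE9) that is not valid UTF-8 on its own, and the `from = "UTF-8"` argument is an assumption about the source encoding. Functions such as `tolower()` may stop with the 'utf8towcs' error on input like this in a UTF-8 locale, whereas the string returned by `iconv(..., sub = "byte")` is safe to process.

```r
# Hypothetical sample input: "caf" followed by a lone 0xE9 byte (Latin-1 "e acute"),
# which is not a valid UTF-8 sequence.
bad <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))

# validUTF8() (base R >= 3.3.0) reports whether each string is valid UTF-8.
is_valid_before <- validUTF8(bad)

# Re-encode UTF-8 to UTF-8; sub = "byte" replaces each unconvertible byte
# with its hex code in angle brackets instead of returning NA.
clean <- iconv(bad, from = "UTF-8", to = "UTF-8", sub = "byte")
is_valid_after <- validUTF8(clean)

# The cleaned string can now be lower-cased without an encoding error.
lowered <- tolower(clean)
print(c(before = is_valid_before, after = is_valid_after))
print(lowered)
```

Running `validUTF8()` over the text column before building the corpus is a quick way to find which documents trigger the error.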
    
