R tm package invalid input in 'utf8towcs'

逝去的感伤 2020-11-29 01:47

I'm trying to use the tm package in R to perform some text analysis. I tried the following:

require(tm)
dataSet <- Corpus(DirSource('tmp/'))
dataSet <         

14 Answers
  •  温柔的废话
    2020-11-29 02:23

    If it's acceptable to skip invalid inputs, you can use R's error handling (tryCatch) and leave any document that fails to convert unchanged, e.g.:

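      library(tm)

      dataSet <- Corpus(DirSource('tmp/'))
      # content_transformer() is required for base functions such as tolower() in tm >= 0.6
      dataSet <- tm_map(dataSet, content_transformer(function(data) {
         # ERROR HANDLING: capture the error as an object instead of stopping
         possibleError <- tryCatch(
             tolower(data),
             error = function(e) e
         )

         if (inherits(possibleError, "error")) {
           # Invalid input (e.g. bad UTF-8): return this document unchanged
           return(data)
         }
         # REAL WORK. You could do more work on your data here,
         # because you know the input is valid.
         possibleError
      }))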
    
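    The same tryCatch pattern can be tried outside tm on a plain character vector. This is a minimal sketch, not tm API: `safe_map_chr` and `flaky_lower` are hypothetical helpers, and `flaky_lower` deliberately fails on the string "BAD" to simulate the 'utf8towcs' error, because whether a real invalid byte makes tolower() fail depends on the session locale.

```r
# Hypothetical helper (not part of tm): apply `f` to each string and
# return NA instead of stopping when `f` raises an error.
safe_map_chr <- function(x, f = tolower) {
  vapply(x, function(s) {
    tryCatch(f(s), error = function(e) NA_character_)
  }, character(1), USE.NAMES = FALSE)
}

# Hypothetical stand-in for tolower(): fails on one "BAD" element,
# simulating the 'utf8towcs' error without relying on the locale.
flaky_lower <- function(s) {
  if (s == "BAD") stop("invalid input in 'utf8towcs'")
  tolower(s)
}

res <- safe_map_chr(c("Hello", "BAD", "World"), flaky_lower)
print(res)          # value: c("hello", NA, "world")
which(is.na(res))   # 2, the element that failed
```

    Returning NA_character_ keeps the result a plain character vector, so failed entries can be located with is.na() and then dropped or inspected.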

    There is an additional example here: http://gastonsanchez.wordpress.com/2012/05/29/catching-errors-when-using-tolower/
