R: TermDocumentMatrix - Error while creating

假如想象 posted on 2019-12-14 02:56:22

Question


I am trying to fetch Twitter data and build a word cloud, but my code throws an error when creating the TermDocumentMatrix. My code is below:

twitter_search_data <- searchTwitter(searchString = text_to_search
                                    ,n = 500)

twitter_search_text <- sapply(twitter_search_data
                             ,function(x) x$getText())

twitter_search_corpus <- Corpus(VectorSource(twitter_search_text))

twitter_search_corpus <- tm_map(twitter_search_corpus, stripWhitespace, lazy = TRUE)

twitter_search_corpus <- tm_map(twitter_search_corpus, content_transformer(tolower), lazy = TRUE)

twitter_search_corpus <- tm_map(twitter_search_corpus, PlainTextDocument,lazy = TRUE)    

twitter_search_corpus <- tm_map(twitter_search_corpus, removePunctuation, lazy = TRUE)

twitter_search_corpus <- tm_map(twitter_search_corpus, removeNumbers, lazy = TRUE)

twitter_search_corpus <- tm_map(twitter_search_corpus, removeWords, c("the", "this", "The", "This", stopwords('english')), lazy = TRUE)

twitter_search_corpus <- tm_map(twitter_search_corpus, stemDocument, lazy = TRUE)

# Create Term Document Matrix
tdm <- as.matrix(TermDocumentMatrix(twitter_search_corpus
                                   ,control=list(wordLengths=c(3,Inf))
                                   ))

No errors occur before the TermDocumentMatrix is created. The error I get is:

Warning in mclapply(x$content[i], function(d) tm_reduce(d, x$lazy$maps)) :
  scheduled core 1 encountered error in user code, all values of the job will be affected
Warning in mclapply(unname(content(x)), termFreq, control) :
  scheduled core 1 encountered error in user code, all values of the job will be affected
Warning: Error in UseMethod: no applicable method for 'meta' applied to an object of class "try-error"
Stack trace (innermost first):
74: FUN
73: lapply
72: setNames
71: as.list.VCorpus
70: as.list
69: lapply
68: meta.VCorpus
67: meta
66: TermDocumentMatrix.VCorpus
65: TermDocumentMatrix
64: as.matrix
63: observeEventHandler
1: runApp

I have already added lazy = TRUE and content_transformer(tolower), but the error still occurs.


Answer 1:


The issue seems to be the placement of this line:

twitter_search_corpus <- tm_map(twitter_search_corpus, stripWhitespace, lazy = TRUE)

Removing punctuation, numbers, and words leaves extra whitespace in the text, so the stripWhitespace call above needs to be the last transformation before creating the TermDocumentMatrix.
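
As a rough illustration of that ordering, here is a minimal sketch with stripWhitespace moved to the end of the pipeline. It makes several assumptions that are not in the original post: the tweets vector is made-up sample text standing in for the searchTwitter() results, the corpus is built with VCorpus, and the lazy = TRUE and PlainTextDocument steps are left out to keep the example small. Stemming is applied only if the SnowballC package happens to be installed.

```r
library(tm)

# Made-up sample tweets (assumption), standing in for the searchTwitter() results.
tweets <- c("The quick brown fox, jumping over 2 lazy dogs!",
            "This is a tweet about R and text mining...",
            "Word clouds are fun: 100 words, 1 picture.")

corpus <- VCorpus(VectorSource(tweets))

# Transformations that can leave gaps in the text come first.
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("english"))

# stemDocument relies on SnowballC, so it is skipped when that package is missing.
if (requireNamespace("SnowballC", quietly = TRUE)) {
  corpus <- tm_map(corpus, stemDocument)
}

# stripWhitespace goes last, so it collapses the gaps left by the steps above.
corpus <- tm_map(corpus, stripWhitespace)

# Build the term-document matrix, keeping terms of at least 3 characters.
tdm <- as.matrix(TermDocumentMatrix(corpus,
                                    control = list(wordLengths = c(3, Inf))))

# Cleaned text of each document, for inspection.
cleaned_text <- unname(sapply(corpus, as.character))
```

Printing cleaned_text before and after the final stripWhitespace call is a quick way to see the runs of spaces that the removal steps leave behind.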



Source: https://stackoverflow.com/questions/37088965/r-termdocumentmatrix-error-while-creating
