Twitter Data Analysis - Error in Term Document Matrix

滥情空心 2020-12-03 18:30

I'm trying to do some analysis of Twitter data. I downloaded the tweets and created a corpus from their text using the code below:

# Creating a Corpus
wim_co         


        
6 answers
  •  小蘑菇 (original poster)
    2020-12-03 19:22

    I think this problem happens because some unusual (non-alphanumeric) characters appear in the text. Here is my solution, which replaces them with spaces before building the matrix:

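    library(tm)       # tm_map, content_transformer, TermDocumentMatrix
    library(stringr)  # str_replace_all

    # Replace every non-alphanumeric character with a space. With tm >= 0.6,
    # plain string functions should be wrapped in content_transformer().
    wim_corpus = tm_map(wim_corpus, content_transformer(str_replace_all),
                        "[^[:alnum:]]", " ")

    # Build the term-document matrix with the usual cleanup options.
    tdm = TermDocumentMatrix(wim_corpus,
                             control = list(removePunctuation = TRUE,
                                            stopwords = TRUE,
                                            removeNumbers = TRUE, tolower = TRUE))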
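
To see what the regular expression in the answer above does to the text, here is a minimal sketch in base R. It runs on a plain character vector rather than a tm corpus, and the sample tweets are hypothetical, since the original data is not shown in the question. `gsub()` is used here as a rough base-R counterpart to `str_replace_all()`.

```r
# Hypothetical sample tweets; the real data is not shown in the question.
tweets <- c("Great match at Wimbledon!!! :)", "Federer wins #1 seed @ 5pm")

# Replace every character that is not a letter or digit with a space,
# using the same pattern as the answer above.
cleaned <- gsub("[^[:alnum:]]", " ", tweets)

# Collapse the runs of spaces left behind and trim both ends.
cleaned <- trimws(gsub(" +", " ", cleaned))

print(cleaned)
```

Note that this pattern also removes `#` and `@`, so hashtags and mentions lose their markers. If those matter for the analysis, the character class would need to be widened, for example to `[^[:alnum:]#@]`.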
    
