Twitter Data Analysis - Error in Term Document Matrix

滥情空心 2020-12-03 18:30

I am trying to analyze Twitter data. I downloaded the tweets and created a corpus from the tweet text using the code below:

# Creating a Corpus
wim_co         


        
6 Answers
  •  暖寄归人
    2020-12-03 19:12

    As Albert suggested, converting the text encoding to "utf-8" solved the problem for me. But instead of removing the whole tweet with problematic characters, you can use the sub option in iconv to only remove the "bad" characters in a tweet and keep the rest:

    tweets <- iconv(rawTweets, to = "utf-8", sub="")
    

    This no longer produces NAs, so no further filtering step is needed.
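
To illustrate the difference, here is a minimal sketch on a made-up two-element vector. The name raw_tweets and its contents are hypothetical, and the explicit from = "UTF-8" is an assumption (the line above leaves from unset, so the session's native encoding is used). How invalid bytes are handled can also depend on the platform's iconv implementation, so treat this as an illustration rather than a guarantee.

```r
# Hypothetical sample data: the second element contains the byte 0xE9,
# which on its own is not valid UTF-8.
raw_tweets <- c("good tweet", "bad \xe9 tweet")

# Without sub, an element that cannot be converted typically becomes NA.
strict <- iconv(raw_tweets, from = "UTF-8", to = "UTF-8")

# With sub = "", only the non-convertible bytes are dropped and the
# rest of the tweet is kept.
cleaned <- iconv(raw_tweets, from = "UTF-8", to = "UTF-8", sub = "")

print(is.na(strict))   # shows which elements were lost without sub
print(cleaned)         # same length as the input, no NA entries
```

Because cleaned keeps one element per tweet, it stays aligned with any other per-tweet columns (user, timestamp, and so on), which would not be the case after filtering out NA entries.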
