I'm trying to do some analysis of Twitter data. I downloaded the tweets and created a corpus from their text using the code below:
# Creating a Corpus
wim_co
I found a way to solve this problem in an article about the tm package.
An example that reproduces the error follows:
getwd()
library(tm)
# Importing files
files <- DirSource(directory = "texts/", encoding = "latin1")
# Loading files and creating a Corpus
corpus <- VCorpus(x = files)
# Summary
summary(corpus)
# Transformations
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, stripWhitespace)
# Building the document-term matrix (this call raises the warning)
matrix_terms <- DocumentTermMatrix(corpus)
Warning messages: In TermDocumentMatrix.VCorpus(x, control) : invalid document identifiers
This warning occurs because the function that builds the term-document matrix expects a corpus of text documents, such as one built from a VectorSource, but the previous transformations can turn the texts in your corpus into plain character data, a class the function does not accept. (The message names TermDocumentMatrix because DocumentTermMatrix calls it internally.)
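To see whether this applies to your own corpus, a quick check of the class of each document after the transformations can help. The following is a minimal sketch, assuming the tm package is installed; the two inline texts are made-up stand-ins for the files in texts/, and is_text_doc is a name chosen here for illustration.

```r
library(tm)

# Made-up example texts standing in for the files under texts/ (assumption).
texts <- c("Hello, world!  First tweet.", "Second   tweet; more text.")
corpus <- VCorpus(VectorSource(texts))

corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, stripWhitespace)

# For each document, test whether it still inherits from "TextDocument".
# A FALSE here would mean a transformation replaced the document with
# something else, for example a bare character vector.
is_text_doc <- vapply(
  seq_along(corpus),
  function(i) inherits(corpus[[i]], "TextDocument"),
  logical(1)
)
print(is_text_doc)

# The class of a single document can also be inspected directly.
print(class(corpus[[1]]))
```

If any entry comes back FALSE, the transformation applied just before is the likely place where the class changed.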
However, if you add one more command before calling DocumentTermMatrix, you can keep going.
The code with the new command follows:
getwd()
library(tm)
# Importing files
files <- DirSource(directory = "texts/", encoding = "latin1")
# Loading files and creating a Corpus
corpus <- VCorpus(x = files)
# Summary
summary(corpus)
# Transformations
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, stripWhitespace)
# COMMAND TO CHANGE THE CLASS AND AVOID THIS ERROR
corpus <- Corpus(VectorSource(corpus))
matrix_terms <- DocumentTermMatrix(corpus)
With that change, the warning should no longer appear.
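If passing the corpus object itself to VectorSource behaves unexpectedly in your version of tm, a variant of the same idea is to pull the text out as a plain character vector first and rebuild the corpus from that. This is only a sketch under assumptions: the inline texts are made up, and plain_text and dtm are illustrative names that do not come from the article.

```r
library(tm)

# Made-up example texts standing in for the files under texts/ (assumption).
texts <- c("Hello, world!  First tweet.", "Second   tweet; more text.")
corpus <- VCorpus(VectorSource(texts))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, stripWhitespace)

# Collapse each document to a single string, whatever class it has now.
plain_text <- vapply(
  seq_along(corpus),
  function(i) paste(as.character(corpus[[i]]), collapse = "\n"),
  character(1)
)

# Rebuild the corpus from the character vector, then build the matrix.
corpus <- VCorpus(VectorSource(plain_text))
dtm <- DocumentTermMatrix(corpus)
print(dim(dtm))
```

Note that rebuilding the corpus this way discards per-document metadata such as the original file names, so keep a copy of them beforehand if you need them later.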