Question
I followed the instructions from here.
On slide no. 9, tolower has an issue in package tm 0.6 and above, so I used:
myCorpus <- tm_map(myCorpus, content_transformer(tolower))
That issue duplicates this Stack Overflow question, but I still get an error when I run stemCompletion:
myCorpus <- tm_map(myCorpus, stemCompletion, dictionary = myCorpusCopy)
Following this instruction, I converted both variables, myCorpus and myCorpusCopy, to PlainTextDocument:
corpus <- tm_map(corpus, PlainTextDocument)
I was able to execute
myCorpus <- tm_map(myCorpus, stemCompletion, dictionary = myCorpusCopy)
but I get 50 warnings
There were 50 or more warnings (use warnings() to see the first 50)
warnings()
and I get all 50 warnings:
1: In grep(sprintf("^%s", w), dictionary, value = TRUE) :
  argument 'pattern' has length > 1 and only the first element will be used
2: In grep(sprintf("^%s", w), dictionary, value = TRUE) :
  argument 'pattern' has length > 1 and only the first element will be used
3: In grep(sprintf("^%s", w), dictionary, value = TRUE) :
  argument 'pattern' has length > 1 and only the first element will be used
4: In grep(sprintf("^%s", w), dictionary, value = TRUE) :
  argument 'pattern' has length > 1 and only the first element will be used
5: In grep(sprintf("^%s", w), dictionary, value = TRUE) :
  argument 'pattern' has length > 1 and only the first element will be used
6: In grep(sprintf("^%s", w), dictionary, value = TRUE) :
  argument 'pattern' has length > 1 and only the first element will be used
7: In grep(sprintf("^%s", w), dictionary, value = TRUE) :
  argument 'pattern' has length > 1 and only the first element will be used
8: In grep(sprintf("^%s", w), dictionary, value = TRUE) :
  argument 'pattern' has length > 1 and only the first element will be used
9: In grep(sprintf("^%s", w), dictionary, value = TRUE) :
  argument 'pattern' has length > 1 and only the first element will be used
10: In grep(sprintf("^%s", w), dictionary, value = TRUE) :
  argument 'pattern' has length > 1 and only the first element will be used
I tried to ignore the warnings and create a TermDocumentMatrix:
tdm <- TermDocumentMatrix(myCorpus, control = list(wordLengths = c(1, Inf)))
and I get this error:
Error: inherits(doc, "TextDocument") is not TRUE
Answer 1:
Here's how you can create a stemmed term-document-matrix and re-complete the stemmed tokens afterwards:
txt <- " was followed the instruction from here In slide no. 9 tolower has issue in package tm 0.6 and above I have used "
myCorpus <- Corpus(VectorSource(txt))
myCorpus <- tm_map(myCorpus, content_transformer(tolower))
tdm <- TermDocumentMatrix(myCorpus, control = list(stemming = TRUE))
cbind(stems = rownames(tdm), completed = stemCompletion(rownames(tdm), myCorpus))
#          stems      completed
# 0.6      "0.6"      "0.6"
# abov     "abov"     "above"
# and      "and"      "and"
# follow   "follow"   "followed"
# from     "from"     "from"
# has      "has"      "has"
# have     "have"     "have"
# here     "here"     "here"
# instruct "instruct" "instruction"
# issu     "issu"     "issue"
# no.      "no."      "no."
# packag   "packag"   "package"
# slide    "slide"    "slide"
# the      "the"      "the"
# tolow    "tolow"    "tolower"
# use      "use"      "used"
# was      "was"      "was"
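As for the warnings in the question: the message quotes the call grep(sprintf("^%s", w), dictionary, value = TRUE), which suggests that w held more than one string at a time (for example a whole document rather than a single stem). The following is a minimal base R sketch of that situation, not tm's actual implementation. The toy dictionary, the stems, and the helper complete_one are hypothetical names made up for illustration.

```r
# Toy data (hypothetical): a dictionary of full words and a few stems.
dictionary <- c("instruction", "issue", "package", "followed")
stems <- c("instruct", "issu", "packag")

# Passing all stems at once gives grep() a pattern of length > 1.
# Capture whatever condition R signals instead of printing it.
msg <- tryCatch(
  {
    grep(sprintf("^%s", stems), dictionary, value = TRUE)
    NA_character_
  },
  warning = function(w) conditionMessage(w),
  error = function(e) conditionMessage(e)
)
print(msg)

# Looking up one stem at a time keeps every pattern at length 1.
# Falls back to the stem itself when the dictionary has no match.
complete_one <- function(stem, dictionary) {
  hits <- grep(sprintf("^%s", stem), dictionary, value = TRUE)
  if (length(hits) == 0) stem else hits[1]
}
completed <- vapply(stems, complete_one, character(1), dictionary = dictionary)
print(completed)
```

This is why the answer above calls stemCompletion on rownames(tdm), a plain character vector of individual stems, rather than mapping it over whole documents.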
Source: https://stackoverflow.com/questions/30321770/r-warning-in-stemcompletion-and-error-in-termdocumentmatrix