R Warning in stemCompletion and error in TermDocumentMatrix

Submitted by 有些话、适合烂在心里 on 2019-12-11 11:19:18

Question


I was following the instructions from here.

On slide no. 9, tolower has an issue in package tm 0.6 and above, so I used:

myCorpus <- tm_map(myCorpus, content_transformer(tolower))

That part was already covered by this Stack Overflow question, but I still get an error when I run stemCompletion:

myCorpus <- tm_map(myCorpus, stemCompletion, dictionary = myCorpusCopy)

I then followed this instruction and converted both variables, myCorpus and myCorpusCopy, to PlainTextDocument:

corpus <- tm_map(corpus, PlainTextDocument)

After that, I was able to execute

myCorpus <- tm_map(myCorpus, stemCompletion, dictionary = myCorpusCopy)

but I get 50 or more warnings:

There were 50 or more warnings (use warnings() to see the first 50)

Running

warnings()

prints the same warning repeatedly:

1: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
2: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
3: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
4: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
5: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
6: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
7: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
8: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
9: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
10: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
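
The warning text itself hints at the cause: the expression grep(sprintf("^%s", w), dictionary, value = TRUE) needs w to be a single stem, but here w had more than one element. Below is a minimal base-R sketch of that mechanism. It reuses only the expression quoted in the warning; the dictionary and stems vectors are made-up illustration values, not data from the original corpus.

```r
# Hypothetical values for illustration only.
dictionary <- c("instruction", "followed", "package")
stems <- c("instruct", "follow")

# Passing the whole vector as one pattern: grep() uses only the first element.
# The condition is captured so the script keeps running either way.
msg <- tryCatch({
  grep(sprintf("^%s", stems), dictionary, value = TRUE)
  "no condition raised"
}, warning = function(w) conditionMessage(w),
   error = function(e) conditionMessage(e))
print(msg)

# One stem at a time: each pattern has length 1, so nothing is signalled.
per_stem <- lapply(stems, function(w) {
  grep(sprintf("^%s", w), dictionary, value = TRUE)
})
names(per_stem) <- stems
print(per_stem)
```

If this reading is right, stemCompletion should be given a character vector of individual stems rather than whole documents.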

I tried to ignore the warnings and create a TermDocumentMatrix:

tdm <- TermDocumentMatrix(myCorpus, control = list(wordLengths = c(1, Inf)))

and I get this error:

Error: inherits(doc, "TextDocument") is not TRUE
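
The message has the form that stopifnot() produces, so it seems TermDocumentMatrix met an element of the corpus that no longer inherits from "TextDocument". Below is a base-R sketch of how a document can lose its class. It uses a toy document object rather than tm's real classes. make_doc and wrap_content are hypothetical helpers; wrap_content only imitates the idea behind content_transformer and is not its implementation.

```r
# Toy stand-in for a text document: a list carrying a class attribute.
make_doc <- function(text) {
  structure(list(content = text),
            class = c("PlainTextDocument", "TextDocument"))
}

# Hypothetical wrapper: apply FUN to the content only, return the document.
wrap_content <- function(FUN) {
  function(doc, ...) {
    doc$content <- FUN(doc$content, ...)
    doc
  }
}

doc <- make_doc("Some Mixed Case TEXT")

# A plain function applied to the content returns a bare character vector.
bare <- tolower(doc$content)
# The wrapped function modifies the content but keeps the document class.
kept <- wrap_content(tolower)(doc)

print(inherits(bare, "TextDocument"))
print(inherits(kept, "TextDocument"))

# The same kind of check that appears in the error message.
check <- tryCatch(stopifnot(inherits(bare, "TextDocument")),
                  error = function(e) conditionMessage(e))
print(check)
```

Under that assumption, any transformation passed to tm_map has to return a document object, not a bare character vector, for TermDocumentMatrix to accept the corpus.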

Answer 1:


Here's how you can create a stemmed term-document-matrix and re-complete the stemmed tokens afterwards:

txt <- " was followed the instruction from here In slide no. 9 tolower has issue in package tm 0.6 and above I have used "
myCorpus <- Corpus(VectorSource(txt))
myCorpus <- tm_map(myCorpus, content_transformer(tolower))
tdm <- TermDocumentMatrix(myCorpus, control = list(stemming = TRUE)) 
cbind(stems = rownames(tdm), completed = stemCompletion(rownames(tdm), myCorpus))  
#          stems      completed    
# 0.6      "0.6"      "0.6"        
# abov     "abov"     "above"      
# and      "and"      "and"        
# follow   "follow"   "followed"   
# from     "from"     "from"       
# has      "has"      "has"        
# have     "have"     "have"       
# here     "here"     "here"       
# instruct "instruct" "instruction"
# issu     "issu"     "issue"      
# no.      "no."      "no."        
# packag   "packag"   "package"    
# slide    "slide"    "slide"      
# the      "the"      "the"        
# tolow    "tolow"    "tolower"    
# use      "use"      "used"       
# was      "was"      "was"    


Source: https://stackoverflow.com/questions/30321770/r-warning-in-stemcompletion-and-error-in-termdocumentmatrix
