Remove empty documents from DocumentTermMatrix in R topicmodels?

鱼传尺愫 2020-11-30 22:44

I am doing topic modelling using the topicmodels package in R. I am creating a Corpus object, doing some basic preprocessing, and then creating a DocumentTermMatrix:

6 Answers
  •  独厮守ぢ
    2020-11-30 23:30

    "Each row of the input matrix needs to contain at least one non-zero entry"
    

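    The error means that the sparse matrix contains at least one row without entries (words), i.e. an empty document. One idea is to compute the sum of words in each row and keep only the rows whose sum is greater than zero: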

    rowTotals <- apply(dtm, 1, sum)   # sum of word counts in each document
    dtm.new   <- dtm[rowTotals > 0, ] # remove all documents without words
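
For larger corpora, `apply()` first converts the document-term matrix to a dense matrix, which can use a lot of memory. A possible variant, shown here only as a sketch, swaps in `slam::row_sums()`, which works on the sparse representation directly. The toy documents and the names `docs`, `keep`, `dtm_new` and `corpus_new` are made up for illustration and are not from the question.

```r
library(tm)    # VCorpus, VectorSource, DocumentTermMatrix
library(slam)  # row_sums() for sparse (simple triplet) matrices

# Toy corpus (hypothetical data): the second document is empty.
docs   <- c("topic models need words", "", "another short document")
corpus <- VCorpus(VectorSource(docs))
dtm    <- DocumentTermMatrix(corpus)

# Sum the word counts per document without densifying the matrix.
row_totals <- slam::row_sums(dtm)
keep       <- row_totals > 0

# Drop the empty rows, and drop the same documents from the corpus
# so that row i of dtm_new still corresponds to corpus_new[[i]].
dtm_new    <- dtm[keep, ]
corpus_new <- corpus[keep]
```

The filtered `dtm_new` should then satisfy the requirement that every row has at least one non-zero entry. Filtering the corpus with the same `keep` vector is optional, but it helps when topic assignments need to be mapped back to the original documents later.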
    
