Remove empty documents from DocumentTermMatrix in R topicmodels?

鱼传尺愫 2020-11-30 22:44

I am doing topic modelling using the topicmodels package in R. I am creating a Corpus object, doing some basic preprocessing, and then creating a DocumentTermMatrix:

<
6 Answers
  •  予麋鹿 (OP)
    2020-11-30 23:25

    This is just to elaborate on the answer given by agstudy.

    Instead of removing the empty rows from the dtm, we can identify the documents in the corpus that end up with zero terms, remove them from the corpus directly, and then build a second dtm from the non-empty documents only.

    This is useful to keep a 1:1 correspondence between the dtm and the corpus.

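    rowTotals <- apply(dtm, 1, sum)  # number of terms in each document
    empty.rows <- dtm[rowTotals == 0, ]$dimnames[1][[1]]  # names of the empty documents
    corpus <- corpus[-as.numeric(empty.rows)]  # assumes document names are their numeric positions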
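
    For reference, here is a minimal end-to-end sketch of the same idea. It is an illustration rather than the exact code from the question: the toy documents, the stopword-removal step, and the names keep, corpus.new and dtm.new are assumptions made for the example. It uses slam::row_sums (slam is installed as a dependency of tm) for the row totals and a logical mask instead of negative indices.

```r
library(tm)

# Toy corpus (hypothetical data): the second document consists only of
# stopwords, so it becomes empty after preprocessing.
docs <- c("topic models find latent themes",
          "the and of",
          "documents contain words")
corpus <- VCorpus(VectorSource(docs))
corpus <- tm_map(corpus, removeWords, stopwords("english"))

dtm <- DocumentTermMatrix(corpus)

# Number of terms per document, computed on the sparse matrix
rowTotals <- slam::row_sums(dtm)

# Logical mask of the documents that still contain at least one term
keep <- rowTotals > 0

# Drop the empty documents from the corpus, then rebuild the dtm so that
# its rows match the remaining documents one to one
corpus.new <- corpus[keep]
dtm.new <- DocumentTermMatrix(corpus.new)
```

    The logical mask is a deliberate choice here: if no document is empty, a negative index built from a zero-length vector selects nothing, so corpus[-as.numeric(empty.rows)] would return an empty corpus, whereas corpus[keep] leaves the corpus unchanged.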
