How do I measure perplexity scores on an LDA model made with the textmineR package in R?

Submitted by ぃ、小莉子 on 2020-07-09 05:53:10

Question


I've built an LDA topic model in R using the textmineR package. It looks as follows.

library(textmineR)
library(stringr) # provides str_length()

## get textmineR dtm
dtm2 <- CreateDtm(doc_vec = dat2$fulltext, # character vector of documents
                 ngram_window = c(1, 2), # unigrams and bigrams
                 doc_names = dat2$names,
                 stopword_vec = c(stopwords::stopwords("da"), custom_stopwords),
                 lower = TRUE, # lowercase - this is the default value
                 remove_punctuation = TRUE, # punctuation - this is the default
                 remove_numbers = TRUE, # numbers - this is the default
                 verbose = TRUE,
                 cpus = 4)

# keep only terms that occur more than twice in the corpus
dtm2 <- dtm2[, colSums(dtm2) > 2]
# keep only terms longer than two characters
dtm2 <- dtm2[, str_length(colnames(dtm2)) > 2]


############################################################
## RUN & EXAMINE TOPIC MODEL
############################################################

# Set the random seed so the Gibbs sampler gives reproducible results
set.seed(34838)

model2 <- FitLdaModel(dtm = dtm2, 
                     k = 8,
                     iterations = 500,
                     burnin = 200,
                     alpha = 0.1,
                     beta = 0.05,
                     optimize_alpha = TRUE,
                     calc_likelihood = TRUE,
                     calc_coherence = TRUE,
                     calc_r2 = TRUE,
                     cpus = 4) 

The questions are then:
1. Which function should I use to get perplexity scores in the textmineR package? I can't seem to find one.
2. How do I measure perplexity scores for different numbers of topics (k)?


Answer 1:


As asked: there's no way to calculate perplexity with textmineR unless you explicitly program it yourself. To be honest, I've never seen perplexity tell you anything that you couldn't get from likelihood and coherence, so I didn't implement it.
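
For reference, programming it yourself comes down to one formula: perplexity is the exponential of the negative log-likelihood per token, where the probability of a term in a document is the dot product of that document's row of theta with that term's column of phi. A minimal sketch, assuming the example model and DTM that ship with textmineR share the same document and term order (the variable names here are illustrative, not part of any package):

```r
library(textmineR)
library(Matrix)

# Example model and DTM that ship with textmineR
m <- nih_sample_topic_model
d <- nih_sample_dtm

# Assumption: rows of m$theta line up with rows of d,
# and columns of m$phi line up with columns of d.
stopifnot(nrow(m$theta) == nrow(d), ncol(m$phi) == ncol(d))

# Non-zero entries of the DTM as (document i, term j, count x) triplets
nz <- Matrix::summary(d)

# P(term j | document i) = sum over topics of theta[i, t] * phi[t, j]
p_w <- rowSums(m$theta[nz$i, , drop = FALSE] * t(m$phi[, nz$j, drop = FALSE]))

# Perplexity = exp(-log-likelihood / total number of tokens)
perplexity_manual <- exp(-sum(nz$x * log(p_w)) / sum(nz$x))
perplexity_manual
```

Lower values mean the model assigns higher probability to the observed tokens.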

However, the text2vec package does have an implementation. For example:

library(textmineR)

# model ships with textmineR as example
m <- nih_sample_topic_model

# dtm ships with textmineR as example
d <- nih_sample_dtm

# get perplexity
p <- text2vec::perplexity(X = d, 
                          topic_word_distribution = m$phi, 
                          doc_topic_distribution = m$theta)
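
For the second question, the same call can be repeated for each candidate k. The sketch below is one possible way to do it, not a method confirmed by the answer: it refits the model on textmineR's example DTM for a few values of k and scores each fit with text2vec::perplexity. The k grid, the iteration counts, and the names `k_values`, `perplexity_by_k` and `results` are assumptions chosen for illustration. It scores the same documents the model was trained on; held-out documents would usually give a fairer comparison across k.

```r
library(textmineR)

# Example DTM that ships with textmineR; swap in your own (e.g. dtm2)
d <- nih_sample_dtm

# Candidate numbers of topics (illustrative grid)
k_values <- c(4, 8, 12)

set.seed(34838)

perplexity_by_k <- sapply(k_values, function(k) {
  # Fit one model per k; the extra diagnostics are switched off to save time
  fit <- FitLdaModel(dtm = d,
                     k = k,
                     iterations = 200,
                     burnin = 150,
                     alpha = 0.1,
                     beta = 0.05,
                     optimize_alpha = TRUE,
                     calc_likelihood = FALSE,
                     calc_coherence = FALSE,
                     calc_r2 = FALSE)

  # In-sample perplexity of this fit
  text2vec::perplexity(X = d,
                       topic_word_distribution = fit$phi,
                       doc_topic_distribution = fit$theta)
})

results <- data.frame(k = k_values, perplexity = perplexity_by_k)
print(results)
```

Because in-sample perplexity tends to keep falling as k grows, it is worth reading these numbers alongside coherence rather than simply picking the minimum.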




Source: https://stackoverflow.com/questions/59411177/how-do-i-measure-perplexity-scores-on-a-lda-model-made-with-the-textminer-packag
