topicmodels

How do I measure perplexity scores on an LDA model made with the textmineR package in R?

ぃ、小莉子 submitted on 2020-07-09 05:53:10
Question: I've made an LDA topic model in R using the textmineR package. It looks as follows:

    ## get textmineR dtm
    dtm2 <- CreateDtm(doc_vec = dat2$fulltext, # character vector of documents
                      ngram_window = c(1, 2),
                      doc_names = dat2$names,
                      stopword_vec = c(stopwords::stopwords("da"), custom_stopwords),
                      lower = T, # lowercase - this is the default value
                      remove_punctuation = T, # punctuation - this is the default
                      remove_numbers = T, # numbers - this is the default
                      verbose = T,
                      cpus = 4)
    dtm2 <- dtm2[,
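
The quantity asked about in this question can be sketched directly from its definition: perplexity is the exponential of the negative average log-likelihood per token. The example below is a minimal base-R sketch on hypothetical toy matrices, not the textmineR implementation. It assumes a document-topic matrix `theta` and a topic-term matrix `phi`, named after the components a fitted textmineR model is understood to expose; the `perplexity` helper itself is a hypothetical name introduced here.

```r
# Toy document-term counts: 3 documents x 4 terms (hypothetical data).
dtm <- matrix(c(2, 1, 0, 0,
                0, 1, 3, 0,
                1, 0, 1, 2),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("d", 1:3), paste0("w", 1:4)))

# theta: P(topic | document), one row per document, rows sum to 1.
theta <- matrix(c(0.9, 0.1,
                  0.2, 0.8,
                  0.5, 0.5), nrow = 3, byrow = TRUE)

# phi: P(term | topic), one row per topic, rows sum to 1.
phi <- matrix(c(0.5, 0.3, 0.1, 0.1,
                0.1, 0.2, 0.4, 0.3), nrow = 2, byrow = TRUE)

# Hypothetical helper: perplexity = exp(-log-likelihood / number of tokens).
perplexity <- function(dtm, theta, phi) {
  p_word <- theta %*% phi                       # P(term | document)
  nz <- dtm > 0                                 # only observed terms contribute
  log_lik <- sum(dtm[nz] * log(p_word[nz]))
  exp(-log_lik / sum(dtm))
}

perplexity(dtm, theta, phi)
```

With a real model, a sparse document-term matrix would first need converting with `as.matrix()` (memory permitting), and the score is most meaningful when computed on held-out documents. Lower values indicate a better fit.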

tidy from broom not finding method for LDA from topicmodels

拜拜、爱过 submitted on 2020-01-16 15:26:51
Question: Running this script, straight from 'Text mining with R',

    library(topicmodels)
    library(broom)
    data("AssociatedPress")
    ap_lda <- LDA(AssociatedPress, k = 2, control = list(seed = 1234))
    tidy(ap_lda)

I get this error message:

    Error in as.data.frame.default(x) : cannot coerce class "structure("LDA_VEM", package = "topicmodels")" to a data.frame
    In addition: Warning message:
    In tidy.default(ap_lda) : No method for tidying an S3 object of class LDA_VEM , using as.data.frame

    packageVersion("broom")

tidy from broom not finding method for LDA from topicmodels

走远了吗. submitted on 2020-01-16 15:26:11
Question: Running this script, straight from 'Text mining with R',

    library(topicmodels)
    library(broom)
    data("AssociatedPress")
    ap_lda <- LDA(AssociatedPress, k = 2, control = list(seed = 1234))
    tidy(ap_lda)

I get this error message:

    Error in as.data.frame.default(x) : cannot coerce class "structure("LDA_VEM", package = "topicmodels")" to a data.frame
    In addition: Warning message:
    In tidy.default(ap_lda) : No method for tidying an S3 object of class LDA_VEM , using as.data.frame

    packageVersion("broom")

Remove empty documents from DocumentTermMatrix in R topicmodels?

廉价感情. submitted on 2019-12-17 08:24:23
Question: I am doing topic modelling using the topicmodels package in R. I am creating a Corpus object, doing some basic preprocessing, and then creating a DocumentTermMatrix:

    corpus <- Corpus(VectorSource(vec), readerControl=list(language="en"))
    corpus <- tm_map(corpus, tolower)
    corpus <- tm_map(corpus, removePunctuation)
    corpus <- tm_map(corpus, removeWords, stopwords("english"))
    corpus <- tm_map(corpus, stripWhitespace)
    corpus <- tm_map(corpus, removeNumbers)
    ...snip removing several custom lists of

Different results of LDA using R(topicmodels)

╄→гoц情女王★ submitted on 2019-12-12 02:52:19
Question: I am using R topicmodels to train an LDA model on a small corpus, but every time I rerun the same code I get different results (different topics and different topic terms). Why do the same settings and the same corpus give a different result every time, and what should I do to stabilize the result? Here is my code:

    library(tm)
    library(topicmodels)
    cname<-file.path(".","corpus","train")
    docs<-Corpus(DirSource(cname))
    toSpace<-content_transformer(function(x,pattern

DocumentTermMatrix() return 0 terms in tm package

爱⌒轻易说出口 submitted on 2019-12-11 14:51:15
Question: I have an object like this:

    str(apps)
    chr [1:17517] "35 44 33 40 33 40 44 38 33 37 37" ...

In each row, the numbers are separated by spaces.

    corpus<-Corpus(VectorSource(apps))
    dtm<-DocumentTermMatrix(corpus)
    str(dtm)
    List of 6
     $ i : int(0)
     $ j : int(0)
     $ v : num(0)
     $ nrow : int 17517
     $ ncol : int 0
     $ dimnames:List of 2
      ..$ Docs : chr [1:17517] "1" "2" "3" "4" ...
      ..$ Terms: NULL
     - attr(*, "class")= chr [1:2] "DocumentTermMatrix" "simple_triplet_matrix"
     - attr(*, "weighting")= chr [1:2] "term

how to get a probability distribution for a topic in mallet?

自古美人都是妖i submitted on 2019-12-02 19:31:16
Question: Using mallet I can get a specific number of topics and their words. How can I make sure the topic words form a probability distribution (i.e., sum to one)? For example, if I run it as below, how can I use the outputs given by mallet to make sure the probabilities of the topic words for topic 0 add up to 1?

    mallet train-topics --input text.vectors \
      --output-topic-keys topics.txt \
      --output-doc-topics doc_comp.txt \
      --topic-word-weights-file weights.txt \
      --num-top-words 50 \
      --word-topic-counts-file counts.txt \
      --num

how to get a probability distribution for a topic in mallet?

徘徊边缘 submitted on 2019-12-02 10:26:24
Using mallet I can get a specific number of topics and their words. How can I make sure the topic words form a probability distribution (i.e., sum to one)? For example, if I run it as below, how can I use the outputs given by mallet to make sure the probabilities of the topic words for topic 0 add up to 1?

    mallet train-topics --input text.vectors \
      --output-topic-keys topics.txt \
      --output-doc-topics doc_comp.txt \
      --topic-word-weights-file weights.txt \
      --num-top-words 50 \
      --word-topic-counts-file counts.txt \
      --num-topics 3 \
      --output-state topicstate.gz \
      --alpha 1

Source: https://stackoverflow.com/questions/33251703/how
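
One way to approach the normalization asked about here is to divide each word's weight by the total weight within its topic. The sketch below is a minimal base-R illustration on hypothetical toy data. It assumes the file written by `--topic-word-weights-file` has three tab-separated columns (topic, word, weight); that layout is an assumption to check against the actual `weights.txt`, and the column names are made up for the example.

```r
# Toy stand-in for the topic-word weights file (hypothetical values).
# With real output one might instead try, assuming a three-column layout:
#   weights <- read.delim("weights.txt", header = FALSE, quote = "",
#                         col.names = c("topic", "word", "weight"))
weights <- data.frame(
  topic  = c(0, 0, 0, 1, 1, 1),
  word   = c("a", "b", "c", "a", "b", "c"),
  weight = c(3.01, 1.01, 0.01, 0.01, 2.01, 2.01)
)

# Divide each weight by the total weight of its topic, so that the
# resulting probabilities sum to 1 within every topic.
weights$prob <- weights$weight / ave(weights$weight, weights$topic, FUN = sum)

# Check: per-topic sums of the normalized values.
tapply(weights$prob, weights$topic, sum)
```

Note that the sums only reach 1 when every word of the vocabulary is included for each topic; restricting to the top 50 words per topic gives a truncated distribution.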

Remove empty documents from DocumentTermMatrix in R topicmodels?

我怕爱的太早我们不能终老 submitted on 2019-11-27 06:37:34
I am doing topic modelling using the topicmodels package in R. I am creating a Corpus object, doing some basic preprocessing, and then creating a DocumentTermMatrix:

    corpus <- Corpus(VectorSource(vec), readerControl=list(language="en"))
    corpus <- tm_map(corpus, tolower)
    corpus <- tm_map(corpus, removePunctuation)
    corpus <- tm_map(corpus, removeWords, stopwords("english"))
    corpus <- tm_map(corpus, stripWhitespace)
    corpus <- tm_map(corpus, removeNumbers)
    ...snip removing several custom lists of stopwords...
    corpus <- tm_map(corpus, stemDocument)
    dtm <- DocumentTermMatrix(corpus, control=list
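
A common approach to the problem named in this question's title is to keep only the rows whose term counts sum to more than zero. The example below is a minimal sketch on a plain base-R matrix standing in for the DocumentTermMatrix; the toy counts and variable names are hypothetical.

```r
# Toy document-term counts; document "2" lost all its terms in preprocessing.
dtm <- matrix(c(1, 0, 2,
                0, 0, 0,
                0, 3, 1),
              nrow = 3, byrow = TRUE,
              dimnames = list(Docs = c("1", "2", "3"),
                              Terms = c("a", "b", "c")))

# Keep only documents that still contain at least one term.
# For a sparse tm DocumentTermMatrix, slam::row_sums(dtm) would be the
# memory-friendly counterpart of rowSums() (assumes slam is installed).
keep <- rowSums(dtm) > 0
dtm_nonempty <- dtm[keep, , drop = FALSE]

rownames(dtm_nonempty)
```

Keeping the logical vector `keep` around is useful for dropping the same documents from the original corpus or metadata, so that rows stay aligned with the model output.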