mallet | 易学教程

How to create a table by restructuring a MALLET output file?

I'm using MALLET for topic analysis. It outputs results in text files ("topics.txt") of several thousand rows and a hundred or so columns, where each row consists of tab-separated variables like this:

Num1 text1 topic1 proportion1 topic2 proportion2 topic3 proportion3, etc.
Num2 text2 topic1 proportion1 topic2 proportion2 topic3 proportion3, etc.
Num3 text3 topic1 proportion1 topic2 proportion2 topic3 proportion3, etc.

Here's a snippet of the actual data:

> dat[1:5,1:10]
  V1       V2 V3        V4 V5        V6 V7        V8 V9        V10
1  0   10.txt 27 0.4560785 23 0.3040853 20 0.1315621 21 0.03632624
2  1 1001.txt 20 0.2660085
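One possible way to restructure rows like these into a table with one column per topic is sketched below in Python. It assumes each row is tab-separated as described in the question (number, file name, then alternating topic and proportion fields). The function names `restructure` and `to_wide` are hypothetical, and the second sample row is kept as short as it appears in the excerpt, so its missing topics are filled with 0.0.

```python
def restructure(lines):
    """Parse rows of 'num, name, topic, proportion, topic, proportion, ...'
    into {document name: {topic id: proportion}}."""
    table = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):  # skip blank lines and comment headers
            continue
        fields = line.split("\t")
        doc_name = fields[1]
        pairs = fields[2:]
        row = {}
        # Walk the remaining fields two at a time: (topic, proportion).
        for topic, proportion in zip(pairs[0::2], pairs[1::2]):
            row[int(topic)] = float(proportion)
        table[doc_name] = row
    return table


def to_wide(table, fill=0.0):
    """Build a header and rows with one column per topic, in topic-id order."""
    topics = sorted({t for row in table.values() for t in row})
    header = ["doc"] + [f"topic{t}" for t in topics]
    rows = [[doc] + [row.get(t, fill) for t in topics] for doc, row in table.items()]
    return header, rows


# Sample rows taken from the snippet in the question (second row is incomplete there).
sample = [
    "0\t10.txt\t27\t0.4560785\t23\t0.3040853\t20\t0.1315621\t21\t0.03632624",
    "1\t1001.txt\t20\t0.2660085",
]
header, rows = to_wide(restructure(sample))
print(header)
for r in rows:
    print(r)
```

Each document becomes one row and each topic one column, so the proportions line up regardless of the order in which MALLET listed the topics for a given document.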

how to get a probability distribution for a topic in mallet?

Using MALLET I can get a specific number of topics and their words. How can I make sure the topic words form a probability distribution (i.e. sum to one)? For example, if I run it as below, how can I use the outputs given by MALLET to make sure the probabilities of the topic words for topic 0 add up to 1?

mallet train-topics --input text.vectors --output-topic-keys topics.txt --output-doc-topics doc_comp.txt --topic-word-weights-file weights.txt --num-top-words 50 --word-topic-counts-file counts.txt --num-topics 3 --output-state topicstate.gz --alpha 1

Source: https://stackoverflow.com/questions/33251703/how
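A minimal sketch of one way to get such a distribution, in Python: read the file passed to --topic-word-weights-file and divide each weight by the total for its topic. This assumes each line of that file holds a topic number, a word, and an unnormalized weight separated by tabs, which is worth checking against your own weights.txt. The function name and the sample weights are made up for illustration.

```python
from collections import defaultdict


def normalize_topic_word_weights(lines):
    """Turn 'topic<TAB>word<TAB>weight' lines into per-topic probabilities."""
    weights = defaultdict(dict)  # topic -> word -> raw weight
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:  # ignore anything that is not a 3-field row
            continue
        topic, word, weight = int(parts[0]), parts[1], float(parts[2])
        weights[topic][word] = weight

    probs = {}
    for topic, word_weights in weights.items():
        total = sum(word_weights.values())
        # Dividing by the per-topic total makes the values sum to one.
        probs[topic] = {w: v / total for w, v in word_weights.items()}
    return probs


# Invented weights for a 3-word vocabulary and 2 topics.
sample = [
    "0\tcat\t3.01",
    "0\tdog\t1.01",
    "0\tfish\t0.01",
    "1\tcat\t0.01",
    "1\tdog\t2.01",
    "1\tfish\t2.01",
]
probs = normalize_topic_word_weights(sample)
print(sum(probs[0].values()))
```

The sum only comes out to one when the normalization runs over the whole vocabulary of a topic. Normalizing just the 50 top words printed by --num-top-words would give a distribution over those 50 words, not over the topic.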

How to get topic vector of new documents and compare with pre-defined topic model in Mallet?

I'm trying to compare a single document's topic distribution (using LDA) with the topic distributions of other files in a previously created topic model, using MALLET. I know that this can be done through MALLET commands in the terminal, but I'm having trouble finding a way to implement it in Java. To give a gist of what my program does: the existing topic model was created from a large corpus of texts. I want to use it to compare topic distributions with a tweet that contains a certain hashtag and to then pull out the file most similar to the tweet
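The comparison step on its own can be sketched as follows. This is Python rather than the Java the question asks about, and it covers only the last step: it assumes the topic vectors for the corpus files and for the tweet have already been obtained from the same model and have the same length. Cosine similarity is one common choice of measure, not something the question prescribes, and the file names and numbers below are invented.

```python
import math


def cosine(p, q):
    """Cosine similarity between two equal-length topic vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def most_similar(query, corpus):
    """Return the name of the corpus document closest to the query vector."""
    return max(corpus.items(), key=lambda item: cosine(query, item[1]))[0]


# Hypothetical topic distributions over 3 topics.
corpus = {
    "a.txt": [0.7, 0.2, 0.1],
    "b.txt": [0.1, 0.1, 0.8],
    "c.txt": [0.3, 0.4, 0.3],
}
tweet = [0.6, 0.3, 0.1]

best = most_similar(tweet, corpus)
print(best)
```

Because topic vectors are probability distributions, a divergence such as Jensen-Shannon could be swapped in for `cosine` without changing the rest of the sketch, as long as `most_similar` then picks the minimum instead of the maximum.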

how to get word-topic probability using mallet

Question: I've built a parallel topic model using MALLET, and I want to get the top words for each document. To do that, I'm trying to get a word-topic probability matrix. How would I achieve this?

Answer 1: When you build topics using MALLET, there is an option called --word-topic-counts-file. When you give this option and specify a file, MALLET writes one line per word to it, holding the word followed by its counts in each topic as topic:count pairs. These are raw counts, so they need to be normalized to obtain probabilities. You can later read this file in C, Java, R, or any other language to create the
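As a rough illustration of reading such a file, the Python sketch below builds a word-given-topic probability table from the counts. It assumes each line consists of a word index, the word, and then space-separated topic:count pairs, so check this against your own counts file first. The function name and sample lines are hypothetical, and no smoothing is applied, so a word that never occurs in a topic simply has no entry for it.

```python
from collections import defaultdict


def word_given_topic(lines):
    """Convert 'index word topic:count ...' lines into P(word | topic)."""
    counts = defaultdict(dict)  # topic -> word -> count
    for line in lines:
        fields = line.split()
        if len(fields) < 3:  # need an index, a word, and at least one pair
            continue
        word = fields[1]
        for pair in fields[2:]:
            topic, count = pair.split(":")
            counts[int(topic)][word] = int(count)

    matrix = {}
    for topic, word_counts in counts.items():
        total = sum(word_counts.values())
        # Normalize the counts within each topic so they sum to one.
        matrix[topic] = {w: c / total for w, c in word_counts.items()}
    return matrix


# Invented sample in the assumed format.
sample = [
    "0 cat 0:3 1:1",
    "1 dog 0:1",
    "2 fish 1:4",
]
matrix = word_given_topic(sample)
print(matrix[0])
print(matrix[1])
```

Normalizing down the other axis (per word instead of per topic) would give P(topic | word) from the same counts.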