mallet

How to create a table by restructuring a MALLET output file?

谁说我不能喝 posted on 2019-12-04 09:47:35
I'm using MALLET for topic analysis. It writes its results to a text file ("topics.txt") with several thousand rows and a hundred or so columns, where each row consists of tab-separated values like this:

Num1 text1 topic1 proportion1 topic2 proportion2 topic3 proportion3, etc.
Num2 text2 topic1 proportion1 topic2 proportion2 topic3 proportion3, etc.
Num3 text3 topic1 proportion1 topic2 proportion2 topic3 proportion3, etc.

Here's a snippet of the actual data:

> dat[1:5,1:10]
  V1       V2 V3        V4 V5        V6 V7        V8 V9        V10
1  0   10.txt 27 0.4560785 23 0.3040853 20 0.1315621 21 0.03632624
2  1 1001.txt 20 0.2660085
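The restructuring described above can be sketched as follows. This is a minimal sketch, not the original poster's solution: it assumes each row is tab-separated in the form number, file name, then alternating topic id and proportion, and it builds one row per document with one column per topic. The function name `doc_topic_table`, the file names, and the sample values are made up for illustration.

```python
import csv
import io

def doc_topic_table(lines, num_topics):
    """Map each document name to a list of proportions indexed by topic id.

    Assumed row layout (tab-separated):
    number, name, topic, proportion, topic, proportion, ...
    """
    table = {}
    for fields in csv.reader(lines, delimiter="\t"):
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and a header comment, if present
        name, pairs = fields[1], fields[2:]
        row = [0.0] * num_topics
        # pairs alternate: topic id, proportion, topic id, proportion, ...
        for topic, proportion in zip(pairs[0::2], pairs[1::2]):
            if topic.strip() and proportion.strip():
                row[int(topic)] = float(proportion)
        table[name] = row
    return table

# Made-up sample standing in for the real "topics.txt"
sample = io.StringIO(
    "0\tdocA.txt\t2\t0.5\t0\t0.3\t1\t0.2\n"
    "1\tdocB.txt\t1\t0.7\t2\t0.2\t0\t0.1\n"
)
table = doc_topic_table(sample, num_topics=3)
for name, row in table.items():
    print(name, row)
```

To run it on a real file, pass an open file handle instead of the `io.StringIO` sample and set `num_topics` to the number of topics used in training.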

how to get a probability distribution for a topic in mallet?

自古美人都是妖i posted on 2019-12-02 19:31:16
Question: Using MALLET, I can get a specific number of topics and their words. How can I make sure the topic words form a probability distribution (i.e. sum to one)? For example, if I run it as below, how can I use the outputs given by MALLET to make sure the probabilities of the topic words for topic 0 add up to 1? mallet train-topics --input text.vectors --output-topic-keys topics.txt --output-doc-topics doc_comp.txt --topic-word-weights-file weights.txt --num-top-words 50 --word-topic-counts-file counts.txt --num

how to get a probability distribution for a topic in mallet?

徘徊边缘 posted on 2019-12-02 10:26:24
Using MALLET, I can get a specific number of topics and their words. How can I make sure the topic words form a probability distribution (i.e. sum to one)? For example, if I run it as below, how can I use the outputs given by MALLET to make sure the probabilities of the topic words for topic 0 add up to 1? mallet train-topics --input text.vectors --output-topic-keys topics.txt --output-doc-topics doc_comp.txt --topic-word-weights-file weights.txt --num-top-words 50 --word-topic-counts-file counts.txt --num-topics 3 --output-state topicstate.gz --alpha 1 Source: https://stackoverflow.com/questions/33251703/how
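One way to approach this is to normalize the weights per topic. The sketch below assumes that the file given to --topic-word-weights-file (weights.txt in the command above) holds one tab-separated topic, word, weight triple per line; that layout is an assumption to check against your own file. The function name and the sample lines are made up. Note that the top 50 words per topic will not sum to one on their own, because the distribution is defined over the whole vocabulary.

```python
from collections import defaultdict

def topic_word_distributions(lines):
    """Normalize per-topic word weights so each topic sums to 1.

    Assumed line layout: topic <TAB> word <TAB> weight
    """
    weights = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        topic, word, weight = line.split("\t")
        weights[int(topic)][word] = float(weight)

    dists = {}
    for topic, word_weights in weights.items():
        total = sum(word_weights.values())
        # Divide every weight by the topic total to get P(word | topic)
        dists[topic] = {w: v / total for w, v in word_weights.items()}
    return dists

# Made-up sample standing in for weights.txt
sample = ["0\tapple\t3.0", "0\tpear\t1.0", "1\tapple\t1.0", "1\tpear\t1.0"]
dists = topic_word_distributions(sample)
print(dists[0])
print(sum(dists[0].values()))
```

For a real run, replace `sample` with `open("weights.txt", encoding="utf-8")`.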

How to get topic vector of new documents and compare with pre-defined topic model in Mallet?

亡梦爱人 posted on 2019-12-02 00:31:14
Question: I'm trying to compare a single document's topic distribution (using LDA) with other files and their topic distributions within a previously created topic model, using MALLET. I know that this can be done through MALLET commands in the terminal, but I'm having trouble finding a way to implement it in Java. To give a gist of what my program does: the existing topic model was built from a large corpus of texts. I want to use this to compare topic distributions

How to get topic vector of new documents and compare with pre-defined topic model in Mallet?

泄露秘密 posted on 2019-12-01 21:48:20
I'm trying to compare a single document's topic distribution (using LDA) with other files and their topic distributions within a previously created topic model, using MALLET. I know that this can be done through MALLET commands in the terminal, but I'm having trouble finding a way to implement it in Java. To give a gist of what my program does: the existing topic model was built from a large corpus of texts. I want to use this to compare topic distributions with a tweet that contains a certain hashtag and to then pull out the file most similar to the tweet
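The comparison step itself can be sketched independently of MALLET. The example below assumes the topic distributions have already been obtained (for the corpus files from the trained model, for the tweet from inference on the new text) and are available as plain lists of proportions. It uses cosine similarity, which is one reasonable choice among several and not necessarily what the poster ended up using. It is written in Python for brevity; the arithmetic ports directly to Java. All names and numbers are made up.

```python
import math

def cosine_similarity(p, q):
    """Cosine of the angle between two topic vectors of equal length."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0

def most_similar(query, corpus):
    """Return the name of the corpus document closest to the query vector."""
    return max(corpus, key=lambda name: cosine_similarity(query, corpus[name]))

# Hypothetical topic distributions for three corpus files (3 topics)
corpus = {
    "file1.txt": [0.7, 0.2, 0.1],
    "file2.txt": [0.1, 0.1, 0.8],
    "file3.txt": [0.3, 0.4, 0.3],
}
# Hypothetical inferred distribution for the tweet
tweet = [0.05, 0.15, 0.8]

best = most_similar(tweet, corpus)
print(best)
```

Because topic vectors are probability distributions, a divergence such as Jensen-Shannon is a common alternative to cosine similarity; only `cosine_similarity` would need to change.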

how to get word-topic probability using mallet

房东的猫 posted on 2019-12-01 05:15:06
Question: I've built a parallel topic model using MALLET, and I want to get the top words for each document. To do that, I'm trying to get a word-topic probability matrix. How would I achieve this? Answer 1: When you build topics with MALLET, you can pass the option --word-topic-counts-file. When you give this option and specify a file, MALLET writes one line per word: the word's index, the word itself, and a topic:count pair for each topic in which the word occurs. These are counts rather than probabilities, so they need to be normalized. You can later read this file in C, Java, or R (or any other language) to create the
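Reading such a file into a word-topic matrix might look like the sketch below. As far as I can tell, the file written by --word-topic-counts-file holds raw counts, with space-separated lines of the form `index word topic:count topic:count ...`; treat that layout as an assumption and check it against your own output. The function name and sample lines are made up. The probabilities here are plain count ratios, P(word | topic), with no smoothing applied.

```python
def word_topic_probabilities(lines, num_topics):
    """Build {word: [P(word | topic) for each topic]} from count lines.

    Assumed line layout: index word topic:count topic:count ...
    """
    counts = {}
    totals = [0] * num_topics
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        word = fields[1]
        row = [0] * num_topics
        for pair in fields[2:]:
            topic, count = pair.split(":")
            row[int(topic)] = int(count)
            totals[int(topic)] += int(count)
        counts[word] = row

    # Divide each count by its topic's total count
    return {
        word: [c / t if t else 0.0 for c, t in zip(row, totals)]
        for word, row in counts.items()
    }

# Made-up sample standing in for the counts file
sample = ["0 apple 0:3 1:1", "1 pear 0:1 1:3"]
probs = word_topic_probabilities(sample, num_topics=2)
for word, row in probs.items():
    print(word, row)
```

Each column of the resulting matrix sums to one over the vocabulary, so sorting a column in descending order gives the top words for that topic.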