mallet

Why Mallet text classification output the same value 1.0 for all test files?

旧巷老猫 posted on 2019-12-13 03:38:23
Question: I am learning the Mallet text classification command line. The output values for the different classes are all the same, 1.0, and I do not know what I am doing wrong. Can you help?
mallet version: E:\Mallet\mallet-2.0.8RC3
//there is a txt file about cat breeds (catmaterial.txt) in the cat dir.
//command 1
C:\Users\toshiba>mallet import-dir --input E:\Mallet\testmaterial\cat --output E :\Mallet\testmaterial\cat.mallet --remove-stopwords
//command 1 output
Labels = E:\Mallet\testmaterial\cat /

Mallet SimpleTagger Classpath

那年仲夏 posted on 2019-12-12 01:57:10
Question: I am going to use Mallet SimpleTagger for sequence tagging, but I have a problem setting the classpath. As I have seen here: classpath, I should be able to use java -cp to set the classpath. I followed the instructions here (I am sure that I have installed Ant and Mallet correctly). However, I receive this message:
Error: could not find or load main class cc.mallet.fst.SimpleTagger
Here is the actual command that I use:
C:\mallet> java -cp "C:\mallet\class:C:\mallet\lib\mallet-deps.jar" cc
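One thing worth checking in a command like the one quoted above: the JVM separates classpath entries with `;` on Windows and with `:` on Linux and macOS, so a `:`-joined classpath typed at a `C:\` prompt is likely to be read as a single entry. The question is truncated, so this may not be the only cause. The sketch below is not from the original post; `ClasspathJoin` is a hypothetical helper that shows how `File.pathSeparator` yields the separator for the current platform, using the two entries from the question.

```java
import java.io.File;

// Hypothetical helper, not part of Mallet: joins classpath entries with the
// separator the current JVM expects (";" on Windows, ":" on Linux/macOS).
final class ClasspathJoin {

    static String join(String... entries) {
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) {
        // Entries taken from the question; adjust them to the local install.
        String cp = join("C:\\mallet\\class", "C:\\mallet\\lib\\mallet-deps.jar");
        System.out.println("java -cp \"" + cp + "\" cc.mallet.fst.SimpleTagger");
    }
}
```

Running this on the machine in question prints a command line whose classpath uses that machine's separator, which can then be compared with the command that failed.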

Trick to use file paths with spaces in Mallet (Terminal, OSx)?

人走茶凉 posted on 2019-12-11 05:47:24
Question: Is there a trick for using file paths that contain spaces in Mallet through the terminal on a Mac? For example, all of the following give me errors:
Escaping the space:
./bin/mallet import-dir --input /Volumes/Macintosh\ HD/Users/MY_NAME/Desktop/en --output /Users/MY_NAME/Desktop/en.mallet --remove-stopwords TRUE --keep-sequence TRUE
Double quotes, no escapes:
./bin/mallet import-dir --input "/Volumes/Macintosh HD/Users/MY_NAME/Desktop/en" --output /Users/MY_NAME/Desktop/en.mallet --remove
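One possible workaround, offered as a sketch rather than a confirmed fix, is to bypass the wrapper script and pass each argument to `java` as its own list element, so that no shell word-splitting is involved. Several things here are assumptions and not from the question: that the wrapper script is what splits the path, that `import-dir` maps to the class `cc.mallet.classify.tui.Text2Vectors`, and that the install has the `class` and `lib/mallet-deps.jar` layout. `ImportDirCommand` and the `/opt/mallet` path are hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds an argument list for a direct "java" call so
// that a path containing spaces stays a single argument.
final class ImportDirCommand {

    static List<String> build(String malletHome, String inputDir, String outputFile) {
        // Assumed install layout: <malletHome>/class and <malletHome>/lib/mallet-deps.jar
        String cp = String.join(File.pathSeparator,
                malletHome + File.separator + "class",
                malletHome + File.separator + "lib" + File.separator + "mallet-deps.jar");
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(cp);
        cmd.add("cc.mallet.classify.tui.Text2Vectors"); // assumed class behind import-dir
        cmd.add("--input");
        cmd.add(inputDir);   // one list element, spaces and all
        cmd.add("--output");
        cmd.add(outputFile);
        cmd.add("--remove-stopwords");
        cmd.add("TRUE");
        cmd.add("--keep-sequence");
        cmd.add("TRUE");
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build("/opt/mallet",
                "/Volumes/Macintosh HD/Users/MY_NAME/Desktop/en",
                "/Users/MY_NAME/Desktop/en.mallet");
        // Print one argument per line; nothing is executed here.
        cmd.forEach(System.out::println);
        // To run it: new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

Because `ProcessBuilder` takes the arguments as a list, neither backslash escapes nor quotes are needed around the input path.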

What are the memory limitation for Mallet training files?

徘徊边缘 posted on 2019-12-08 03:33:14
Question: Mallet converts training cases to a binary format using the import-file command, e.g.
bin/mallet import-file --input cases.txt --output cases.mallet
How is this binary ".mallet" file then used? Is it streamed, or is the whole file loaded into memory? If it is all loaded, then available memory limits the number of training cases. Is it possible to characterize the size of the .mallet file based on the size of the input cases file or the number of input cases?
Source: https:/

How do I load and use a CRF trained with Mallet?

时光总嘲笑我的痴心妄想 posted on 2019-12-07 03:29:01
Question: I've trained a CRF using GenericAcrfTui, which writes an ACRF to a file. I'm not quite sure how to load and use the trained CRF, but
import java.nio.file.Paths;
import cc.mallet.grmm.learning.ACRF;
import cc.mallet.util.FileUtils;
ACRF c = (ACRF) FileUtils.readObject(Paths.get("acrf.ser.gz").toFile());
seems to work. However, the labeling seems incorrect and seems to rely on the labels that I pass as input. How do I label using a loaded ACRF? Here's how I do my labeling:
GenericAcrfData2TokenSequence instanceMaker = new

How to evaluate the best K for LDA using Mallet?

给你一囗甜甜゛ posted on 2019-12-06 09:55:04
Question: I am using the Mallet API to extract topics from Twitter data, and the topics I have extracted so far look good. But I am having trouble estimating K. For example, I varied K from 10 to 100, so I have extracted different numbers of topics from the data. Now I would like to estimate which K is best. Some algorithms I know of are:
Perplexity
Empirical likelihood
Marginal likelihood (harmonic mean method)
Silhouette
I found a method, model.estimate(), which may be used to
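Of the criteria listed above, perplexity is the simplest to compute once a held-out log-likelihood is available for each K: perplexity = exp(-logLikelihood / numTokens), and lower is better. The sketch below is a minimal illustration, not a Mallet API. It assumes the log-likelihoods are natural logs measured on the same held-out documents for every K, and the numbers in `main` are placeholders, not real results. `PerplexityByK` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: compares candidate topic counts by held-out perplexity.
final class PerplexityByK {

    // perplexity = exp(-logLikelihood / numTokens), natural log assumed.
    static double perplexity(double logLikelihood, long numTokens) {
        if (numTokens <= 0) {
            throw new IllegalArgumentException("numTokens must be positive");
        }
        return Math.exp(-logLikelihood / numTokens);
    }

    // Returns the K with the lowest perplexity, or -1 for an empty map.
    static int bestK(Map<Integer, Double> heldOutLogLikelihoodByK, long numTokens) {
        int best = -1;
        double bestPerplexity = Double.POSITIVE_INFINITY;
        for (Map.Entry<Integer, Double> e : heldOutLogLikelihoodByK.entrySet()) {
            double p = perplexity(e.getValue(), numTokens);
            if (p < bestPerplexity) {
                bestPerplexity = p;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Placeholder values for illustration only.
        Map<Integer, Double> ll = new LinkedHashMap<>();
        ll.put(10, -52000.0);
        ll.put(50, -48000.0);
        ll.put(100, -49500.0);
        long heldOutTokens = 8000;
        for (Map.Entry<Integer, Double> e : ll.entrySet()) {
            System.out.println("K=" + e.getKey() + " perplexity="
                    + perplexity(e.getValue(), heldOutTokens));
        }
        System.out.println("lowest perplexity at K=" + bestK(ll, heldOutTokens));
    }
}
```

Held-out perplexity does not always agree with how interpretable the topics look, so it is usually treated as one signal among several and not as a final answer.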

Mallet topic modelling

旧城冷巷雨未停 posted on 2019-12-05 08:17:50
I have been using Mallet to infer topics for a text file containing 100,000 lines (around 34 MB in Mallet format). But now I need to run it on a file containing a million lines (around 180 MB), and I am getting a java.lang.OutOfMemoryError. Is there a way of splitting the file into smaller ones and building a model for the data present in all the files combined? Thanks in advance.
In bin/mallet.bat, increase the value on this line: set MALLET_MEMORY=1G
I'm not sure about the scalability of Mallet to big data, but the project http://dragon.ischool.drexel.edu/ can store its data in disk backed
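After changing MALLET_MEMORY, it can help to confirm how much heap a JVM actually receives. The sketch below is not Mallet code; it assumes the script passes MALLET_MEMORY to Java as an `-Xmx` value, so it should be launched with the same setting (for example `java -Xmx4g HeapCheck.java`, where `4g` is only an example). `HeapCheck` is a hypothetical name.

```java
// Hypothetical check: reports the maximum heap this JVM may use.
final class HeapCheck {

    static long maxHeapMiB() {
        // maxMemory() reports bytes; convert to MiB for readability.
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap available to this JVM: " + maxHeapMiB() + " MiB");
    }
}
```

If the reported value does not change after editing the script, the setting is probably not reaching the JVM, for instance because a different copy of the script is on the PATH.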

How to use Mallet for NER [closed]

强颜欢笑 posted on 2019-12-04 17:12:14
I'm new to NLP and have been asked to perform named entity recognition (NER) using Mallet. I have a text, and I provide a feature vector for each word in it. I would like to train a model that I can later test on a fresh text file. My question is: how do I create such a model, and what is the input for the model? I could use some code examples :) Thanks!
nflacco
Fei Xia at UW wrote a pretty good MALLET guide. You can find an example of programmatic (Java) interaction with MALLET at the bottom of this page. The MALLET quick start on sequence tagging, right on the MALLET home page,
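On the question of what the input looks like: the SimpleTagger training format described in the MALLET sequence tagging quick start puts one token per line, with its features separated by spaces and the label last, and separates sequences with a blank line. The sketch below only writes text in that layout; it does not call Mallet. The `SimpleTaggerFormat` class, the feature names, and the `PER`/`O` labels are hypothetical choices for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: renders labeled tokens as
// "feature1 feature2 ... label" lines, one token per line,
// with a blank line after each sentence.
final class SimpleTaggerFormat {

    static final class Token {
        final List<String> features;
        final String label;

        Token(String label, String... features) {
            this.label = label;
            this.features = Arrays.asList(features);
        }
    }

    static String format(List<List<Token>> sentences) {
        StringBuilder sb = new StringBuilder();
        for (List<Token> sentence : sentences) {
            for (Token t : sentence) {
                for (String f : t.features) {
                    sb.append(f).append(' ');
                }
                sb.append(t.label).append('\n');
            }
            sb.append('\n'); // blank line ends the sequence
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Token> sentence = Arrays.asList(
                new Token("PER", "Fei", "CAPITALIZED"),
                new Token("PER", "Xia", "CAPITALIZED"),
                new Token("O", "wrote", "LOWERCASE"));
        System.out.print(format(Arrays.asList(sentence)));
    }
}
```

A file written this way could then be given to SimpleTagger for training, and a test file in the same layout for evaluation; the exact command-line options are covered in the quick start.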

Folding in (estimating topics for new documents) in LDA using Mallet in Java

僤鯓⒐⒋嵵緔 posted on 2019-12-04 16:20:21
I'm using Mallet through Java, and I can't work out how to evaluate new documents against an existing topic model that I have trained. My initial code to generate my model is very similar to that in the Mallet Developer's Guide for Topic Modeling, after which I simply save the model as a Java object. In a later process, I reload that Java object from file, add new instances via .addInstances(), and would then like to evaluate only these new instances against the topics found in the original training set. This stats.SE thread provides some high-level suggestions, but I can't see how to work

How to evaluate the best K for LDA using Mallet?

烈酒焚心 posted on 2019-12-04 16:13:50
I am using the Mallet API to extract topics from Twitter data, and the topics I have extracted so far look good. But I am having trouble estimating K. For example, I varied K from 10 to 100, so I have extracted different numbers of topics from the data. Now I would like to estimate which K is best. Some algorithms I know of are:
Perplexity
Empirical likelihood
Marginal likelihood (harmonic mean method)
Silhouette
I found a method, model.estimate(), which may be used to estimate with different values of K. But I have no idea how to show which value of K is best for the