weka

How to read a text file with mixed encodings in Scala or Java?

故事扮演 posted on 2019-11-27 05:10:12
Question: I am trying to parse a CSV file, ideally using weka.core.converters.CSVLoader. However, the file I have is not a valid UTF-8 file. It is mostly UTF-8, but some of the field values are in different encodings, so there is no single encoding in which the whole file is valid, and I need to parse it anyway. Apart from using Java libraries like Weka, I am mainly working in Scala. I am not even able to read the file using scala.io.Source. For example: Source.fromFile(filename)("UTF-8").foreach(print);
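One common way to get past this kind of failure is to decode leniently, substituting a placeholder for every byte sequence that is not valid UTF-8 instead of throwing. The following is only a minimal sketch using the JDK's `CharsetDecoder`; the class name `LenientUtf8` and the method `decode` are made up for illustration, and the sketch assumes that losing the non-UTF-8 bytes (they become U+FFFD) is acceptable.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of Weka or Scala.
public class LenientUtf8 {

    // Decode bytes as UTF-8; any invalid byte sequence is replaced by U+FFFD
    // instead of raising an exception.
    public static String decode(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            // Not expected with REPLACE, but the signature declares it.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path of the mixed-encoding file.
        byte[] raw = Files.readAllBytes(Paths.get(args[0]));
        System.out.print(decode(raw));
    }
}
```

Switching `CodingErrorAction.REPLACE` to `CodingErrorAction.IGNORE` drops the offending bytes instead of marking them. If the original bytes must be preserved, reading the file as ISO-8859-1 (which accepts every byte value) and re-decoding individual fields afterwards is another option.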

Trying to add database driver (JDBC): RmiJdbc.RJDriver - Error, not in CLASSPATH?

情到浓时终转凉″ posted on 2019-11-27 03:31:10
Question: I am using Weka:
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
. .
DataSource source;
source = new DataSource("somecsvfile.csv");
I get the following printed on the console in red in Eclipse:
---Registering Weka Editors---
Trying to add database driver (JDBC): RmiJdbc.RJDriver - Error, not in CLASSPATH?
Trying to add database driver (JDBC): jdbc.idbDriver - Error, not in CLASSPATH?
Trying to add database driver (JDBC): org.gjt.mm.mysql.Driver - Error, not in

Weka: Results of each fold in 10-fold CV

此生再无相见时 posted on 2019-11-27 03:25:56
Question: In Weka Explorer (GUI), when we run a 10-fold CV on a given ARFF file, what the Explorer provides (as far as I can see) is the average result over all 10 folds. Q. Is there any way to get the results of each fold? For instance, I need the error rates (incorrectly identified instances) for each fold. Help appreciated. Answer 1: I think this is possible using Weka's GUI. You need to use the Experimenter, though, instead of the Explorer. Here are the steps: Open the Experimenter from the

Adding a new Instance in weka

做~自己de王妃 posted on 2019-11-27 00:49:00
Question: How can I add a new Instance to an existing Instances object that I created? Here is an example:
ArrayList<Attribute> atts = new ArrayList<Attribute>(2);
ArrayList<String> classVal = new ArrayList<String>();
classVal.add("A");
classVal.add("B");
atts.add(new Attribute("content",(ArrayList<String>)null));
atts.add(new Attribute("@@class@@",classVal));
Instances dataRaw = new Instances("TestInstances",atts,0);
I want to add a new instance to dataRaw. As far as I know I have to use dataRaw.add

How to get prediction value for an instance in weka?

寵の児 posted on 2019-11-26 20:32:53
Question: I am working with Weka and need to output the prediction values (probabilities) of each label for each test instance. In the GUI there is an option in the Classify tab (Classify -> Options -> Output predicted value) which does this by outputting the prediction probabilities for each label, but how can I do this in Java code? I want to receive probability scores for each label after classifying an instance. Answer 1: The following code takes in a set of training instances, and outputs the predicted probability

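Getting Started with Weka 1

三世轮回 posted on 2019-11-26 11:37:54
This post also draws on material that others have published online. It mainly covers calling the Weka library from Java. First, about Weka itself: its full name is the Waikato Environment for Knowledge Analysis, and it is open-source machine learning and data mining software. The latest Weka release can be downloaded from the Weka website; the latest version at the time of writing is 3.7.9. A default installation goes into C:\Program Files\Weka-3-7, which contains a data folder holding some datasets. The data folder can also be copied elsewhere for easier access, and its datasets are useful for learning Weka. We take one of the .arff files in the data folder as an example to explain the file format, as in the figure below:
(1) Relation declaration. The format is @relation <relation-name>, on the first line of the file. The relation name must not contain spaces; if it does, it must be enclosed in quotes.
(2) Attribute declaration. The format is @attribute <attribute-name> <data-type>. attribute-name is the name of the attribute and is case-sensitive. data-type is the data type; the common types are numeric (integers, decimals, and so on) and nominal (categorical: for example, in @attribute outlook {sunny,overcast,rainy} the set of allowed values is sunny, overcast, rainy).
(3) Data section. The data is marked by @data. In the lines below @data, each line is one example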
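To make the three parts of the format concrete, here is a small sketch that embeds a tiny ARFF document as a string and picks out the header and data lines by hand. The rows are made up for illustration, the class `ArffSketch` and its methods are hypothetical, and this is not how Weka parses ARFF (for instance, quoted attribute names containing spaces are not handled); it only mirrors the `@relation` / `@attribute` / `@data` layout described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the ARFF layout; not Weka's parser.
public class ArffSketch {

    // A made-up, minimal ARFF document: relation, three attributes, three rows.
    public static final String SAMPLE = String.join("\n",
            "@relation weather",
            "@attribute outlook {sunny,overcast,rainy}",
            "@attribute temperature numeric",
            "@attribute play {yes,no}",
            "@data",
            "sunny,85,no",
            "overcast,83,yes",
            "rainy,70,yes");

    // Returns the attribute names declared in the header, in order.
    // Assumes unquoted names without spaces.
    public static List<String> attributeNames(String arff) {
        List<String> names = new ArrayList<>();
        for (String line : arff.split("\n")) {
            String t = line.trim();
            if (t.toLowerCase().startsWith("@attribute")) {
                names.add(t.split("\\s+")[1]);
            }
        }
        return names;
    }

    // Counts the non-empty, non-comment lines that follow the @data marker.
    public static int dataRows(String arff) {
        boolean inData = false;
        int rows = 0;
        for (String line : arff.split("\n")) {
            String t = line.trim();
            if (t.isEmpty() || t.startsWith("%")) {
                continue; // '%' starts a comment line in ARFF
            }
            if (inData) {
                rows++;
            } else if (t.equalsIgnoreCase("@data")) {
                inData = true;
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(attributeNames(SAMPLE));
        System.out.println(dataRows(SAMPLE));
    }
}
```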

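Getting Started with Weka 2

醉酒当歌 posted on 2019-11-26 11:37:38
Now we look at using Weka to classify data. To classify, we must first specify which column is the class to predict. By convention of the data file format, the class is usually the last attribute, and it can be set with setClassIndex. Next we choose a classifier; there are many, and for now we use J48. The model is trained with buildClassifier, after which classifyInstance shows the class predicted for the training data. The predicted class is returned as a number such as 0, 1, 2, ..., which is the index of the predicted class among the class values. For example, if the class values are {sunny,rainy}, then 0 stands for sunny and 1 stands for rainy.
package InstanceTest;
import weka.core.Instances;
import weka.classifiers.trees.J48;
import weka.classifiers.trees.j48.*;
import java.io.*;
public class InstanceTest {
  /**
   * @param args
   */
  public Instances data;
  // Set the class to predict; defaults to the last attribute
  public void SetClassIndex(Instances ins) {
    ins.setClassIndex(ins.numAttributes()-1);
  }
  public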
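The index-to-label mapping described above can be sketched without Weka at all. The class `ClassLabelLookup` and the `LABELS` array below are hypothetical stand-ins for the nominal class values declared in the ARFF header; the only point is that the returned double is an index that needs a cast before lookup.

```java
// Hypothetical illustration; the labels stand in for the nominal class values.
public class ClassLabelLookup {

    // Class values in the order they are declared, e.g. {sunny,rainy}.
    public static final String[] LABELS = {"sunny", "rainy"};

    // The prediction arrives as a double holding the class index,
    // so cast it to int before indexing into the label list.
    public static String labelFor(double predictedIndex) {
        return LABELS[(int) predictedIndex];
    }

    public static void main(String[] args) {
        System.out.println(labelFor(0.0));
        System.out.println(labelFor(1.0));
    }
}
```

With a real dataset the labels would normally come from the class attribute itself, for example via `data.classAttribute().value((int) predicted)`, rather than from a hard-coded array.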

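Getting Started with Weka 3

穿精又带淫゛_ posted on 2019-11-26 11:37:36
This time we introduce the Evaluation class. Last time we only predicted class values and produced no other evaluation figures, so here we use Evaluation. First create an Evaluation object; the class has no no-argument constructor, and an Instances object is normally passed to the constructor. If we do not have separate training and test sets, we can use cross-validation. The crossValidateModel method takes four arguments: the first is the classifier, the second is the dataset to evaluate on, the third is the number of folds (10 is common), and the fourth is a random number object. If we do have a training set and a test set, we can use the evaluateModel method of the Evaluation class instead; its arguments are a trained classifier, followed by the dataset to evaluate on.
package InstanceTest;
import weka.core.Instances;
import weka.classifiers.trees.J48;
import weka.classifiers.Evaluation;
import java.io.*;
import java.util.Random;
public class InstanceTest {
  /**
   * @param args
   */
  public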
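To show what the fold count and the random number object control in cross-validation, here is a plain-Java sketch of a k-fold partition. It is not Weka's implementation, which differs in detail; the class `KFoldSketch` and the method `folds` are invented for illustration. Each fold serves once as the test set while the remaining folds form the training set.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of k-fold partitioning; not Weka's algorithm.
public class KFoldSketch {

    // Shuffle the indices 0..n-1 with the given Random, then deal them
    // round-robin into k folds so fold sizes differ by at most one.
    public static List<List<Integer>> folds(int n, int k, Random rnd) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, rnd);

        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            folds.add(new ArrayList<>());
        }
        for (int i = 0; i < n; i++) {
            folds.get(i % k).add(indices.get(i));
        }
        return folds;
    }

    public static void main(String[] args) {
        // 25 instances, 10 folds, fixed seed so the split is repeatable.
        List<List<Integer>> split = folds(25, 10, new Random(1));
        for (int f = 0; f < split.size(); f++) {
            System.out.println("fold " + f + " test indices: " + split.get(f));
        }
    }
}
```

Passing a `Random` with a fixed seed makes the partition repeatable, which is why a random number object is one of the arguments.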

Weka: Automatic Parameter Optimization

我的未来我决定 posted on 2019-11-26 11:37:32
import weka.core.*;
import weka.classifiers.*;
import weka.classifiers.meta.*;
import weka.classifiers.trees.*;
import java.io.*;
/**
 * A little example for optimizing J48's confidence parameter with
 * CVParameterSelection meta-classifier.
 * The class expects a dataset as first parameter, class attribute is
 * assumed to be the last attribute.
 *
 * @author FracPete (fracpete at waikato dot ac dot nz)
 */
public class CVParam {
  public static void main(String[] args) throws Exception {
    // load data
    BufferedReader reader = new BufferedReader(new FileReader(args[0]));
    Instances data = new Instances

weka

旧城冷巷雨未停 posted on 2019-11-26 11:37:31
// remove instances with missing class
Instances newData = new Instances(data);
newData.deleteWithMissingClass();
m_structure = new Instances(newData, 0);
m_Random = new Random(getSeed());
if (m_classifiersToLoad.size() > 0) {
  m_preBuiltClassifiers.clear();
  loadClassifiers(data);
  if (m_Classifiers.length == 1 && m_Classifiers[0] instanceof weka.classifiers.rules.ZeroR) {
    // remove the single ZeroR
    m_Classifiers = new Classifier[0];
  }
}
Classifier[] Classifiers = new Classifier[3];
//m_Classifiers[0]=new LibSVM();
Classifiers[0]=new J48();
Classifiers[1]=new Logistic();
Classifiers[2]=new SMO(