weka

How to read a text file with mixed encodings in Scala or Java?

Question: I am trying to parse a CSV file, ideally using weka.core.converters.CSVLoader. However, the file I have is not a valid UTF-8 file. It is mostly UTF-8, but some of the field values are in different encodings, so there is no single encoding in which the whole file is valid, and I need to parse it anyway. Apart from using Java libraries like Weka, I am mainly working in Scala. I am not even able to read the file using scala.io.Source. For example: Source.fromFile(filename)("UTF-8").foreach(print);
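One way to get past the decoding failure described in the question is to decode leniently, substituting a replacement character for any byte sequence that is not valid UTF-8. The following is a minimal sketch using only the standard java.nio.charset API; the class and method names (LenientDecode, decodeLenient) are hypothetical, and it assumes that losing the undecodable bytes is acceptable. It is not taken from the original answers.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class LenientDecode {
    // Decode bytes as UTF-8; malformed sequences become U+FFFD instead of
    // raising an exception. (Hypothetical helper, for illustration only.)
    static String decodeLenient(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path of the mixed-encoding file.
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.print(decodeLenient(bytes));
    }
}
```

The decoded text could then be written back out as clean UTF-8 before being handed to a CSV parser. Since the same java.nio classes are available from Scala, the approach should carry over there as well.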

Trying to add database driver (JDBC): RmiJdbc.RJDriver - Error, not in CLASSPATH?

Question: I am using Weka: import weka.core.Instances; import weka.core.converters.ConverterUtils.DataSource; . . DataSource source; source = new DataSource("somecsvfile.csv"); I get the following printed on the console in red in Eclipse: ---Registering Weka Editors--- Trying to add database driver (JDBC): RmiJdbc.RJDriver - Error, not in CLASSPATH? Trying to add database driver (JDBC): jdbc.idbDriver - Error, not in CLASSPATH? Trying to add database driver (JDBC): org.gjt.mm.mysql.Driver - Error, not in

Weka: Results of each fold in 10-fold CV

Question: In the Weka Explorer (GUI), when we run 10-fold CV on a given ARFF file, the Explorer reports (as far as I can see) only the average result over all 10 folds. Is there any way to get the results of each fold? For instance, I need the error rate (incorrectly classified instances) for each fold. Help appreciated. Answer 1: I think this is possible using Weka's GUI. You need to use the Experimenter instead of the Explorer, though. Here are the steps: Open the Experimenter from the

Adding a new Instance in weka

Question: How can I add a new Instance to an existing Instances object that I created? Here is an example: ArrayList<Attribute> atts = new ArrayList<Attribute>(2); ArrayList<String> classVal = new ArrayList<String>(); classVal.add("A"); classVal.add("B"); atts.add(new Attribute("content",(ArrayList<String>)null)); atts.add(new Attribute("@@class@@",classVal)); Instances dataRaw = new Instances("TestInstances",atts,0); I want to add a new instance to dataRaw. As far as I know I have to use dataRaw.add

How to get prediction value for an instance in weka?

Question: I am working with Weka and need to output the prediction values (probabilities) of each label for each test instance. In the GUI, the Classify tab has an option (Classify -> options -> Output predicted value) that does this by outputting the prediction probabilities for each label, but how can I do it in Java code? I want to get probability scores for each label after classifying an instance. Answer 1: The following code takes in a set of training instances, and outputs the predicted probability

Getting Started with Weka, Part 1

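I have drawn on material that others have posted online. This post mainly covers calling the Weka library from Java. First, an introduction to Weka: its full name is the Waikato Environment for Knowledge Analysis, and it is open-source machine learning and data mining software. You can download the latest release from the official Weka website; at the time of writing, the latest version is 3.7.9. By default it installs to C:\Program Files\Weka-3-7, which contains a data folder holding a number of datasets. You can copy the data folder somewhere more convenient, and its datasets are useful for learning Weka. Taking one of the .arff files in the data folder as an example, the file format is as follows: (1) Relation declaration: the format is @relation <relation-name>, on the first line of the file. The relation name must not contain spaces; if it does, it must be quoted. (2) Attribute declaration: the format is @attribute <attribute-name> <data-type>. attribute-name is the name of the attribute and is case-sensitive. data-type is the data type; the common types are numeric (numbers: integers, decimals, and so on) and nominal (categorical; for example, in @attribute outlook {sunny, overcast, rainy} the set of allowed values is sunny, overcast, rainy). (3) Data section: the data begins at the @data marker. Each line below @data is one instance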
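To make the three parts of the format concrete, here is a minimal sketch in plain Java (no Weka dependency) that embeds a small ARFF document and picks out its attribute names and data rows. The sample values and the names ArffHeaderSketch, attributeNames and dataRowCount are made up for illustration. The parsing is deliberately simplistic: it assumes unquoted attribute names separated from the type by whitespace, and it is not how Weka itself reads ARFF files.

```java
import java.util.ArrayList;
import java.util.List;

public class ArffHeaderSketch {
    // A small ARFF document: relation, attributes, then data (values are illustrative).
    static final String SAMPLE = String.join("\n",
            "@relation weather",
            "@attribute outlook {sunny, overcast, rainy}",
            "@attribute temperature numeric",
            "@attribute play {yes, no}",
            "@data",
            "sunny,85,no",
            "overcast,83,yes");

    // Collect the second token of every @attribute line (the attribute name).
    static List<String> attributeNames(String arff) {
        List<String> names = new ArrayList<>();
        for (String raw : arff.split("\n")) {
            String line = raw.trim();
            if (line.regionMatches(true, 0, "@attribute", 0, 10)) {
                names.add(line.split("\\s+", 3)[1]);
            }
        }
        return names;
    }

    // Count non-empty, non-comment lines that follow the @data marker.
    static int dataRowCount(String arff) {
        boolean inData = false;
        int rows = 0;
        for (String raw : arff.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue;
            }
            if (inData) {
                rows++;
            } else if (line.equalsIgnoreCase("@data")) {
                inData = true;
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(attributeNames(SAMPLE));
        System.out.println(dataRowCount(SAMPLE));
    }
}
```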

Getting Started with Weka, Part 2

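We now turn to classifying data with Weka. To classify data, we must first specify which column is the class to predict. Because of how the data files are laid out, the class is usually the last attribute. We can set it with setClassIndex. Next we choose a classifier; there are many, and for now we use J48. Training is done with buildClassifier, after which classifyInstance returns the predicted class, here for the training data. The predicted class is returned as a number such as 0, 1, 2, ..., giving the index of the predicted class value. For example, if the class values are {sunny, rainy}, then 0 means sunny and 1 means rainy. package InstanceTest; import weka.core.Instances; import weka.classifiers.trees.J48; import weka.classifiers.trees.j48.*; import java.io.*; public class InstanceTest { /** * @param args */ public Instances data; /* use the last attribute as the class by default */ public void SetClassIndex(Instances ins) { ins.setClassIndex(ins.numAttributes()-1); } public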

Getting Started with Weka, Part 3

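This time we introduce the Evaluation class. Last time we only predicted class values and produced no other evaluation figures, so here we use the Evaluation class. First, create an Evaluation object. Evaluation has no no-argument constructor; an Instances object is normally passed as the constructor argument. If we do not have separate training and test data, we can use cross-validation. The crossValidateModel method takes four parameters: the first is the classifier, the second is the dataset to evaluate on, the third is the number of cross-validation folds (10 is common), and the fourth is a random number generator object. If we do have a training set and a test set, we can use the evaluateModel method of the Evaluation class, whose parameters are: first, a trained classifier; second, the dataset to evaluate on. package InstanceTest; import weka.core.Instances; import weka.classifiers.trees.J48; import weka.classifiers.Evaluation; import java.io.*; import java.util.Random; public class InstanceTest { /** * @param args */ public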
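To illustrate what the fold count and the random number object are for, here is a minimal plain-Java sketch of how k-fold cross-validation partitions a dataset: the instance indices are shuffled with the supplied Random, then dealt into k folds, and each fold serves once as the test set while the rest are used for training. This is a generic illustration with hypothetical names (KFoldSketch, folds); it is not Weka's own implementation, which may differ in details such as stratification.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class KFoldSketch {
    // Shuffle the indices 0..numInstances-1 and deal them into numFolds groups.
    static List<List<Integer>> folds(int numInstances, int numFolds, Random random) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < numInstances; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, random);

        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < numFolds; f++) {
            folds.add(new ArrayList<>());
        }
        for (int i = 0; i < indices.size(); i++) {
            folds.get(i % numFolds).add(indices.get(i));
        }
        return folds;
    }

    public static void main(String[] args) {
        // 14 instances and 10 folds are arbitrary example values.
        List<List<Integer>> folds = folds(14, 10, new Random(1));
        for (int f = 0; f < folds.size(); f++) {
            // Fold f is the test set in round f; all other folds form the training set.
            System.out.println("fold " + f + " test indices: " + folds.get(f));
        }
    }
}
```

Passing a Random with a fixed seed makes the split reproducible, which is one reason a random number object is a parameter rather than something created internally.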

Weka: Automatic Parameter Optimization

import weka.core.*; import weka.classifiers.*; import weka.classifiers.meta.*; import weka.classifiers.trees.*; import java.io.*; /** * A little example for optimizing J48's confidence parameter with * CVParameterSelection meta-classifier. * The class expects a dataset as first parameter, class attribute is * assumed to be the last attribute. * * @author FracPete (fracpete at waikato dot ac dot nz) */ public class CVParam { public static void main(String[] args) throws Exception { // load data BufferedReader reader = new BufferedReader(new FileReader(args[0])); Instances data = new Instances

// remove instances with missing class Instances newData = new Instances(data); newData.deleteWithMissingClass(); m_structure = new Instances(newData, 0); m_Random = new Random(getSeed()); if (m_classifiersToLoad.size() > 0) { m_preBuiltClassifiers.clear(); loadClassifiers(data); if (m_Classifiers.length == 1 && m_Classifiers[0] instanceof weka.classifiers.rules.ZeroR) { // remove the single ZeroR m_Classifiers = new Classifier[0]; } } Classifier [] Classifiers = new Classifier[3]; //m_Classifiers[0]=new LibSVM(); Classifiers[0]=new J48(); Classifiers[1]=new Logistic(); Classifiers[2]=new SMO(
