weka | 易学教程

How to calculate the threshold value for numeric attributes in Quinlan's C4.5 algorithm?

I am trying to find out how the C4.5 algorithm determines the threshold value for numeric attributes. I have researched this and still cannot understand it. In most places I have found this information: The training samples are first sorted on the values of the attribute Y being considered. There are only a finite number of these values, so let us denote them in sorted order as {v1, v2, …, vm}. Any threshold value lying between vi and vi+1 will have the same effect of dividing the cases into those whose value of the attribute Y lies in {v1, v2, …, vi} and those whose value is in {vi+1, vi+2, …, vm}. There are thus
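The description quoted above can be illustrated with a small Python sketch. It rests on assumptions the excerpt does not confirm: each candidate threshold is taken as the midpoint between two adjacent distinct values (the excerpt only says any value in between has the same effect), and candidates are scored by plain information gain, which is a simplification of what a full C4.5 implementation does. The function names `candidate_thresholds` and `best_threshold` are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def candidate_thresholds(values):
    """One candidate per gap between adjacent distinct sorted values.

    Assumption: the midpoint is used as the representative threshold.
    """
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def best_threshold(values, labels):
    """Return (threshold, gain) for the split 'value <= threshold'.

    Assumption: plain information gain is the score; this is a
    simplified stand-in for the criterion used by C4.5.
    Returns None when all values are identical (no split possible).
    """
    base = entropy(labels)
    n = len(values)
    best = None
    for t in candidate_thresholds(values):
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        gain = (base
                - (len(left) / n) * entropy(left)
                - (len(right) / n) * entropy(right))
        if best is None or gain > best[1]:
            best = (t, gain)
    return best


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    vals = [1, 2, 3, 10, 11, 12]
    labs = ["a", "a", "a", "b", "b", "b"]
    print(candidate_thresholds(vals))
    print(best_threshold(vals, labs))
```

With m distinct values the sketch evaluates m-1 candidates, one for each gap between neighbours in the sorted order.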

Majority vote algorithm in Weka.classifiers.meta.vote

What is the majority vote algorithm used in Weka? I tried to figure out its code but could not understand it. In Weka you can select multiple classifiers to be used in Weka.classifiers.meta.vote. If you select Majority Voting as the combinationRule (which only works with nominal classes), each of these classifiers predicts a nominal class label for a test sample. The label that was predicted most often is then selected as the output of the vote classifier. For example, you select the following classifiers to be used: trees.J48, bayes.NaiveBayes and functions.LibSVM to predict the
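The rule described above (the label predicted by the most classifiers wins) can be sketched in a few lines of Python. This is not Weka's code: the function name `majority_vote` is hypothetical, and the tie-break (the first tied label encountered wins) is an assumption of this sketch, since the excerpt does not say how ties are resolved.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most classifiers.

    `predictions` holds one nominal label per base classifier.
    Assumption: on a tie, the tied label seen first is returned.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)  # keeps first-seen order of labels
    top = max(counts.values())
    for label, count in counts.items():
        if count == top:
            return label


if __name__ == "__main__":
    # Hypothetical outputs of three base classifiers for one test sample.
    print(majority_vote(["yes", "no", "yes"]))
```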

How do I use a JSON file with weka

I have a JSON file and want to open the data in weka, but when I do, I get the following error: Looking around on the mailing list, I found a few questions about JSON, but (TL;DR) nothing useful, except that I noticed talk of JSON in the "format weka expects". Of course, there was no mention of what that format is. I am about to take a dive into the source, but I hope SO users can help before I spend too much time on this. To gain an understanding of the format of the JSON object and its relationship to ARFF, the steps were surprisingly simple. Use the GUI tool to do the following: Select the Explorer Option

K-means with really large matrix

I have to perform a k-means clustering on a really huge matrix (about 300.000x100.000 values, which is more than 100 GB). I want to know if I can use R or weka to perform this. My computer is a multiprocessor with 8 GB of RAM and hundreds of GB of free disk space. I have enough space for calculations, but loading such a matrix seems to be a problem with R (I don't think that using the bigmemory package would help me, and a big matrix automatically uses all my RAM, then my swap file if there is not enough space). So my question is: what software should I use (possibly in association with some other packages

How to perform one operation on each executor once in spark

Question: I have a weka model stored in S3 which is around 400 MB in size. I have a set of records on which I want to run the model and perform prediction. To perform the prediction, I have tried the following: Download and load the model on the driver as a static object, broadcast it to all executors, and perform a map operation on the prediction RDD. ----> Not working, because in Weka the model object needs to be modified when performing prediction, and broadcast requires a read-only copy. Download and load the model on

Learning Weka on the Command Line

Question: I am fairly new to Weka and even newer to Weka on the command line. I find the documentation poor and I am struggling to figure out a few things. For example, I want to take two .arff files, one for training and one for testing, and get an output of predictions for the missing labels in the test data. How can I do this? I have this code as a starting block: java -classpath weka.jar weka.classifiers.meta.FilteredClassifier -t "training_file_with_missing_values.arff" -T "test_file_with_missing

Weka J48 Classifier: Cannot handle numeric class?

Question: I'm now trying to build a J48 (C4.5) classifier model on my training data using Weka. First I do this, which seems to go OK: java -Xmx10G -cp /weka/weka.jar weka.core.converters.TextDirectoryLoader -dir /home/test/cats > /home/test/cats.arff This seems to go OK too: java -Xmx10G -cp /weka/weka.jar weka.filters.unsupervised.attribute.StringToWordVector -i /home/test/cats.arff -o /home/test/cats-vector.arff This does not go OK: java -Xmx10G -cp /weka/weka.jar weka.classifiers.trees.J48 -t /home

Increase heap size in java for weka

I'm trying to increase the heap size in java for weka, which keeps crashing. I used the suggested line: > java -Xmx500m -classpath but I get the following error: -classpath requires class path specification I'm not sure what this means. Any suggestions? What I found was that the actual issue was in the file 'RunWeka.ini' in '\Program Files (x86)\Weka-3-6'. I opened it with Notepad, and in the middle of the file there is a line 'maxheap = 512m'. I changed the line to read 'maxheap=2000m', saved the file and reloaded weka, and this fixed my problems. I'm not sure if this is the correct way to do it

SMO confidence measure in weka

Question: I'm writing classification code using the SMO class of weka, but I have yet to find a confidence measure for the classification of an instance. It always returns either 0 or 1 when distributionForInstance is called. I have two classes to classify into. Any idea how I can get this measure? Thanks. Answer 1: OK, I figured out how to get this, in case it helps someone. Get the source code for SMO.java and add it to your package. Resolve imports, if any. Set m_fitLogisticModels to true.

Learning Weka on the Command Line

I am fairly new to Weka and even newer to Weka on the command line. I find the documentation poor and I am struggling to figure out a few things. For example, I want to take two .arff files, one for training and one for testing, and get an output of predictions for the missing labels in the test data. How can I do this? I have this code as a starting block: java -classpath weka.jar weka.classifiers.meta.FilteredClassifier -t "training_file_with_missing_values.arff" -T "test_file_with_missing_values.arff" -F weka.filters.unsupervised.attribute.ReplaceMissingValues -- -c last -W weka