decision-tree

Sklearn: How to balance classification using DecisionTreeClassifier?

梦想的初衷 Submitted on 2019-12-04 06:54:00
I have a data set where the classes are unbalanced. The classes are either 0, 1, or 2. How can I calculate the prediction error for each class and then re-balance the weights accordingly in Sklearn?

If you want to fully balance (treat each class as equally important) you can simply pass class_weight='balanced', as stated in the docs: "The 'balanced' mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y))". If the frequency of class A is 10% and the frequency of class B is 90%, then
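The weight formula quoted from the docs can be checked directly. A minimal sketch (the toy labels and class proportions below are made up for illustration):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy imbalanced labels: class 0 is rare, class 2 is common
y = np.array([0] * 10 + [1] * 30 + [2] * 60)
X = np.random.RandomState(0).rand(len(y), 4)

# The formula from the docs: n_samples / (n_classes * np.bincount(y))
weights = len(y) / (len(np.unique(y)) * np.bincount(y))
print(weights)  # the rarer the class, the larger its weight

# Passing class_weight='balanced' applies these same weights internally
clf = DecisionTreeClassifier(class_weight='balanced', random_state=0).fit(X, y)
```

With 10/30/60 samples, the rare class 0 gets weight 100/(3*10) ≈ 3.33 while the common class 2 gets 100/(3*60) ≈ 0.56, so misclassifying a rare sample costs proportionally more during training.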

how to obtain the trained best model from a crossvalidator

浪子不回头ぞ Submitted on 2019-12-04 06:48:14
I built a pipeline including a DecisionTreeClassifier (dt), like this:

val pipeline = new Pipeline().setStages(Array(labelIndexer, featureIndexer, dt, labelConverter))

Then I used this pipeline as the estimator in a CrossValidator in order to get a model with the best set of hyperparameters, like this:

val c_v = new CrossValidator()
  .setEstimator(pipeline)
  .setEvaluator(new MulticlassClassificationEvaluator()
    .setLabelCol("indexedLabel")
    .setPredictionCol("prediction"))
  .setEstimatorParamMaps(paramGrid)
  .setNumFolds(5)

Finally, I could train a model on a training set with this cross-validator:

val model =

Updating a Decision Tree With New Data

我们两清 Submitted on 2019-12-04 05:30:54
Question: I am new to decision trees. I am planning to build a large decision tree that I would like to update later with additional data. What is the best approach to this? Can any decision tree be updated later?

Answer 1: Decision trees are most often trained on all available data. That is, when you have new data, you retrain the entire tree. Since this process is very fast, it is usually not problematic. If the data is too big to fit in memory, you can often get around this by subsampling (row sampling) the
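The retrain-on-all-data approach described in the answer can be sketched in a few lines (the random data here is a stand-in, assuming scikit-learn):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X_old, y_old = rng.rand(100, 3), rng.randint(0, 2, 100)
clf = DecisionTreeClassifier(random_state=0).fit(X_old, y_old)

# New data arrives: concatenate with the old data and refit from scratch.
# Standard decision trees have no incremental-update API, so this full
# retrain is the usual (and usually cheap) way to "update" the tree.
X_new, y_new = rng.rand(20, 3), rng.randint(0, 2, 20)
X_all = np.vstack([X_old, X_new])
y_all = np.concatenate([y_old, y_new])
clf = DecisionTreeClassifier(random_state=0).fit(X_all, y_all)
```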

interpreting Graphviz output for decision tree regression

99封情书 Submitted on 2019-12-04 04:06:43
I'm curious what the value field is in the nodes of a decision tree produced by Graphviz when used for regression. I understand that for decision tree classification this is the number of samples in each class separated by a split, but I'm not sure what it means for regression. My data has a 2-dimensional input and a 10-dimensional output. Here is an example of what a tree looks like for my regression problem, produced using the code below and visualized with webgraphviz:

# X = (n x 2)  Y = (n x 10)  X_test = (m x 2)
input_scaler = pickle.load(open("../input_scaler.sav", "rb"))
reg =
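For regression, the value shown in each node is the mean of the training targets of the samples that reach that node, one entry per output dimension. A small sketch checking this against the tree's internal value array (random data, assuming scikit-learn):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.rand(50, 2)   # 2-dimensional input, as in the question
Y = rng.rand(50, 10)  # 10-dimensional output

reg = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, Y)

# tree_.value has shape (n_nodes, n_outputs, 1) for regression.
# The root node's value is the mean of Y over all training samples,
# so here it is a length-10 vector of per-output means.
root_value = reg.tree_.value[0].ravel()
print(root_value)
print(np.allclose(root_value, Y.mean(axis=0)))
```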

R Error: “In numerical expression has 19 elements: only the first used”

别说谁变了你拦得住时间么 Submitted on 2019-12-04 03:37:20
Question: I created a dataframe:

totalDeposit <- cumsum(testd$TermDepositAMT[s1$ix])

which calculates the cumulative sum of TermDeposit amounts in the testd dataframe and stores it in totalDeposit. This works perfectly. I then need to calculate the average of the deposit amount, and I use the following:

avgDeposit <- totalDeposit / (1:testd)

but get an error message:

Error in 1:testd : NA/NaN argument
In addition: Warning message:
In 1:testd : numerical expression has 19 elements: only the
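The warning arises because `1:testd` tries to coerce the whole data frame to a number and only uses its first element; the intended sequence is most likely `seq_along(totalDeposit)` (i.e. 1 up to the number of deposits). The same running-average computation, sketched in Python with made-up deposit amounts:

```python
import numpy as np

term_deposit_amt = np.array([100.0, 250.0, 50.0, 400.0])

# Cumulative sum of the deposits (R: cumsum(...))
total_deposit = np.cumsum(term_deposit_amt)

# Running average: divide by 1..n, the count of elements so far
# (the equivalent of R's seq_along(totalDeposit), not 1:testd)
avg_deposit = total_deposit / np.arange(1, len(total_deposit) + 1)
print(avg_deposit)  # [100.  175.  133.33...  200.]
```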

Plot party decision tree

微笑、不失礼 Submitted on 2019-12-03 21:46:45
I have the following plot, as you can see in the picture. Is there any way to see the exact percentages in the leaf nodes?

If you want to "see" the percentages, the easiest way is to make a table() of the terminal nodes vs. the response and then look at the conditional proportions. If you want to "see" the proportions in the barplot, there was no way to do this until now. However, I tweaked the node_barplot() function to accommodate this feature. So if you re-install the partykit package (the successor of the party package) from R-Forge, you can try it:

install.packages("partykit",

Decision trees and rule engines (Drools)

大憨熊 Submitted on 2019-12-03 17:39:27
Question: In the application I'm working on right now, I need to periodically check the eligibility of tens of thousands of objects for some kind of service. The decision diagram itself is in the following form, just much larger. In each of the end nodes (circles), I need to run an action (change an object's field, log information, etc.). I tried using the Drools Expert framework, but in that case I'd need to write a long rule for every path in the diagram leading to an end node. Drools Flow doesn't seem to
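One alternative to writing a rule per root-to-leaf path is to represent the diagram itself as data and walk it. This is not Drools, just a sketch of the idea in Python; the field names and actions below are hypothetical:

```python
# The decision diagram as data: each internal node holds a test,
# each leaf (end node) holds an action identifier. Adding a path to
# the diagram does not require writing a new rule.
tree = {
    "test": lambda o: o["age"] >= 18,
    "yes": {
        "test": lambda o: o["balance"] > 0,
        "yes": {"action": "grant_service"},
        "no": {"action": "log_insufficient_funds"},
    },
    "no": {"action": "log_underage"},
}

def evaluate(node, obj):
    # Follow the yes/no branches until an end node is reached
    while "action" not in node:
        node = node["yes"] if node["test"](obj) else node["no"]
    return node["action"]

print(evaluate(tree, {"age": 25, "balance": 100}))  # grant_service
```

Checking tens of thousands of objects is then a single loop over `evaluate`, with the diagram kept in one place.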

Plotting a decision tree with pydot

蹲街弑〆低调 Submitted on 2019-12-03 16:44:09
I have trained a decision tree (a Python dictionary), shown below. Now I am trying to plot it using pydot. In defining each node of the tree (pydot graph), I give it a unique (and verbose) name and a brief label. My problem is that in the resulting figure, which I get by writing to a .png, I see the verbose node names and not the node labels. I have followed the answer by @Martijn Pieters here. What am I missing? Any ideas?

import pydot
tree = {'salary': {'41k-45k': 'junior',
                   '46k-50k': {'department': {'marketing': 'senior',
                                              'sales': 'senior',
                                              'systems': 'junior'}},
                   '36k-40k':
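In the underlying DOT language, every node has an identifier (the name) and an optional label attribute; Graphviz renders the label when one is set and falls back to the name otherwise, which is the usual cause of verbose names appearing in the figure. A sketch building the DOT text by hand to show the distinction (the node names here are invented):

```python
# A DOT node has a unique name (identifier) plus an optional label.
# If no label is attached, Graphviz displays the name itself.
def dot_node(name, label):
    return '"{}" [label="{}"];'.format(name, label)

lines = ["digraph tree {"]
lines.append(dot_node("node_salary_root", "salary"))    # verbose name, short label
lines.append(dot_node("node_salary_41k45k", "junior"))
lines.append('"node_salary_root" -> "node_salary_41k45k" [label="41k-45k"];')
lines.append("}")
dot_source = "\n".join(lines)
print(dot_source)
```

The same rule applies when constructing nodes through pydot: the label must be passed explicitly alongside the name, or only the name is drawn.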

How to retrieve class values from WEKA using MATLAB

为君一笑 Submitted on 2019-12-03 15:36:33
I'm trying to retrieve classes from WEKA using MATLAB and the WEKA API. All looks fine, but the classes are always 0. Any idea why? My data set has 241 attributes; applying WEKA directly to this dataset I obtain correct results. First, train and test objects are created, then the classifier is built and classifyInstance is performed. But this gives the wrong result:

train = [xtrain ytrain];
test = [xtest];
save ('train.txt','train','-ASCII');
save ('test.txt','test','-ASCII');
%## paths
WEKA_HOME = 'C:\Program Files\Weka-3-7';
javaaddpath([WEKA_HOME '\weka.jar']);
fName = 'train.txt';
%## read file
loader = weka.core

Combining Weak Learners into a Strong Classifier

不羁的心 Submitted on 2019-12-03 15:07:49
How do I combine a few weak learners into a strong classifier? I know the formula, but the problem is that every paper about AdaBoost that I've read gives only formulas without any example. I mean, I have the weak learners and their weights, so I can do what the formula tells me to do (multiply each learner by its weight, add another one multiplied by its weight, and so on), but how exactly do I do that? My weak learners are decision stumps. They have an attribute and a threshold, so what do I multiply?

If I understand your question correctly, you have a great explanation of how boosting
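What gets multiplied is the stump's ±1 prediction, not its attribute or threshold: the strong classifier is the sign of the alpha-weighted sum of the stumps' outputs, H(x) = sign(Σ αₜ hₜ(x)). A sketch with made-up stumps and weights:

```python
# Hypothetical weak learners: decision stumps (attribute index, threshold),
# each predicting +1 if x[attr] > threshold else -1, with AdaBoost weight alpha.
stumps = [
    {"attr": 0, "threshold": 2.5, "alpha": 0.9},
    {"attr": 1, "threshold": 1.0, "alpha": 0.4},
    {"attr": 0, "threshold": 4.0, "alpha": 0.2},
]

def stump_predict(stump, x):
    return 1 if x[stump["attr"]] > stump["threshold"] else -1

def strong_predict(x):
    # Weighted vote: multiply each stump's +/-1 output by its alpha, sum,
    # and take the sign of the total
    score = sum(s["alpha"] * stump_predict(s, x) for s in stumps)
    return 1 if score >= 0 else -1

print(strong_predict([3.0, 0.5]))  # votes +1, -1, -1 -> 0.9-0.4-0.2 = 0.3 -> +1
```

So each stump only contributes a signed vote; its threshold is used to decide the vote's sign, and its alpha decides how much that vote counts.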