decision-tree

How do I classify this value using a decision tree?

心不动则不痛 submitted on 2019-12-11 03:48:54
Question: Basically, my decision tree can't classify a value using the normal algorithm. I reach a node with two branches (say, sunny and windy), but my value at that node is something else (for example, rainy). Are there any methods for dealing with this, e.g. changing the tree or just estimating from other data? I was thinking of assigning the most common value at that node, but that is just a guess.
Answer 1: Have you considered fuzzy logic for the rich/poor continuum? As for things that can't be
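A minimal Python sketch of the fallback the asker describes. It assumes a hypothetical `Node` layout in which every internal node also stores the most common training class (`majority`) of the samples that reached it. When a sample's value has no matching branch, classification stops at that node and returns its majority class instead of failing. Following the branch that received the most training samples would be an equally simple alternative at the same spot.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    # Hypothetical layout: a leaf has `label` set; an internal node tests
    # `attribute` and routes each value seen in training to a child.
    label: Optional[str] = None
    attribute: Optional[str] = None
    children: Dict[str, "Node"] = field(default_factory=dict)
    majority: Optional[str] = None  # most common training class at this node


def classify(node: Node, sample: Dict[str, str]) -> str:
    while node.label is None:
        child = node.children.get(sample.get(node.attribute))
        if child is None:
            # Unseen or missing value: fall back to the most common class
            # among the training samples that reached this node.
            return node.majority
        node = child
    return node.label


tree = Node(
    attribute="outlook",
    majority="yes",
    children={"sunny": Node(label="no"), "windy": Node(label="yes")},
)
print(classify(tree, {"outlook": "rainy"}))
```

Here `rainy` has no branch, so the node's majority class `yes` is returned.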

Entropy of a pure split calculated as NaN

拥有回忆 submitted on 2019-12-11 03:27:44
Question: I have written a function to calculate the entropy of a vector where each element is the number of elements in one class:

function x = Entropy(a)
    t = sum(a);
    t = repmat(t, [1, size(a, 2)]);
    x = sum(-a./t .* log2(a./t));
end

For example, for a = [4 0] the entropy is -(0/4)*log2(0/4) - (4/4)*log2(4/4). But the function above returns NaN when the split is pure, because log2(0) is -Inf and 0 * -Inf evaluates to NaN, as in this example. The entropy of a pure split should be zero. How should I solve the problem with least effect on
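The usual convention is that 0·log2(0) = 0 (the limit of p·log2(p) as p approaches 0), so zero counts can simply be skipped. A minimal Python sketch of that convention follows; in the question's MATLAB, the analogous change would be to drop zero entries first, e.g. `a = a(a > 0);`, before computing `t`.

```python
import math
from typing import Sequence


def entropy(counts: Sequence[int]) -> float:
    """Shannon entropy (bits) of a vector of class counts.

    Uses the convention 0 * log2(0) = 0, so empty classes contribute
    nothing and a pure split has entropy 0 instead of NaN.
    """
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:  # skip zero counts: p*log2(p) -> 0 as p -> 0
            p = c / total
            h -= p * math.log2(p)
    return h


print(entropy([4, 0]))
print(entropy([2, 2]))
```

With this convention `entropy([4, 0])` returns 0.0 and `entropy([2, 2])` returns 1.0.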

Saving the model output from a decision tree trainClassifier as a text file on the Spark Scala platform

Deadly submitted on 2019-12-10 23:59:17
Question: The code I was using to train the decision tree is as follows:

import org.apache.spark.SparkContext
import org.apache.spark.mllib.tree.DecisionTree
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.tree.configuration.Algo._
import org.apache.spark.mllib.tree.impurity.Gini
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.mllib.evaluation.MulticlassMetrics

// Load and parse the data file
val data

Learning decision trees on huge datasets

被刻印的时光 ゝ submitted on 2019-12-10 18:36:41
Question: I'm trying to build a binary classification decision tree out of huge datasets (i.e. ones that cannot be stored in memory) using MATLAB. Essentially, what I'm doing is:
- Collect all the data.
- Try out n decision functions on the data.
- Pick the best decision function to separate the classes within the data.
- Split the original dataset into 2.
- Recurse on the splits.
The data has k attributes and a classification, so it is stored as a matrix with a huge number of rows and k+1 columns. The decision
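One way to avoid holding the data in memory during the "try out n decision functions" step is to stream it in chunks and keep only a small table of class counts per candidate test. The sketch below assumes binary labels 0/1 and candidate decision functions given as Python predicates; `best_split_streaming` is a hypothetical helper name. The split and recurse steps would then write each row to a left or right file and run the same pass on each.

```python
import math
from typing import Callable, Iterable, List, Sequence, Tuple

# One row = (attribute values, class label in {0, 1}).
Row = Tuple[Sequence[float], int]


def entropy(counts: Sequence[int]) -> float:
    """Entropy in bits of a list of class counts, with 0*log2(0) = 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def best_split_streaming(
    chunks: Iterable[List[Row]],
    candidates: List[Callable[[Sequence[float]], bool]],
) -> int:
    """Index of the candidate test with the lowest weighted child entropy.

    The data is read one chunk at a time; only a 2x2 table of class
    counts per candidate (side of the test x label) stays in memory.
    """
    # counts[k][side][label], side 0 = test is False, side 1 = test is True
    counts = [[[0, 0], [0, 0]] for _ in candidates]
    for chunk in chunks:
        for x, label in chunk:
            for k, test in enumerate(candidates):
                counts[k][int(test(x))][label] += 1

    def weighted_entropy(table: List[List[int]]) -> float:
        n = sum(sum(side) for side in table)
        if n == 0:
            return 0.0
        return sum(sum(side) / n * entropy(side) for side in table)

    return min(range(len(candidates)), key=lambda k: weighted_entropy(counts[k]))
```

In practice `chunks` would be a generator that reads the on-disk matrix block by block, so memory use depends on the chunk size and n, not on the number of rows.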

Getting the distribution of values at the leaf node for a DecisionTreeRegressor in scikit-learn

和自甴很熟 submitted on 2019-12-10 18:29:48
Question: By default, a scikit-learn DecisionTreeRegressor returns the mean of all training-set target values in a given leaf node. However, I want to get back the list of training-set target values that fell into the predicted leaf node. This would let me quantify the distribution and also calculate other metrics, such as the standard deviation. Is this possible using scikit-learn?
Answer 1: I think what you're looking for is the apply method of the tree object. See here for the
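A minimal sketch of the `apply` approach, assuming NumPy and scikit-learn are available; `leaf_training_targets` is a hypothetical helper name and the sine data is synthetic. `apply` maps each row to the id of the leaf it lands in, so comparing the training rows' leaf ids with the query's leaf id recovers the training targets behind the prediction.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def leaf_training_targets(model, X_train, y_train, x_query):
    """Return the training targets that share x_query's leaf."""
    train_leaves = model.apply(X_train)  # leaf id for every training row
    query_leaf = model.apply(np.asarray(x_query, dtype=float).reshape(1, -1))[0]
    return np.asarray(y_train)[train_leaves == query_leaf]


# Synthetic example data (an assumption, not from the question).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)
model = DecisionTreeRegressor(max_depth=3).fit(X, y)

values = leaf_training_targets(model, X, y, [2.5])
print(len(values), values.mean(), values.std())
```

With the default squared-error criterion, the mean of the returned values should match `model.predict` for the same point, which is a quick sanity check.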

Why am I getting a negative information gain?

 ̄綄美尐妖づ submitted on 2019-12-10 17:47:36
Question: [SOLVED] My mistake was that I did not realise that entropy is 0 if all examples are of one type: if all are positive the entropy is 0, and if all are negative it is 0 as well. Entropy is 1 if there are equal numbers of positive and negative examples. It does not make sense to get a negative information gain, yet based on this example I am getting one. Here is the data:
And if I calculate the information gain on the Humidity attribute, I get this:
Obviously I am missing
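Information gain is the parent's entropy minus the size-weighted average of the children's entropies. Because entropy is concave, that difference is mathematically never negative (up to floating-point rounding) as long as the children's counts add up to the parent's; a negative result usually points to weights that don't sum to 1 or counts that don't partition the parent. A minimal Python sketch, using as an assumed illustration the Humidity counts from the classic PlayTennis data (9 yes / 5 no overall; High: 3 yes / 4 no; Normal: 6 yes / 1 no), which may or may not match the table in the question:

```python
import math
from typing import List, Sequence


def entropy(counts: Sequence[int]) -> float:
    """Entropy in bits of class counts, treating 0*log2(0) as 0."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def information_gain(parent: Sequence[int], children: List[Sequence[int]]) -> float:
    """Parent entropy minus the size-weighted entropy of the children."""
    # The children must partition the parent: per-class counts add up.
    if [sum(col) for col in zip(*children)] != list(parent):
        raise ValueError("child counts do not add up to the parent counts")
    n = sum(parent)
    return entropy(parent) - sum(sum(child) / n * entropy(child) for child in children)


# Humidity split: [yes, no] counts for the parent and each child.
print(information_gain([9, 5], [[3, 4], [6, 1]]))
```

For these counts the gain comes out at roughly 0.152.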

PHP function to increment variable by 1 each time

亡梦爱人 submitted on 2019-12-10 15:46:52
Question: I have started writing a PHP script for a game about creatures. There are 4 yes/no questions, and I am trying to write a function that displays two buttons labelled yes and no and gives them different names each time I run the function: for example yes1 and no1, then the next time the function runs, the buttons are named yes2 and no2. I have already attempted this but it is not working correctly. Below is the code I have so far; any help would be much
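The core idea is a counter that persists between calls. A Python sketch of the same idea, with a hypothetical `make_button_pair_renderer` helper that keeps the counter in a closure; in PHP, the comparable tool within one script run is a `static` local variable incremented on each call. Either kind of state is reset on every new page request, so if each question is answered on a separate page load, the number would need to travel with the request (for example in a hidden form field or the session).

```python
from itertools import count


def make_button_pair_renderer():
    """Return a function that emits a yes/no button pair with a fresh
    numeric suffix on each call: yes1/no1, then yes2/no2, and so on."""
    counter = count(1)  # state lives in the closure, not in a global

    def render() -> str:
        n = next(counter)
        return (
            f'<button type="submit" name="yes{n}">Yes</button>'
            f'<button type="submit" name="no{n}">No</button>'
        )

    return render


render = make_button_pair_renderer()
print(render())
print(render())
```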

Decision tree in R error: fit is not a tree, just a root

时光毁灭记忆、已成空白 submitted on 2019-12-10 04:30:24
Question: Good afternoon! I have a problem with a decision tree.

f11 <- as.factor(Z24train$f1)
fit_f1 <- rpart(f11 ~ TSU + TSL + TW + TP, data = Z24train, method = "class")
plot(fit_f1, uniform = TRUE, main = "Classification Tree for Kyphosis")

But this error appears:

Error in plot.rpart(fit_f1, uniform = TRUE, main = "Classification Tree for Kyphosis") :
  fit is not a tree, just a root

What is the problem? Thanks for the help :)
Answer 1: This is probably because rpart is not able to create a decision tree with the
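A root-only tree usually means no split satisfied the fitting's stopping rules; in rpart these live in `rpart.control` (defaults include `minsplit = 20` and `cp = 0.01`). The same failure mode is easy to reproduce with scikit-learn's `min_samples_split`, used here only as an analogous illustration, not as a description of rpart's internals: with fewer rows than the minimum, the root is never split.

```python
from sklearn.tree import DecisionTreeClassifier

# Ten rows, two classes that a single threshold separates perfectly.
X = [[i] for i in range(10)]
y = [0] * 5 + [1] * 5

# Stopping rule stricter than the data allows: a node needs at least
# 20 rows before a split is considered, so the root is never split.
root_only = DecisionTreeClassifier(min_samples_split=20).fit(X, y)
print(root_only.tree_.node_count)  # 1: the "tree" is just the root

# Relaxing the rule lets the same data produce a real split.
split = DecisionTreeClassifier(min_samples_split=2).fit(X, y)
print(split.tree_.node_count)  # 3: root plus two leaves
```

In R, the corresponding experiment would be refitting with a looser control, e.g. `rpart(..., control = rpart.control(minsplit = 5, cp = 0.001))`, and checking whether the tree grows.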

How to plot/visualize a C50 decision tree in R?

匆匆过客 submitted on 2019-12-10 03:04:12
Question: I am using the C50 decision tree algorithm. I am able to build the tree and get the summaries, but I cannot figure out how to plot or visualize the tree. My C50 model is called credit_model. In other decision tree packages I usually use something like plot(credit_model); with rpart it is rpart.plot(credit_model). What is the equivalent for plotting in C50?
Answer 1: Right now, there are none built in. I've been working on an adapter for the partykit package (e.g. as.party) but have not gotten very

Data Driven Rules Engine - Drools

橙三吉。 submitted on 2019-12-10 02:22:40
Question: I have been evaluating Drools as a rules engine for use in our business web application. My use case is an order management application, and the rules are of the following kind:
- If the user type is "SPECIAL", give an extra 5% discount.
- If the user has already made 10+ purchases, give an extra 3% discount.
- If the product category is "OLD", give the user a gift hamper worth $5.
- If the product category is "NEW", give the user a gift hamper worth $1.
- If the user has made purchases of over $1000 in the past,
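One way to keep such rules data driven, independent of any engine, is to store each rule as a row (attribute, operator, value, effect) and evaluate the rows generically. The Python sketch below is not Drools API; `Rule`, `OPS`, `RULES` and `evaluate` are hypothetical names, and it assumes the effects of all matching rules simply add up. Drools can also load rules from spreadsheet decision tables, which have a similar row-per-rule shape.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]

# Comparison operators a rule row may reference by name.
OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "eq": lambda a, b: a == b,
    "gte": lambda a, b: a >= b,
    "gt": lambda a, b: a > b,
}


@dataclass(frozen=True)
class Rule:
    attribute: str
    op: str
    value: Any
    discount_pct: float = 0.0
    gift_worth: float = 0.0


# Hypothetical rule table; in a data-driven setup these rows would be
# loaded from a database or config file rather than hard-coded.
RULES: List[Rule] = [
    Rule("user_type", "eq", "SPECIAL", discount_pct=5),
    Rule("purchase_count", "gte", 10, discount_pct=3),
    Rule("product_category", "eq", "OLD", gift_worth=5),
    Rule("product_category", "eq", "NEW", gift_worth=1),
]


def evaluate(rules: List[Rule], ctx: Context) -> Dict[str, float]:
    """Apply every matching rule and accumulate its effects."""
    result = {"discount_pct": 0.0, "gift_worth": 0.0}
    for r in rules:
        if r.attribute in ctx and OPS[r.op](ctx[r.attribute], r.value):
            result["discount_pct"] += r.discount_pct
            result["gift_worth"] += r.gift_worth
    return result


print(evaluate(RULES, {"user_type": "SPECIAL", "purchase_count": 12, "product_category": "OLD"}))
```

Adding a rule then means adding a row, not changing code, as long as it fits an existing operator and effect.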