decision-tree

Incremental entropy computation

断了今生、忘了曾经 submitted on 2019-12-22 18:35:09
Question: Let std::vector<int> counts be a vector of positive integers and let N := counts[0] + ... + counts[counts.size()-1] be the sum of the vector components. Setting pi := counts[i]/N, I compute the entropy using the classic formula H = -(p0*log2(p0) + ... + pn*log2(pn)). The counts vector is changing (counts are incremented) and every 200 changes I recompute the entropy. After a quick Google and Stack Overflow search I couldn't find any method for incremental entropy computation. So the question: is there a way to update the entropy incrementally instead of recomputing it from scratch each time?
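There is a standard rewrite that makes this possible: H = -Σ (ci/N)*log2(ci/N) = log2(N) - (1/N)*Σ ci*log2(ci), so it suffices to maintain N and the running sum S = Σ ci*log2(ci). The question is about C++, but here is a minimal Python sketch of the idea (class and method names are illustrative):

import math

class IncrementalEntropy:
    # Maintains H = log2(N) - S/N, where S = sum(c * log2(c)) over all counts.
    def __init__(self, counts):
        self.counts = list(counts)
        self.N = sum(self.counts)
        self.S = sum(c * math.log2(c) for c in self.counts if c > 0)

    def increment(self, i):
        c = self.counts[i]
        if c > 0:
            self.S -= c * math.log2(c)   # remove the old term for counts[i]
        self.counts[i] = c + 1
        self.S += (c + 1) * math.log2(c + 1)
        self.N += 1

    def entropy(self):
        return math.log2(self.N) - self.S / self.N if self.N else 0.0

Each increment is O(1), so the entropy stays current without an O(n) recomputation every 200 changes.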

Map predictions back to IDs - Python Scikit Learn DecisionTreeClassifier

六眼飞鱼酱① submitted on 2019-12-22 17:17:09
Question: I have a dataset that has a unique identifier and other features. It looks like this:

ID       LenA  TypeA  LenB  TypeB  Diff  Score  Response
123-456  51    M      101   L      50    0.2    0
234-567  46    S      49    S      3     0.9    1
345-678  87    M      70    M      17    0.7    0

I split it up into training and test data. I am trying to classify the test data into two classes with a classifier trained on the training data. I want the identifier in the training and testing datasets so I can map the predictions back to the IDs. Is there a way that I can assign the ID column so it is carried along but not used as a training feature?
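One common pattern is to pass the ID column through train_test_split alongside X and y, so it never enters the model but stays aligned with the predictions. A sketch assuming the columns above and a hypothetical data.csv:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("data.csv")                               # hypothetical file
X = pd.get_dummies(df.drop(columns=["ID", "Response"]))    # encode TypeA/TypeB
y = df["Response"]
ids = df["ID"]

X_train, X_test, y_train, y_test, ids_train, ids_test = train_test_split(
    X, y, ids, test_size=0.3, random_state=0)

clf = DecisionTreeClassifier().fit(X_train, y_train)
result = pd.DataFrame({"ID": ids_test, "prediction": clf.predict(X_test)})

Because train_test_split shuffles all the arrays with the same permutation, ids_test lines up row for row with clf.predict(X_test).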

Parse a CSV file using python (to make a decision tree later) [closed]

醉酒当歌 submitted on 2019-12-22 08:53:52
Question: [Closed 7 years ago as not a good fit for the Q&A format.] First off, full disclosure: this is going towards a uni assignment, so I don't want to receive code. :) I'm more looking for…
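For reference, a minimal way to read such a file is Python's standard csv module; "training.csv" and the dict-per-row layout below are just one possible setup:

import csv

with open("training.csv", newline="") as f:
    rows = list(csv.DictReader(f))   # each row becomes a {column: value} dict

From there each row dict can be fed into whatever decision-tree builder the assignment calls for.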

Native Java Solution to Decision Table

ぐ巨炮叔叔 submitted on 2019-12-22 05:06:02
Question: I'm having an interesting discussion with an esteemed colleague and would like some additional input... I need to implement some basic decision table logic in my application. I was looking to use OpenL Tablets, which represents decision data in an Excel spreadsheet. I like it: it's easy to set up and maintain, and it has a small memory and processing footprint. I can add new tables easily, and I have some tables with over 100 rows and up to 10 conditions. This data is pretty static and rarely changes…
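As a sketch of what "basic decision table logic" can look like without an external engine (illustrative Python rather than Java, with made-up fields): each row of the table becomes a rule pairing condition predicates with an action, and evaluation returns the first matching row.

# Each rule is (conditions, action); conditions map a field to a predicate.
rules = [
    ({"age": lambda v: v < 18},   "child_discount"),
    ({"age": lambda v: v >= 65},  "senior_discount"),
    ({},                          "full_price"),       # catch-all default row
]

def decide(facts):
    for conditions, action in rules:
        if all(pred(facts[field]) for field, pred in conditions.items()):
            return action

print(decide({"age": 70}))  # -> senior_discount

A table with 100 rows and 10 conditions is still just a list scan, which keeps the memory and processing footprint small.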

Feature importances - Bagging, scikit-learn

北慕城南 submitted on 2019-12-22 04:53:07
Question: For a project I am comparing a number of decision-tree ensembles, using the regression algorithms (Random Forest, Extra Trees, AdaBoost and Bagging) of scikit-learn. To compare and interpret them I use the feature importances, though for the bagging ensemble this does not appear to be available. My question: does anybody know how to get the feature importances list for Bagging? Greetings, Kornee Answer 1: Are you talking about BaggingClassifier? It can be used with many base estimators, so there is no feature_importances_ attribute computed for it directly…
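If the base estimators are trees, a common workaround (not a built-in scikit-learn attribute for bagging) is to average the fitted trees' importances; a minimal sketch assuming the default max_features=1.0, so every tree sees all columns:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
bag = BaggingClassifier(DecisionTreeClassifier(),
                        n_estimators=50, random_state=0).fit(X, y)

# Average the per-tree importances over the fitted base estimators.
importances = np.mean([t.feature_importances_ for t in bag.estimators_], axis=0)

If max_features < 1.0, each tree only sees a subset of the columns, and the per-tree importances would first have to be mapped back through bag.estimators_features_.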

Working with decision trees

不羁岁月 submitted on 2019-12-22 04:36:17
Question: I know, tl;dr; I'll try to explain my problem without bothering you with tons of messy code. I'm working on a school assignment. We have pictures of Smurfs and we have to find them with foreground/background analysis. I have a decision tree in Java that starts with all the data (HSV histograms) in one single node. It then tries to find the best attribute (from the histogram data) to split the tree on, executes the split, and creates a left and a right subtree with the data split over both nodes…
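The flow being described, rendered as a self-contained Python sketch (Node, build, and the Gini-based scoring are illustrative stand-ins for the original Java code):

# samples are (feature_vector, label) pairs, e.g. HSV histogram bins + fg/bg.
def gini(labels):
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def find_best_split(samples):
    best = None                                   # (impurity, attr, threshold)
    for a in range(len(samples[0][0])):
        for t in sorted({f[a] for f, _ in samples}):
            left = [l for f, l in samples if f[a] <= t]
            right = [l for f, l in samples if f[a] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left)
                     + len(right) * gini(right)) / len(samples)
            if best is None or score < best[0]:
                best = (score, a, t)
    return best

class Node:
    def __init__(self, samples):
        self.samples = samples
        self.left = self.right = None
        self.split = None                         # (attribute_index, threshold)

def build(node, depth=0, max_depth=5):
    best = find_best_split(node.samples)
    if best is None or depth == max_depth:
        return                                    # leaf: keep majority label
    _, a, t = best
    node.split = (a, t)
    node.left = Node([s for s in node.samples if s[0][a] <= t])
    node.right = Node([s for s in node.samples if s[0][a] > t])
    build(node.left, depth + 1, max_depth)
    build(node.right, depth + 1, max_depth)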

how to obtain the trained best model from a crossvalidator

孤街浪徒 submitted on 2019-12-21 16:55:16
Question: I built a pipeline including a DecisionTreeClassifier (dt) like this: val pipeline = new Pipeline().setStages(Array(labelIndexer, featureIndexer, dt, labelConverter)) Then I used this pipeline as the estimator in a CrossValidator in order to get a model with the best set of hyperparameters, like this: val c_v = new CrossValidator().setEstimator(pipeline).setEvaluator(new MulticlassClassificationEvaluator().setLabelCol("indexedLabel").setPredictionCol("prediction")).setEstimatorParamMaps(paramGrid…
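The fitted CrossValidator hands the tuned pipeline back as bestModel. The question is in Scala, but the API shape is the same in PySpark; a sketch, where training_data and the stage index are assumptions based on the stage array above:

cv_model = c_v.fit(training_data)    # returns a CrossValidatorModel
best_pipeline = cv_model.bestModel   # the pipeline refit with the best params
dt_model = best_pipeline.stages[2]   # dt is the third stage in the array
print(dt_model.toDebugString)        # inspect the trained tree

In Scala the only extra step is the cast, cvModel.bestModel.asInstanceOf[PipelineModel], before indexing into .stages.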

interpreting Graphviz output for decision tree regression

走远了吗. submitted on 2019-12-21 11:03:16
Question: I'm curious what the value field is in the nodes of the decision tree produced by Graphviz when used for regression. I understand that for decision tree classification this is the number of samples of each class separated by a split, but I'm not sure what it means for regression. My data has a 2-dimensional input and a 10-dimensional output. Here is an example of what a tree looks like for my regression problem, produced using this code and visualized with webgraphviz: # X = (n x…
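For a regressor, the value shown in a node is the mean of the training targets that reach that node (with multi-output y, one mean per output dimension) rather than per-class sample counts. A minimal sketch with the same 2-in/10-out shape, using random data as a stand-in:

import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_graphviz

rng = np.random.RandomState(0)
X = rng.rand(100, 2)                       # 2-dimensional input
y = rng.rand(100, 10)                      # 10-dimensional output

reg = DecisionTreeRegressor(max_depth=2).fit(X, y)
export_graphviz(reg, out_file="tree.dot")  # paste tree.dot into webgraphviz
print(reg.tree_.value[0])                  # root node: 10 per-output means of y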

How does the C4.5 Algorithm handle continuous data?

五迷三道 submitted on 2019-12-21 05:25:11
Question: I am implementing the C4.5 algorithm in .NET; however, I don't have a clear idea of how it deals with continuous (numeric) data. Could someone give me a more detailed explanation? Answer 1: For continuous data C4.5 uses a threshold value where everything less than the threshold goes in the left node, and everything greater than the threshold goes in the right node. The question is how to create that threshold value from the data you're given. The trick there is to sort your data by the continuous attribute…
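A sketch of that threshold search in Python (C4.5 is usually described as trying the midpoints between consecutive sorted values and keeping the one with the best information gain; the function names here are illustrative):

import math

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def best_threshold(values, labels):
    pairs = sorted(zip(values, labels))      # sort by the continuous value
    base = entropy(labels)
    best = (None, -1.0)                      # (threshold, information gain)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                         # no boundary between equal values
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [l for _, l in pairs[:i]]
        right = [l for _, l in pairs[i:]]
        gain = base - (len(left) * entropy(left)
                       + len(right) * entropy(right)) / len(pairs)
        if gain > best[1]:
            best = (thr, gain)
    return best

print(best_threshold([70, 85, 60, 90, 75], ["no", "yes", "no", "yes", "yes"]))
# -> (72.5, 0.97...): the midpoint that cleanly separates the two classes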