decision-tree

How to handle categorical variables in sklearn GradientBoostingClassifier?

青春壹個敷衍的年華 submitted on 2019-12-02 23:53:09
I am attempting to train models with GradientBoostingClassifier using categorical variables. The following is a primitive code sample, just to try feeding categorical variables into GradientBoostingClassifier:

```python
from sklearn import datasets
from sklearn.ensemble import GradientBoostingClassifier
import pandas

iris = datasets.load_iris()
# Use only data for 2 classes.
X = iris.data[(iris.target == 0) | (iris.target == 1)]
Y = iris.target[(iris.target == 0) | (iris.target == 1)]

# Class 0 has indices 0-49. Class 1 has indices 50-99.
# Divide data into 80% training, 20% testing.
train_indices = list
```
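The usual approach for scikit-learn's GradientBoostingClassifier (which, unlike some other libraries, has no native categorical support in classic versions) is to encode the categorical columns first. A minimal sketch, with an invented toy DataFrame since the question's data is numeric:

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# Toy frame with one categorical column; the column names and values
# are made up for illustration.
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green", "red", "blue"],
    "size": [1.0, 2.0, 3.0, 2.5, 1.5, 3.5],
    "label": [0, 0, 1, 1, 0, 1],
})

# One-hot encode the categorical column so the trees can split on
# 0/1 indicator features instead of treating it as numeric.
X = pd.get_dummies(df[["color", "size"]], columns=["color"])
y = df["label"]

clf = GradientBoostingClassifier(n_estimators=10, random_state=0)
clf.fit(X, y)
print(clf.predict(X))
```

`sklearn.preprocessing.OneHotEncoder` does the same job inside a pipeline; `pd.get_dummies` is just the shortest way to show the idea.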

scikit learn - feature importance calculation in decision trees

蓝咒 submitted on 2019-12-02 23:41:46
I'm trying to understand how feature importance is calculated for decision trees in scikit-learn. This question has been asked before, but I am unable to reproduce the results the algorithm is providing. For example:

```python
from StringIO import StringIO
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree.export import export_graphviz
from sklearn.feature_selection import mutual_info_classif

X = [[1, 0, 0], [0, 0, 0], [0, 0, 1], [0, 1, 0]]
y = [1, 0, 1, 1]

clf = DecisionTreeClassifier()
clf.fit(X, y)
feat_importance = clf.tree_.compute_feature_importances
```
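The impurity-based importance scikit-learn reports can be reproduced by hand from the fitted `tree_` arrays: each feature's importance is the total weighted impurity decrease over the nodes that split on it, normalised to sum to 1. A sketch on the question's data:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = [[1, 0, 0], [0, 0, 0], [0, 0, 1], [0, 1, 0]]
y = [1, 0, 1, 1]

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

tree = clf.tree_
n = tree.weighted_n_node_samples
importances = np.zeros(3)
for node in range(tree.node_count):
    left, right = tree.children_left[node], tree.children_right[node]
    if left == -1:  # leaf node: no split, contributes nothing
        continue
    # Weighted impurity decrease produced by this node's split.
    decrease = (n[node] * tree.impurity[node]
                - n[left] * tree.impurity[left]
                - n[right] * tree.impurity[right])
    importances[tree.feature[node]] += decrease

importances /= importances.sum()
print(importances)
```

Dividing by the sum at the end reproduces the normalisation scikit-learn applies, so the result should match `clf.feature_importances_`.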

How to implement decision tree with c# (visual studio 2008) - Help

假如想象 submitted on 2019-12-02 18:41:06
I have a decision tree that I need to turn into C# code. The simple way of doing it is with if-else statements, but that solution would require 4-5 levels of nested conditions. I am looking for a better way, and so far I have read a little about rule engines. Do you have anything else to suggest for an efficient way to implement a decision tree with 4-5 nested conditions? I implemented a simple decision tree as a sample in my book. The code is available online here, so perhaps you could use it as inspiration. A decision is essentially represented as a class that has references to
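The "decision as a class with references to its branches" idea from the answer can be sketched as follows (in Python for brevity, rather than the question's C#; the class, field, and record names are invented for illustration):

```python
class Decision:
    """A node that asks a yes/no question and routes to one of two children."""
    def __init__(self, question, if_yes, if_no):
        self.question = question  # callable: record -> bool
        self.if_yes = if_yes      # Decision, or a terminal outcome value
        self.if_no = if_no

    def evaluate(self, record):
        branch = self.if_yes if self.question(record) else self.if_no
        # Recurse until a terminal (non-Decision) value is reached.
        return branch.evaluate(record) if isinstance(branch, Decision) else branch

# Nested conditions become a small object graph instead of if-else soup,
# so adding or reordering rules means editing data, not control flow.
tree = Decision(lambda r: r["age"] >= 18,
                Decision(lambda r: r["income"] > 50000, "approve", "review"),
                "reject")
print(tree.evaluate({"age": 30, "income": 60000}))  # -> approve
```

The same structure translates directly to a C# class with two child references and an `Evaluate` method.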

How to explore a decision tree built using scikit learn

半腔热情 submitted on 2019-12-02 18:32:53
I am building a decision tree using

```python
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X_train, Y_train)
```

This all works fine. However, how do I then explore the decision tree? For example, how do I find which entries from X_train appear in a particular leaf? PabTorre: You need to use the predict method. After training the tree, you feed it the X values to predict their output.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(random_state=0)
iris = load_iris()
tree = clf.fit(iris.data, iris.target)
tree.predict(iris.data)
```

output: >>>
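For the specific question of which training entries land in a particular leaf, `predict` only gives class labels; `apply` is the more direct tool, since it returns the leaf index for each sample. A sketch:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# apply() returns, for each sample, the index of the leaf it ends up in,
# which answers "which entries from X_train appear in a particular leaf".
leaf_ids = clf.apply(iris.data)
for leaf in np.unique(leaf_ids):
    members = np.where(leaf_ids == leaf)[0]
    print(f"leaf {leaf}: {len(members)} training samples")
```

`clf.decision_path(X)` similarly returns the full node path each sample takes, if the interior nodes matter too.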

Using CategoricalFeaturesInfo with DecisionTreeClassifier method in Spark

与世无争的帅哥 submitted on 2019-12-02 14:18:22
Question: I have to use this code:

```scala
val dt = new DecisionTreeClassifier()
  .setLabelCol("indexedLabel")
  .setFeaturesCol("indexedFeatures")
  .setImpurity(impurity)
  .setMaxBins(maxBins)
  .setMaxDepth(maxDepth);
```

I need to add categorical-feature information so that the decision tree doesn't treat the indexedCategoricalFeatures as numerical. I have this map:

```scala
val categoricalFeaturesInfo = Map(
  143 -> 126, 144 -> 5, 145 -> 216, 146 -> 100,
  147 -> 14, 148 -> 8, 149 -> 19, 150 -> 7);
```

However it only works with

Different decision tree algorithms with comparison of complexity or performance

二次信任 submitted on 2019-12-02 14:01:48
I am doing research on data mining and, more precisely, decision trees. I would like to know whether there are multiple algorithms to build a decision tree (or just one?), and which is better, based on criteria such as performance, complexity, errors in decision making, and more. doug: Decision tree implementations differ primarily along these axes:

- the splitting criterion (i.e., how "variance" is calculated)
- whether they build models for regression (continuous variables, e.g., a score) as well as classification (discrete variables, e.g., a class label)
- the technique used to eliminate/reduce over-fitting
- whether
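The first axis, the splitting criterion, is easy to experiment with directly: scikit-learn's DecisionTreeClassifier exposes it as a parameter. A sketch comparing Gini impurity and entropy (information gain) on the same data; the scores are illustrative, not a benchmark:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

# Same data and depth limits, two splitting criteria.
for criterion in ("gini", "entropy"):
    clf = DecisionTreeClassifier(criterion=criterion, random_state=0)
    scores = cross_val_score(clf, iris.data, iris.target, cv=5)
    print(f"{criterion}: mean CV accuracy {scores.mean():.3f}")
```

In practice the two criteria usually produce very similar trees; the larger differences between algorithms (ID3, C4.5, CART) come from the other axes: handling of continuous features, pruning strategy, and regression support.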

Decision tree vs. Naive Bayes classifier [closed]

我只是一个虾纸丫 submitted on 2019-12-02 13:52:39
I am doing some research about different data mining techniques and came across something that I could not figure out; if anyone has any ideas, that would be great. In which cases is it better to use a decision tree, and in which a Naive Bayes classifier? Why use one of them in certain cases and the other in different cases? (Looking at their functionality, not at the algorithms.) Does anyone have explanations or references about this? Decision trees are very flexible, easy to understand, and easy to debug. They will work with classification problems and regression problems. So if you are
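Since both classifiers share the same scikit-learn interface, the quickest way to build intuition for a given dataset is simply to fit both and compare held-out accuracy. A sketch on iris (results here say nothing general about the two methods, only about this dataset):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

# Same split, two very different model families.
dt = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
nb = GaussianNB().fit(X_train, y_train)

print("decision tree:", dt.score(X_test, y_test))
print("naive bayes:  ", nb.score(X_test, y_test))
```

Roughly: Naive Bayes needs little data and trains in one pass but assumes feature independence; a decision tree captures feature interactions but can overfit small datasets without pruning.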

What does scikit-learn DecisionTreeClassifier.tree_.value do?

耗尽温柔 submitted on 2019-12-02 13:28:40
Question: I am working on a DecisionTreeClassifier model and I want to understand the path chosen by the model, so I need to know what values DecisionTreeClassifier.tree_.value gives. Thank you. Answer 1: Well, you are correct in that the documentation is actually obscure about this (but to be honest, I am not sure about its usefulness, too). Let's replicate the example from the documentation with the iris data:

```python
from sklearn.datasets import load_iris
from sklearn import tree

iris = load_iris()
clf = tree
```
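In short, `tree_.value` summarises the class distribution of the training samples that reach each node. One way to inspect it (note: in older scikit-learn versions the entries are raw per-class sample counts; from version 1.4 they are normalised fractions, so it is safest to renormalise yourself):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# Shape is (n_nodes, n_outputs, n_classes): one class distribution per node.
print(clf.tree_.value.shape)

# The root node sees all 150 iris samples, 50 per class, so after
# renormalising, the distribution is uniform over the three classes.
root = clf.tree_.value[0].ravel()
print(root / root.sum())
```

`predict` just returns the class with the largest entry in the leaf a sample lands in, which is why this array determines the model's output.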

Using CategoricalFeaturesInfo with DecisionTreeClassifier method in Spark

倾然丶 夕夏残阳落幕 submitted on 2019-12-02 09:23:16
I have to use this code:

```scala
val dt = new DecisionTreeClassifier()
  .setLabelCol("indexedLabel")
  .setFeaturesCol("indexedFeatures")
  .setImpurity(impurity)
  .setMaxBins(maxBins)
  .setMaxDepth(maxDepth);
```

I need to add categorical-feature information so that the decision tree doesn't treat the indexedCategoricalFeatures as numerical. I have this map:

```scala
val categoricalFeaturesInfo = Map(
  143 -> 126, 144 -> 5, 145 -> 216, 146 -> 100,
  147 -> 14, 148 -> 8, 149 -> 19, 150 -> 7);
```

However it only works with the DecisionTree.trainClassifier method. I can't use this method because it accepts different arguments than the

Python Checking paths to leaf in binary tree python giving data in the leaf

别来无恙 submitted on 2019-12-02 05:34:31
Let's say I have this tree:

```
                      cough
              Yes /          \ No
          sneezing            sneezing
        Yes /    \ No       Yes /    \ No
       fever      fever    fever      fever
     Yes/ \No   Yes/ \No  Yes/ \No  Yes/ \No
     dead cold  influenza cold  dead influenza  cold healthy
```

I want the paths to the illness "influenza". The output should be like this:

```
[[True, False, True], [False, True, False]]
```

Taking the Yes branch at a node records True, taking the No branch records False. This is the code I have been trying for this function, but I'm doing something wrong; it does not return what I want:

```python
def paths_to_illness(self, illness):
    head = self.__root
    new_list = []
```
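A recursive walk that carries the branch choices taken so far handles this cleanly. A self-contained sketch, with a minimal invented `Node` class standing in for the question's tree (whose internals we don't see), and the question's tree rebuilt from the diagram:

```python
class Node:
    """Minimal binary-tree node; leaves have no children."""
    def __init__(self, data, yes=None, no=None):
        self.data, self.yes, self.no = data, yes, no

def paths_to_illness(root, illness):
    """Collect every root-to-leaf path ending at `illness`.
    True means the Yes branch was taken, False the No branch."""
    paths = []

    def walk(node, path):
        if node is None:
            return
        if node.yes is None and node.no is None:  # leaf
            if node.data == illness:
                paths.append(path)
            return
        walk(node.yes, path + [True])
        walk(node.no, path + [False])

    walk(root, [])
    return paths

# Rebuild the question's tree (leaf labels read left to right in the diagram).
leaves = ["dead", "cold", "influenza", "cold", "dead", "influenza", "cold", "healthy"]
fevers = [Node("fever", Node(leaves[i]), Node(leaves[i + 1])) for i in (0, 2, 4, 6)]
root = Node("cough",
            Node("sneezing", fevers[0], fevers[1]),
            Node("sneezing", fevers[2], fevers[3]))

print(paths_to_illness(root, "influenza"))
# -> [[True, False, True], [False, True, False]]
```

The key detail the question's attempt likely misses is passing a fresh `path + [choice]` down each branch rather than mutating one shared list, so sibling subtrees don't see each other's choices.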