decision-tree

How to handle categorical variables in sklearn GradientBoostingClassifier?

青春壹個敷衍的年華 submitted on 2019-12-02 23:53:09
I am attempting to train models with GradientBoostingClassifier using categorical variables. The following is a primitive code sample, just to try feeding categorical variables into GradientBoostingClassifier:

```python
from sklearn import datasets
from sklearn.ensemble import GradientBoostingClassifier
import pandas

iris = datasets.load_iris()
# Use only data for 2 classes.
X = iris.data[(iris.target == 0) | (iris.target == 1)]
Y = iris.target[(iris.target == 0) | (iris.target == 1)]

# Class 0 has indices 0-49. Class 1 has indices 50-99.
# Divide data into 80% training, 20% testing.
train_indices = list
```
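The usual approach for scikit-learn's GradientBoostingClassifier (which, unlike some other libraries, has no native categorical support in classic versions) is to encode the categorical columns first. A minimal sketch, with an invented toy DataFrame since the question's data is numeric:

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# Toy frame with one categorical column; the column names and values
# are made up for illustration.
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green", "red", "blue"],
    "size": [1.0, 2.0, 3.0, 2.5, 1.5, 3.5],
    "label": [0, 0, 1, 1, 0, 1],
})

# One-hot encode the categorical column so the trees can split on
# 0/1 indicator features instead of treating it as numeric.
X = pd.get_dummies(df[["color", "size"]], columns=["color"])
y = df["label"]

clf = GradientBoostingClassifier(n_estimators=10, random_state=0)
clf.fit(X, y)
print(clf.predict(X))
```

`sklearn.preprocessing.OneHotEncoder` does the same job inside a pipeline; `pd.get_dummies` is just the shortest way to show the idea.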

scikit learn - feature importance calculation in decision trees

蓝咒 submitted on 2019-12-02 23:41:46
I'm trying to understand how feature importance is calculated for decision trees in scikit-learn. This question has been asked before, but I am unable to reproduce the results the algorithm is providing. For example:

```python
from StringIO import StringIO
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree.export import export_graphviz
from sklearn.feature_selection import mutual_info_classif

X = [[1, 0, 0], [0, 0, 0], [0, 0, 1], [0, 1, 0]]
y = [1, 0, 1, 1]

clf = DecisionTreeClassifier()
clf.fit(X, y)
feat_importance = clf.tree_.compute_feature_importances
```
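The impurity-based importance scikit-learn reports can be reproduced by hand from the fitted `tree_` arrays: each feature's importance is the total weighted impurity decrease over the nodes that split on it, normalised to sum to 1. A sketch on the question's data:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = [[1, 0, 0], [0, 0, 0], [0, 0, 1], [0, 1, 0]]
y = [1, 0, 1, 1]

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

tree = clf.tree_
n = tree.weighted_n_node_samples
importances = np.zeros(3)
for node in range(tree.node_count):
    left, right = tree.children_left[node], tree.children_right[node]
    if left == -1:  # leaf node: no split, contributes nothing
        continue
    # Weighted impurity decrease produced by this node's split.
    decrease = (n[node] * tree.impurity[node]
                - n[left] * tree.impurity[left]
                - n[right] * tree.impurity[right])
    importances[tree.feature[node]] += decrease

importances /= importances.sum()
print(importances)
```

Dividing by the sum at the end reproduces the normalisation scikit-learn applies, so the result should match `clf.feature_importances_`.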

How to implement decision tree with c# (visual studio 2008) - Help

假如想象 submitted on 2019-12-02 18:41:06
I have a decision tree that I need to turn into C# code. The simple way of doing it is with if-else statements, but that solution would require 4-5 levels of nested conditions. I am looking for a better way, and so far I have read a little about rule engines. Do you have anything else to suggest for an efficient way to implement a decision tree with 4-5 nested conditions? I implemented a simple decision tree as a sample in my book. The code is available online here, so perhaps you could use it as inspiration. A decision is essentially represented as a class that has references to
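The "decision as a class with references to its branches" idea from the answer can be sketched as follows (in Python for brevity, rather than the question's C#; the class, field, and record names are invented for illustration):

```python
class Decision:
    """A node that asks a yes/no question and routes to one of two children."""
    def __init__(self, question, if_yes, if_no):
        self.question = question  # callable: record -> bool
        self.if_yes = if_yes      # Decision, or a terminal outcome value
        self.if_no = if_no

    def evaluate(self, record):
        branch = self.if_yes if self.question(record) else self.if_no
        # Recurse until a terminal (non-Decision) value is reached.
        return branch.evaluate(record) if isinstance(branch, Decision) else branch

# Nested conditions become a small object graph instead of if-else soup,
# so adding or reordering rules means editing data, not control flow.
tree = Decision(lambda r: r["age"] >= 18,
                Decision(lambda r: r["income"] > 50000, "approve", "review"),
                "reject")
print(tree.evaluate({"age": 30, "income": 60000}))  # -> approve
```

The same structure translates directly to a C# class with two child references and an `Evaluate` method.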

How to explore a decision tree built using scikit learn

半腔热情 submitted on 2019-12-02 18:32:53
I am building a decision tree using

```python
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X_train, Y_train)
```

This all works fine. However, how do I then explore the decision tree? For example, how do I find which entries from X_train appear in a particular leaf? PabTorre: You need to use the predict method. After training the tree, you feed it the X values to predict their output.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(random_state=0)
iris = load_iris()
tree = clf.fit(iris.data, iris.target)
tree.predict(iris.data)
```

output: >>>
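For the specific question of which training entries land in a particular leaf, `predict` only gives class labels; `apply` is the more direct tool, since it returns the leaf index for each sample. A sketch:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# apply() returns, for each sample, the index of the leaf it ends up in,
# which answers "which entries from X_train appear in a particular leaf".
leaf_ids = clf.apply(iris.data)
for leaf in np.unique(leaf_ids):
    members = np.where(leaf_ids == leaf)[0]
    print(f"leaf {leaf}: {len(members)} training samples")
```

`clf.decision_path(X)` similarly returns the full node path each sample takes, if the interior nodes matter too.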

Using CategoricalFeaturesInfo with DecisionTreeClassifier method in Spark

与世无争的帅哥 submitted on 2019-12-02 14:18:22
Question: I have to use this code:

```scala
val dt = new DecisionTreeClassifier()
  .setLabelCol("indexedLabel")
  .setFeaturesCol("indexedFeatures")
  .setImpurity(impurity)
  .setMaxBins(maxBins)
  .setMaxDepth(maxDepth);
```

I need to add categorical-feature information so that the decision tree doesn't treat the indexedCategoricalFeatures as numerical. I have this map:

```scala
val categoricalFeaturesInfo = Map(
  143 -> 126, 144 -> 5, 145 -> 216, 146 -> 100,
  147 -> 14, 148 -> 8, 149 -> 19, 150 -> 7);
```

However it only works with

Different decision tree algorithms with comparison of complexity or performance

二次信任 submitted on 2019-12-02 14:01:48
I am doing research on data mining and, more precisely, decision trees. I would like to know whether there are multiple algorithms to build a decision tree (or just one?), and which is better, based on criteria such as performance, complexity, errors in decision making, and more. doug: Decision tree implementations differ primarily along these axes:

- the splitting criterion (i.e., how "variance" is calculated)
- whether they build models for regression (continuous variables, e.g., a score) as well as classification (discrete variables, e.g., a class label)
- the technique used to eliminate/reduce over-fitting
- whether
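The first axis, the splitting criterion, is easy to experiment with directly: scikit-learn's DecisionTreeClassifier exposes it as a parameter. A sketch comparing Gini impurity and entropy (information gain) on the same data; the scores are illustrative, not a benchmark:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

# Same data and depth limits, two splitting criteria.
for criterion in ("gini", "entropy"):
    clf = DecisionTreeClassifier(criterion=criterion, random_state=0)
    scores = cross_val_score(clf, iris.data, iris.target, cv=5)
    print(f"{criterion}: mean CV accuracy {scores.mean():.3f}")
```

In practice the two criteria usually produce very similar trees; the larger differences between algorithms (ID3, C4.5, CART) come from the other axes: handling of continuous features, pruning strategy, and regression support.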

Decision tree vs. Naive Bayes classifier [closed]

我只是一个虾纸丫 submitted on 2019-12-02 13:52:39
I am doing some research about different data mining techniques and came across something that I could not figure out; if anyone has any ideas, that would be great. In which cases is it better to use a decision tree, and in which a Naive Bayes classifier? Why use one of them in certain cases and the other in different cases? (Looking at their functionality, not at the algorithms.) Does anyone have explanations or references about this? Decision trees are very flexible, easy to understand, and easy to debug. They will work with classification problems and regression problems. So if you are
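Since both classifiers share the same scikit-learn interface, the quickest way to build intuition for a given dataset is simply to fit both and compare held-out accuracy. A sketch on iris (results here say nothing general about the two methods, only about this dataset):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

# Same split, two very different model families.
dt = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
nb = GaussianNB().fit(X_train, y_train)

print("decision tree:", dt.score(X_test, y_test))
print("naive bayes:  ", nb.score(X_test, y_test))
```

Roughly: Naive Bayes needs little data and trains in one pass but assumes feature independence; a decision tree captures feature interactions but can overfit small datasets without pruning.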

What does scikit-learn DecisionTreeClassifier.tree_.value do?

耗尽温柔 submitted on 2019-12-02 13:28:40
Question: I am working on a DecisionTreeClassifier model and I want to understand the path chosen by the model, so I need to know what values DecisionTreeClassifier.tree_.value gives. Thank you. Answer 1: Well, you are correct in that the documentation is actually obscure about this (but to be honest, I am not sure about its usefulness, too). Let's replicate the example from the documentation with the iris data:

```python
from sklearn.datasets import load_iris
from sklearn import tree

iris = load_iris()
clf = tree
```
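In short, `tree_.value` summarises the class distribution of the training samples that reach each node. One way to inspect it (note: in older scikit-learn versions the entries are raw per-class sample counts; from version 1.4 they are normalised fractions, so it is safest to renormalise yourself):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# Shape is (n_nodes, n_outputs, n_classes): one class distribution per node.
print(clf.tree_.value.shape)

# The root node sees all 150 iris samples, 50 per class, so after
# renormalising, the distribution is uniform over the three classes.
root = clf.tree_.value[0].ravel()
print(root / root.sum())
```

`predict` just returns the class with the largest entry in the leaf a sample lands in, which is why this array determines the model's output.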

Using CategoricalFeaturesInfo with DecisionTreeClassifier method in Spark

倾然丶 夕夏残阳落幕 submitted on 2019-12-02 09:23:16
I have to use this code:

```scala
val dt = new DecisionTreeClassifier()
  .setLabelCol("indexedLabel")
  .setFeaturesCol("indexedFeatures")
  .setImpurity(impurity)
  .setMaxBins(maxBins)
  .setMaxDepth(maxDepth);
```

I need to add categorical-feature information so that the decision tree doesn't treat the indexedCategoricalFeatures as numerical. I have this map:

```scala
val categoricalFeaturesInfo = Map(
  143 -> 126, 144 -> 5, 145 -> 216, 146 -> 100,
  147 -> 14, 148 -> 8, 149 -> 19, 150 -> 7);
```

However it only works with the DecisionTree.trainClassifier method. I can't use this method because it accepts different arguments than the

Python Checking paths to leaf in binary tree python giving data in the leaf

别来无恙 submitted on 2019-12-02 05:34:31
Let's say I have this tree:

```
                      cough
              Yes /          \ No
          sneezing            sneezing
        Yes /    \ No       Yes /    \ No
       fever      fever    fever      fever
     Yes/ \No   Yes/ \No  Yes/ \No  Yes/ \No
     dead cold  influenza cold  dead influenza  cold healthy
```

I want the paths to the illness "influenza". The output should be like this:

```
[[True, False, True], [False, True, False]]
```

Taking the Yes branch at a node records True, taking the No branch records False. This is the code I have been trying for this function, but I'm doing something wrong; it does not return what I want:

```python
def paths_to_illness(self, illness):
    head = self.__root
    new_list = []
```
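A recursive walk that carries the branch choices taken so far handles this cleanly. A self-contained sketch, with a minimal invented `Node` class standing in for the question's tree (whose internals we don't see), and the question's tree rebuilt from the diagram:

```python
class Node:
    """Minimal binary-tree node; leaves have no children."""
    def __init__(self, data, yes=None, no=None):
        self.data, self.yes, self.no = data, yes, no

def paths_to_illness(root, illness):
    """Collect every root-to-leaf path ending at `illness`.
    True means the Yes branch was taken, False the No branch."""
    paths = []

    def walk(node, path):
        if node is None:
            return
        if node.yes is None and node.no is None:  # leaf
            if node.data == illness:
                paths.append(path)
            return
        walk(node.yes, path + [True])
        walk(node.no, path + [False])

    walk(root, [])
    return paths

# Rebuild the question's tree (leaf labels read left to right in the diagram).
leaves = ["dead", "cold", "influenza", "cold", "dead", "influenza", "cold", "healthy"]
fevers = [Node("fever", Node(leaves[i]), Node(leaves[i + 1])) for i in (0, 2, 4, 6)]
root = Node("cough",
            Node("sneezing", fevers[0], fevers[1]),
            Node("sneezing", fevers[2], fevers[3]))

print(paths_to_illness(root, "influenza"))
# -> [[True, False, True], [False, True, False]]
```

The key detail the question's attempt likely misses is passing a fresh `path + [choice]` down each branch rather than mutating one shared list, so sibling subtrees don't see each other's choices.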