decision-tree

partykit: Modify terminal node to include standard deviation and significance of regressors

Submitted by 本秂侑毒 on 2021-02-10 16:00:32
Question: I would like to be able to customize the displayed plot to include the standard deviation and statistical significance of the regressors after using the partykit::mob() function. The following code is from the partykit documentation:

```r
library("partykit")
if (require("mlbench")) {
  ## Pima Indians diabetes data
  data("PimaIndiansDiabetes", package = "mlbench")
  ## a simple basic fitting function (of type 1) for a logistic regression
  logit <- function(y, x, start = NULL, weights = NULL, offset …
```
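The per-node quantities the question asks for (coefficient standard errors and significance) are standard least-squares inference, so one way to sketch the idea outside of partykit's plotting API is to refit a regression inside a node's subgroup and compute the standard errors directly. This is a minimal Python analogue, not partykit code; the data is simulated and the helper name is an illustration:

```python
import numpy as np

def ols_with_inference(X, y):
    """Fit y = X @ beta by least squares and return coefficients,
    their standard errors, and t statistics (the kind of per-node
    summary the question wants to display)."""
    X = np.column_stack([np.ones(len(X)), X])        # prepend intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]                    # residual degrees of freedom
    sigma2 = resid @ resid / dof                     # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se, beta / se

# Simulated "terminal node" subgroup: y ≈ 1 + 2x + noise
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
y = 1.0 + 2.0 * x[:, 0] + rng.normal(scale=0.5, size=200)

beta, se, t = ols_with_inference(x, y)
print(beta, se, t)
```

Running this on each terminal node's subset of rows yields the numbers one would then draw into a custom panel function.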

R: Obtaining Rules from a Function

Submitted by 百般思念 on 2021-01-30 09:13:36
Question: I am using the R programming language. I used the "rpart" library and fit a decision tree with some data:

```r
# from a previous question: https://stackoverflow.com/questions/65678552/r-changing-plot-sizes
library(rpart)
car.test.frame$Reliability = as.factor(car.test.frame$Reliability)
z.auto <- rpart(Reliability ~ ., car.test.frame)
plot(z.auto)
text(z.auto, use.n = TRUE, xpd = TRUE, cex = .8)
```

This is good, but I am looking for an easier way to summarize the results of this tree in case the tree …
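For comparison, scikit-learn ships this summary as a built-in: `sklearn.tree.export_text` prints the fitted tree as nested if/else rules, one root-to-leaf path per rule. This is a Python analogue of what the question asks for in R, not an rpart solution:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# export_text renders each split as an indented condition and each
# leaf as the predicted class, giving a plain-text rule summary.
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```

The output is a readable rule list such as "petal width (cm) <= 0.80 → class: 0", which scales better than a plot when the tree is large.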


Performing one hot encoding on two columns of string data

Submitted by 邮差的信 on 2021-01-29 10:52:15
Question: I am trying to predict 'Full_Time_Home_Goals'. My code is:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.ensemble import RandomForestRegressor
import os
import xlrd
import datetime
import numpy as np

# Set option to display all the rows and columns in the dataset.
# If there are more rows, adjust the number accordingly.
pd.set_option('display.max_rows', 5000)
pd.set …
```
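For the two string columns themselves, one-hot encoding can be done in a single call with `pandas.get_dummies`. A minimal sketch follows; the column names `HomeTeam` and `AwayTeam` are assumptions for illustration, not taken from the excerpt:

```python
import pandas as pd

# Hypothetical data with two string columns and the numeric target
df = pd.DataFrame({
    "HomeTeam": ["Arsenal", "Chelsea", "Arsenal"],
    "AwayTeam": ["Leeds", "Arsenal", "Chelsea"],
    "Full_Time_Home_Goals": [2, 0, 1],
})

# Each distinct category in the listed columns becomes its own 0/1 column;
# untouched columns (the target) pass through unchanged.
encoded = pd.get_dummies(df, columns=["HomeTeam", "AwayTeam"])
print(encoded.columns.tolist())
```

The encoded frame can then be split into features and target and fed to the tree regressor as usual.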

How to manually select the features of the decision tree

Submitted by ╄→尐↘猪︶ㄣ on 2021-01-29 07:49:18
Question: I need to be able to choose the features (in the machine-learning sense) that are used to build the decision tree. Taking the Iris dataset as an example, I want to be able to select Sepallength as the feature used in the root node and Petallength as the feature used in the nodes of the first level, and so on. To be clear, my aim is not to change the minimum sample split or the random state of the decision tree, but rather to select the features - the characteristics of the …
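As far as I know, scikit-learn's `DecisionTreeClassifier` does not expose per-node or per-level feature choice; the closest built-in control is restricting which columns the tree is fitted on, and then inspecting which column each node actually used. A sketch of that workaround:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

# Feature order in iris.data: sepal length, sepal width, petal length, petal width.
# Allow the tree to use only sepal length (col 0) and petal length (col 2).
cols = [0, 2]
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(iris.data[:, cols], iris.target)

# tree_.feature gives, per node, the index of the column split on
# (indices refer to the restricted 2-column matrix; -2 marks a leaf).
print(clf.tree_.feature)
```

Forcing a *specific* feature at a *specific* level would require building the splits manually, since the fitted tree always picks the best allowed feature at each node.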

Random forest tree growing algorithm

Submitted by 为君一笑 on 2021-01-28 04:05:05
Question: I'm working on a Random Forest implementation (for classification), and I have some questions about the tree-growing algorithm described in the literature. When training a decision tree, there are two criteria for stopping growth: (a) stop when there are no features left to split a node on; (b) stop when all samples in a node belong to the same class. Based on that: 1. Consider growing one tree in the forest. When splitting a node of the tree, I randomly select m of the M total …
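The "randomly select m of the M features at each split" step described above is what scikit-learn's `max_features` parameter controls, so the behavior can be checked against a reference implementation before writing your own. A minimal sketch:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic 20-feature classification problem
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# max_features="sqrt": at every split, a fresh random subset of
# sqrt(20) ≈ 4 candidate features is drawn; by default each tree is
# grown until its leaves are pure (criterion (b) above).
rf = RandomForestClassifier(n_estimators=50, max_features="sqrt", random_state=0)
rf.fit(X, y)
print(rf.score(X, y))
```

Because the feature subset is redrawn at every node rather than once per tree, a feature excluded at one split can still be used deeper in the same tree.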

Finding a corresponding leaf node for each data point in a decision tree (scikit-learn)

Submitted by 拜拜、爱过 on 2021-01-27 07:22:19
Question: I'm using the decision tree classifier from the scikit-learn package in Python 3.4, and I want to get the corresponding leaf node id for each of my input data points. For example, my input might look like this:

```python
array([[ 5.1,  3.5,  1.4,  0.2],
       [ 4.9,  3. ,  1.4,  0.2],
       [ 4.7,  3.2,  1.3,  0.2]])
```

and let's suppose the corresponding leaf nodes are 16, 5 and 45 respectively. I want my output to be:

```python
leaf_node_id = array([16, 5, 45])
```

I have read through the scikit-learn mailing list and related questions on SF …
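scikit-learn exposes exactly this through the classifier's `apply` method, which returns the index of the leaf each sample ends up in. Note that the ids are the tree's internal node numbers, so they will not match the 16/5/45 of the hypothetical example:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# apply() maps each input row to the id of the leaf node it falls into
leaf_node_id = clf.apply(iris.data[:3])
print(leaf_node_id)
```

For a whole forest, `RandomForestClassifier.apply` returns one leaf id per tree per sample instead.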

Scikit-Learn Decision Tree: Probability of prediction being a or b?

Submitted by 半世苍凉 on 2021-01-21 08:27:19
Question: I have a basic decision tree classifier with Scikit-Learn:

```python
# Used to determine men from women based on height and shoe size
from sklearn import tree

# height and shoe size
X = [[65, 9], [67, 7], [70, 11], [62, 6], [60, 7], [72, 13], [66, 10], [67, 7.5]]
Y = ["male", "female", "male", "female", "female", "male", "male", "female"]

# creating a decision tree
clf = tree.DecisionTreeClassifier()

# fitting the data to the tree
clf.fit(X, Y)

# predicting the gender for a new sample (predict expects a 2-D array,
# so the single sample is wrapped in an outer list)
prediction = clf.predict([[68, 9]])

# print …
```
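The per-class probabilities the title asks about come from `predict_proba`, which returns one probability per class in the order given by `clf.classes_`:

```python
from sklearn import tree

# height and shoe size
X = [[65, 9], [67, 7], [70, 11], [62, 6], [60, 7], [72, 13], [66, 10], [67, 7.5]]
Y = ["male", "female", "male", "female", "female", "male", "male", "female"]

clf = tree.DecisionTreeClassifier()
clf.fit(X, Y)

# One row per sample, one column per class (alphabetical: female, male);
# the columns of each row sum to 1.
proba = clf.predict_proba([[68, 9]])
print(clf.classes_, proba)
```

With an unpruned tree on this tiny dataset, each leaf is pure, so the probabilities will be 0 and 1; limiting tree depth or using an ensemble yields softer estimates.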