decision-tree

GraphViz's executables not found: Anaconda-3

一世执手 submitted on 2019-12-04 14:01:20
Question: I am trying to display the tree output, but when I run the script below, I receive an error like: InvocationException: GraphViz's executables not found. I've searched similar topics here, but most of them are Mac-related. I'm using the Windows 10 64-bit operating system with Anaconda-3 64-bit. I'd love to hear your suggestions on this.

#Displaying the decision tree
from sklearn import tree
#from StringIO import StringIO
from io import StringIO
#from StringIO import StringIO
from IPython
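
One common cause on Windows/Anaconda is that the Graphviz binaries are either not installed or not on PATH. Below is a minimal sketch of that fix, assuming Graphviz was installed via conda (conda install graphviz python-graphviz) and using a hypothetical install path, not the asker's actual setup:

import os
from sklearn import tree
from sklearn.datasets import load_iris
import graphviz

# Hypothetical location of Graphviz's dot.exe - adjust to the actual install path.
os.environ["PATH"] += os.pathsep + r"C:\Program Files\Graphviz\bin"

iris = load_iris()
clf = tree.DecisionTreeClassifier().fit(iris.data, iris.target)

# Export the fitted tree to DOT and render it with Graphviz.
dot_data = tree.export_graphviz(clf, out_file=None,
                                feature_names=iris.feature_names,
                                class_names=iris.target_names,
                                filled=True)
graphviz.Source(dot_data).render("iris_tree", format="png")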

How to implement an interactive decision tree in C#

半城伤御伤魂 submitted on 2019-12-04 13:39:04
I need to allow the users to choose their own path by picking between two simple choices displayed on their screen in order to progress to the next set of choices, until they reach one of the endings, i.e. something like this should be achieved: I have tried the following code, but only the left side is evaluated each time. I am wondering how I can achieve a result like the one in the image above (covering all the branches)? For instance, if the user selects "No", the application shouldn't ask any further questions and should simply show the "Maybe you want a Pizza" message. I have done this
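
The question targets C#, but the structure is language-agnostic; here is a minimal sketch in Python (illustrative only, with hypothetical questions) of a binary node that is walked until a leaf message such as "Maybe you want a Pizza" is reached, so both branches of the tree are reachable:

class Node:
    def __init__(self, text, yes=None, no=None):
        self.text = text   # question to ask, or final message if this is a leaf
        self.yes = yes     # next node when the user answers "yes"
        self.no = no       # next node when the user answers "no"

    def is_leaf(self):
        return self.yes is None and self.no is None

# Hypothetical tree mirroring the example in the question.
root = Node("Are you hungry?",
            yes=Node("Do you want pasta?",
                     yes=Node("Have some spaghetti."),
                     no=Node("Maybe you want a Pizza.")),
            no=Node("Maybe you want a Pizza."))

node = root
while not node.is_leaf():
    answer = input(node.text + " (yes/no): ").strip().lower()
    node = node.yes if answer == "yes" else node.no
print(node.text)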

Classification tree in sklearn giving inconsistent answers

对着背影说爱祢 submitted on 2019-12-04 12:25:54
Question: I am using a classification tree from sklearn, and when I train the model twice using the same data and predict with the same test data, I get different results. I tried reproducing this on the smaller iris data set and it worked as expected. Here is some code:

from sklearn import tree
from sklearn.datasets import load_iris
iris = load_iris()
clf = tree.DecisionTreeClassifier()
clf.fit(iris.data, iris.target)
r1 = clf.predict_proba(iris.data)
clf.fit(iris.data, iris.target)
r2 = clf.predict_proba(iris.data)
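
A likely explanation (an assumption, since the full data isn't shown) is that DecisionTreeClassifier breaks ties between equally good splits using its random number generator, so repeated fits can differ unless random_state is fixed. A minimal sketch:

from sklearn import tree
from sklearn.datasets import load_iris

iris = load_iris()

# With a fixed random_state, fitting twice on the same data gives identical trees;
# without it, ties between candidate splits may be broken differently on each fit.
clf = tree.DecisionTreeClassifier(random_state=0)
r1 = clf.fit(iris.data, iris.target).predict_proba(iris.data)
r2 = clf.fit(iris.data, iris.target).predict_proba(iris.data)
assert (r1 == r2).all()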

How to set the costs matrix for the C5.0 package in R?

浪尽此生 submitted on 2019-12-04 11:34:46
I have searched the web extensively but cannot find any useful description of the 'costs' parameter of the C5.0 function in R. The C5.0 R manual just says "a matrix of costs associated with the possible errors. The matrix should have C columns and rows where C is the number of class levels". It does not tell me whether the row or the column is the result predicted by the model. Can anyone help? Here is a quote from the help page of C5.0 (version 0.1.0-15): The cost matrix should be CxC, where C is the number of classes. Diagonal elements are ignored. Columns should correspond to the
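
Assuming the orientation hinted at in the (truncated) quote, with columns indexing the true class and rows the predicted class, the matrix reads as "the cost of predicting the row class when the truth is the column class". A small illustration in Python (numpy) rather than R, with made-up numbers:

import numpy as np

classes = ["no", "yes"]

# Rows = predicted class, columns = true class; diagonal entries are ignored.
# Here, predicting "no" when the truth is "yes" is four times as costly as the reverse.
costs = np.array([[0, 4],    # predicted "no"
                  [1, 0]])   # predicted "yes"

# Given hypothetical class probabilities for one case, the expected cost of each
# possible prediction is the cost matrix times the probability vector.
probs = np.array([0.7, 0.3])               # P(true = "no"), P(true = "yes")
expected_cost = costs @ probs
print(dict(zip(classes, expected_cost)))   # choose the prediction with the lowest cost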

How to handle categorical features for Decision Tree, Random Forest in spark ml?

流过昼夜 submitted on 2019-12-04 10:38:34
I am trying to build decision tree and random forest classifiers on the UCI bank marketing data -> https://archive.ics.uci.edu/ml/datasets/bank+marketing . There are many categorical features (with string values) in the data set. The Spark ML documentation mentions that categorical variables can be converted to numeric by indexing, using either StringIndexer or VectorIndexer. I chose StringIndexer (VectorIndexer requires a vector feature, and VectorAssembler, which converts features into a vector feature, accepts only numeric types). Using this approach, each of the levels of a
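
A minimal pipeline sketch in the Python API (pyspark; the class names are the same in Scala), with an illustrative subset of the bank-marketing columns - not the asker's exact code, and the OneHotEncoder class name varies slightly across Spark versions:

from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, OneHotEncoder, VectorAssembler
from pyspark.ml.classification import DecisionTreeClassifier

categorical_cols = ["job", "marital", "education"]   # illustrative subset of the columns

# One StringIndexer per categorical column, then (optionally) one-hot encoding.
indexers = [StringIndexer(inputCol=c, outputCol=c + "_idx", handleInvalid="keep")
            for c in categorical_cols]
encoders = [OneHotEncoder(inputCol=c + "_idx", outputCol=c + "_vec")
            for c in categorical_cols]

assembler = VectorAssembler(
    inputCols=[c + "_vec" for c in categorical_cols] + ["age", "balance"],
    outputCol="features")

dt = DecisionTreeClassifier(labelCol="label", featuresCol="features", maxBins=64)

pipeline = Pipeline(stages=indexers + encoders + [assembler, dt])
# model = pipeline.fit(train_df)   # train_df is the labelled training DataFrame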

Looking for a C++ implementation of the C4.5 algorithm

回眸只為那壹抹淺笑 submitted on 2019-12-04 09:59:30
I've been looking for a C++ implementation of the C4.5 algorithm, but I haven't been able to find one yet. I found Quinlan's C4.5 Release 8, but it's written in C... Has anybody seen any open-source C++ implementations of the C4.5 algorithm? I'm thinking about porting the J48 source code (or simply writing a wrapper around the C version) if I can't find an open-source C++ implementation out there, but I hope I don't have to do that! Please let me know if you have come across a C++ implementation of the algorithm. Update: I've been considering the option of writing a thin C++ wrapper around

What libraries are there for modeling a complex questionnaire in Python?

☆樱花仙子☆ submitted on 2019-12-04 09:58:44
For a medical website, I'm trying to model a questionnaire that should result in a range of possible diagnoses. The questionnaire is fairly complex, with a lot of conditionals. I made a flowchart/decision tree to reflect this questionnaire. I'm using Django to build the website. Currently I'm thinking of using Python Graph to turn the flowchart into a weighted graph: each question would be a node and each answer would be an edge plus a label. I could then walk through the complete graph, and the endpoint of the walk would be the fitting diagnosis. Is python graph the best library to model this
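
As one possible shape for this, here is a sketch using networkx rather than python-graph, with hypothetical questions: each question is a node, each answer is a labelled directed edge, and walking the graph with the user's answers ends at a diagnosis node.

import networkx as nx

# Hypothetical fragment of the questionnaire as a directed graph:
# nodes are questions or diagnoses, edge labels are answers.
g = nx.DiGraph()
g.add_edge("Do you have a fever?", "Do you have a rash?", answer="yes")
g.add_edge("Do you have a fever?", "Diagnosis: probably fine", answer="no")
g.add_edge("Do you have a rash?", "Diagnosis: see a doctor", answer="yes")
g.add_edge("Do you have a rash?", "Diagnosis: common flu", answer="no")

def walk(graph, node, answers):
    """Follow the edges whose 'answer' attribute matches the given answers."""
    for answer in answers:
        nxt = [v for _, v, d in graph.out_edges(node, data=True)
               if d["answer"] == answer]
        if not nxt:
            break
        node = nxt[0]
    return node

print(walk(g, "Do you have a fever?", ["yes", "no"]))   # -> Diagnosis: common flu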

Calculating prediction accuracy of a tree using rpart's predict method (R programming)

放肆的年华 submitted on 2019-12-04 09:36:46
I have constructed a decision tree for a dataset using rpart. I divided the data into two parts - a training dataset and a test dataset - and built the tree on the training data. I now want to calculate the accuracy of the predictions made by the model. My code is shown below:

library(rpart)
#reading the data
data = read.table("source")
names(data) <- c("a", "b", "c", "d", "class")
#generating test and train data - data selected randomly with an 80/20 split
trainIndex <- sample(1:nrow(data), 0.8 * nrow(data))
train <- data[trainIndex,]
test <- data[
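
The asker's code is in R; for what the accuracy calculation itself looks like, here is an analogous sketch in Python/scikit-learn (stand-in data, not the asker's table): compare the predictions on the held-out 20% against the true labels.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

X, y = load_iris(return_X_y=True)    # stand-in for the asker's columns a, b, c, d, class

# 80/20 train/test split, mirroring the R code above.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
pred = clf.predict(X_test)

print(accuracy_score(y_test, pred))     # fraction of test rows predicted correctly
print(confusion_matrix(y_test, pred))   # per-class breakdown of the errors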

Extract and Visualize Model Trees from Sparklyr

邮差的信 submitted on 2019-12-04 07:44:29
Does anyone have any advice on how to convert the tree information from sparklyr's ml_decision_tree_classifier, ml_gbt_classifier, or ml_random_forest_classifier models into (a) a format that can be understood by other R tree-related libraries and, ultimately, (b) a visualization of the trees for non-technical consumption? This would include the ability to map the substituted string-indexing values produced by the vector assembler back to the actual feature names. The following code is copied liberally from a sparklyr blog post for the purpose of providing an example:
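
For comparison, here is a self-contained sketch of what Spark itself exposes for a fitted tree, shown through the Python API since the R-side extraction is exactly what the question is asking about; the data and column names are made up:

from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import DecisionTreeClassifier

spark = SparkSession.builder.master("local[1]").appName("tree-debug").getOrCreate()
df = spark.createDataFrame(
    [(0.0, 1.0, 0.0), (1.0, 10.0, 1.0), (0.0, 2.0, 0.0), (1.0, 12.0, 1.0)],
    ["f1", "f2", "label"])

assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
model = DecisionTreeClassifier(labelCol="label").fit(assembler.transform(df))

print(model.toDebugString)        # textual split rules, expressed with feature indices
print(model.featureImportances)   # importances, indexed like the assembled vector

# Turning "feature 0", "feature 1", ... back into readable names means consulting
# assembler.getInputCols() (and, for indexed string columns, the labels stored by
# each StringIndexer) - which is the mapping the question wants to automate.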

Different decision tree algorithms with comparison of complexity or performance

三世轮回 submitted on 2019-12-04 07:27:41
Question: I am doing research on data mining, and more precisely decision trees. I would like to know whether there are multiple algorithms for building a decision tree (or just one?), and which is better, based on criteria such as performance, complexity, errors in decision making, and more. Answer 1: Decision tree implementations differ primarily along these axes: the splitting criterion (i.e., how "variance" is calculated); whether the method builds models for regression (continuous variables, e.g., a score) as well as
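
To make the first axis concrete: in scikit-learn the same tree class can be switched between Gini impurity and entropy as the splitting criterion, and there is a regression counterpart as well. A small sketch, not tied to any particular algorithm family from the question:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X, y = load_iris(return_X_y=True)

# Same implementation, two different splitting criteria.
for criterion in ["gini", "entropy"]:
    clf = DecisionTreeClassifier(criterion=criterion, random_state=0)
    print(criterion, cross_val_score(clf, X, y, cv=5).mean())

# The classification-vs-regression axis: a regression tree for continuous targets
# (toy example: predict the first feature from the remaining ones).
reg = DecisionTreeRegressor(random_state=0).fit(X[:, 1:], X[:, 0])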