decision-tree

How to implement the output of decision tree built using the ctree (party package)?

Submitted by 一世执手 on 2019-12-13 14:08:27

Question: I have built a decision tree using the ctree function from the party package; it has 1,700 nodes. First, is there a way in ctree to set a maxdepth argument? I tried the control_ctree option, but it threw an error message saying it couldn't find the ctree function. Also, how can I consume the output of this tree? How can it be implemented on other platforms such as SAS or SQL? I also have a doubt about what the value "* weights = 4349" at the end of a node signifies. How will I know, that
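(On the last point: the `weights = 4349` printed at a terminal node is the number of weighted training observations that fall into that node.) A tree's split rules can be ported to SQL by hand: each internal node becomes a CASE WHEN on its split variable, applied recursively. Below is a minimal sketch of that idea; the nested-dict rule format, the column names, and the `customers` table are hypothetical, not ctree's actual output.

```python
def tree_to_sql(node):
    """Recursively turn a nested dict of split rules into a SQL CASE expression.

    `node` is a hypothetical encoding of one tree node: either
    {"predict": value} for a leaf, or
    {"var": name, "split": threshold, "left": node, "right": node}.
    """
    if "predict" in node:
        return repr(node["predict"])
    cond = f"{node['var']} <= {node['split']}"
    left = tree_to_sql(node["left"])
    right = tree_to_sql(node["right"])
    return f"CASE WHEN {cond} THEN {left} ELSE {right} END"

# A toy two-split tree: first split on age, then on income.
tree = {
    "var": "age", "split": 30,
    "left": {"predict": "yes"},
    "right": {"var": "income", "split": 50000,
              "left": {"predict": "yes"},
              "right": {"predict": "no"}},
}
sql = "SELECT " + tree_to_sql(tree) + " AS prediction FROM customers"
```

For a 1,700-node tree one would generate the rule dict programmatically (e.g. from the printed ctree output) rather than by hand, but the CASE nesting is the same.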

ValueError: setting an array element with a sequence with Decision Tree where all the rows have equal elements?

Submitted by 自闭症网瘾萝莉.ら on 2019-12-13 07:18:00

Question: I am trying to fit a decision tree to matrices of features and labels. Here is my code:

```python
print FEATURES_DATA[0]
print ""
print TARGET[0]
print ""
print np.unique(list(map(len, FEATURES_DATA[0])))
```

which gives the following output:

```
[ array([[3, 3, 3, ..., 7, 7, 7],
         [3, 3, 3, ..., 7, 7, 7],
         [3, 3, 3, ..., 7, 7, 7],
         ...,
         [2, 2, 2, ..., 6, 6, 6],
         [2, 2, 2, ..., 6, 6, 6],
         [2, 2, 2, ..., 6, 6, 6]], dtype=uint8)]

[ array([[31],
         [31],
         [31],
         ...,
         [22],
         [22],
         [22]], dtype=uint8)]

[463511]
```

The matrix
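The printed output suggests `FEATURES_DATA[0]` is a Python list wrapping a single 2-D array, which is the usual trigger for "setting an array element with a sequence": the estimator cannot coerce a list-of-one-matrix into a rectangular (n_samples, n_features) array. A sketch of the unwrapping fix, using small synthetic stand-ins shaped like the question's data (an assumption, since the full data is not shown):

```python
import numpy as np

# Synthetic stand-ins: each entry of FEATURES_DATA / TARGET is a list
# wrapping one 2-D array, mirroring the "[ array([...])]" output above.
FEATURES_DATA = [[np.array([[3, 3, 7], [2, 2, 6]], dtype=np.uint8)]]
TARGET = [[np.array([[31], [22]], dtype=np.uint8)]]

# fit(FEATURES_DATA[0], TARGET[0]) fails: each is a length-1 list holding
# a matrix. Unwrap the array and flatten the (n_samples, 1) label column.
X = FEATURES_DATA[0][0]      # shape (n_samples, n_features)
y = TARGET[0][0].ravel()     # shape (n_samples,)
```

With `X` and `y` in these shapes, `DecisionTreeClassifier().fit(X, y)` accepts them directly.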

How to visualize H2O Tree?

Submitted by 蓝咒 on 2019-12-13 04:27:05

Question: I have a dataframe data_categorical and a model model. I converted my dataframe to an h2o frame with

```python
data = h2o.H2OFrame(data_categorical)
```

and trained my model with

```python
model = H2ORandomForestEstimator(ntrees=1, max_depth=20, nfolds=10)
# Train model
model.train(x=training_columns, y=response_column, training_frame=train)
```

I'm trying to visualize the tree that is created (note that I only need one tree), but I can't seem to do that. I downloaded the mojo file with

```python
model.download_mojo(path, get_genmodel_jar=True)
```

But
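One commonly used route from here (an assumption, since the question is cut off before any answer) is h2o's PrintMojo tool, which ships in the genmodel jar downloaded alongside the mojo and emits a Graphviz .dot file for a chosen tree. A sketch of the command; the file names are placeholders for whatever `download_mojo` actually returned:

```python
mojo_path = "model.zip"        # path returned by model.download_mojo (placeholder)
jar_path = "h2o-genmodel.jar"  # fetched via get_genmodel_jar=True (placeholder)

# PrintMojo renders tree 0 of the mojo as Graphviz source.
cmd = ["java", "-cp", jar_path, "hex.genmodel.tools.PrintMojo",
       "--tree", "0", "-i", mojo_path, "-o", "tree.gv"]
# subprocess.run(cmd, check=True)   # then render tree.gv with Graphviz `dot`
```

Rendering the resulting `tree.gv` still requires Graphviz installed locally, but the .dot text itself is readable even without it.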

Is there a way to see the order of nodes categorizing data in decision trees when not allowed to install graphviz or pydotplus?

Submitted by 江枫思渺然 on 2019-12-13 03:18:36

Question: I need to know the order of the nodes and the scores for each one, once I have run the decision tree model. As I'm working on my office computer, installations are very restricted and I'm not allowed to download graphviz or pydotplus. It doesn't matter that there is no graphic representation of the model; I just want to know the classification order/process the algorithm is using. I'm using sklearn.tree, sklearn.metrics, and sklearn.cross_validation.

Answer 1: You can make use of plot_tree
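Besides plot_tree, scikit-learn's `export_text` needs neither graphviz nor pydotplus and prints the split order as plain text, which matches what the asker wants. A minimal sketch on the built-in iris data (the shallow `max_depth=2` is just to keep the printout small):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# Pure-text dump of the tree: each "|---" line is one split or leaf,
# listed in the order the tree evaluates them.
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```

Note that `export_text` requires scikit-learn >= 0.21, the same release that introduced `plot_tree`.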

AWS SageMaker RandomCutForest (RCF) vs scikit-learn RandomForest (RF)?

Submitted by 丶灬走出姿态 on 2019-12-13 03:10:10

Question: Is there a difference between the two, or are they different names for the same algorithm?

Answer 1: RandomCutForest (RCF) is an unsupervised method primarily used for anomaly detection, while RandomForest (RF) is a supervised method that can be used for regression or classification. For RCF, see the documentation (here) and a notebook example (here).

Source: https://stackoverflow.com/questions/56728230/aws-sagemaker-randomcutforest-rcf-vs-scikit-lean-randomforest-rf
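The supervised/unsupervised distinction shows up directly in the fit signatures. A sketch using scikit-learn, with IsolationForest standing in for RCF (it is sklearn's closest anomaly-detection analogue, not the same algorithm):

```python
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] > 0).astype(int)        # labels exist only for the supervised case

# Supervised: RandomForest must be given labels y.
rf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# Unsupervised: the anomaly detector fits on X alone and flags
# outliers as -1, inliers as +1.
iso = IsolationForest(random_state=0).fit(X)
flags = iso.predict(X)
```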

R C5.0 get rule and probability for every leaf

Submitted by 十年热恋 on 2019-12-13 02:11:42

Question: I think during my research to solve this question I came pretty close. I am looking for something like this for the C5.0 package. The method provided in the SO answer works with a party object; however, the C5.0 package does not support as.party. In my further research I found a comment that the maintainer of the C5.0 package had already programmed the function but did not export it. I thought this should work, but unfortunately the suggested function C50:::as.party.C5.0(mod1) throws

Find All Binary Splits of a Nominal Attribute

Submitted by ﹥>﹥吖頭↗ on 2019-12-12 19:21:47

Question: I'm trying to build a binary decision tree classifier in Python from scratch, based on a data set that has only nominal attributes. One step I'm stuck on is finding all possible ways to compute a binary split of a nominal attribute. For example, for an attribute with possible values [a, b, c, d], I am looking for a way to split these into two arrays such that we obtain:

```
left  right
----  -----
a     bcd
b     acd
c     abd
d     abc
ab    cd
ac    bd
ad    bc
```

without duplicate splits (e.g. we don't need "bc" in
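One way to avoid complement duplicates is to pin the first value to the left side: every unordered split then appears exactly once, since its mirror image would put the first value on the right. A minimal sketch of that enumeration:

```python
from itertools import combinations

def binary_splits(values):
    """Return each two-way partition of `values` exactly once.

    Fixing values[0] in the left side means a split and its mirror
    (left/right swapped) are never both generated.
    """
    first, rest = values[0], values[1:]
    splits = []
    for r in range(len(rest) + 1):
        for combo in combinations(rest, r):
            left = [first, *combo]
            right = [v for v in values if v not in left]
            if right:  # skip the trivial split with an empty right side
                splits.append((left, right))
    return splits

splits = binary_splits(["a", "b", "c", "d"])
for left, right in splits:
    print("".join(left), "".join(right))
```

For k values this yields 2^(k-1) - 1 splits (7 for k = 4), matching the table above.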

ctree plot of a decision tree in the party package in R: terminal nodes show some weird numbers - issue?

Submitted by 霸气de小男生 on 2019-12-12 19:17:46

Question: I came across something really odd, and I couldn't figure out why. I use the same code as below:

```r
library(party)
r_tree <- ctree(readingSkills$nativeSpeaker ~ readingSkills$age +
                readingSkills$shoeSize + readingSkills$shoeSize +
                readingSkills$score,
                data = readingSkills)
plot(r_tree, type = "simple")
r_tree
```

Two weeks ago I got a normal graph, but today my terminal nodes have some odd numbers in them, as shown in the picture below. I have tried restarting my PC, uninstalled the

Implementing a decision tree using h2o

Submitted by 試著忘記壹切 on 2019-12-12 15:34:59

Question: I am trying to train a decision tree model using h2o. I am aware that no specific library for decision trees exists in h2o, but h2o has an implementation of random forest, H2ORandomForestEstimator. Can we implement a decision tree in h2o by tuning certain input arguments of random forests? We can do that in the scikit module (a popular Python library for machine learning). Ref link: Why is Random Forest with a single tree much better than a Decision Tree classifier? In scikit the code
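The idea from the linked answer can be sketched in scikit-learn: a forest of one tree, with bootstrapping disabled and all features considered at every split, behaves like a plain decision tree. (In h2o one would analogously set `ntrees=1`, `sample_rate=1.0`, and `mtries` to the full feature count, though those h2o specifics are an assumption of this sketch.)

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Single tree, no bootstrap resampling, all features at every split:
# the forest's randomness is switched off.
rf = RandomForestClassifier(n_estimators=1, bootstrap=False,
                            max_features=None, random_state=0).fit(X, y)
dt = DecisionTreeClassifier(random_state=0).fit(X, y)
```

Both fully grown trees separate the iris training set, so their training predictions coincide.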

How can decision tree model in Spark (pyspark) be visualized?

Submitted by 被刻印的时光 ゝ on 2019-12-12 09:42:55

Question: I am trying to visualize a decision tree structure in pyspark, but all the visualization tools I can find are for data; I could not find any for visualizing tree structure. Or is there a way I can visualize it using the rules from toDebugString?

Answer 1: I have tried to do the following in order to create a visualization:

1. Parse the Spark decision tree output to a JSON format.
2. Use the JSON file as input to a D3.js visualization.

For more code you can refer to my prototype on GitHub here.

Source: https://stackoverflow.com
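Step 1 of the answer can be sketched with a small indentation-based parser: Spark's `toDebugString` nests If/Else/Predict lines by leading spaces, so a stack keyed on indent depth rebuilds the tree as a JSON-able dict for D3. The sample string below only imitates the shape of Spark's output (the exact header and spacing vary by version):

```python
import json

# Imitation of a toDebugString body (format assumed, not copied from Spark).
debug = """\
 If (feature 0 <= 1.5)
  Predict: 0.0
 Else (feature 0 > 1.5)
  Predict: 1.0"""

def to_nodes(text):
    """Nest indentation-based If/Else/Predict lines into dicts for D3.js."""
    roots = []
    stack = [(0, {"children": roots})]  # (indent, node) pairs
    for line in text.splitlines():
        depth = len(line) - len(line.lstrip())
        node = {"name": line.strip(), "children": []}
        # Pop back to this line's parent, then attach.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((depth, node))
    return roots

tree = to_nodes(debug)
print(json.dumps(tree, indent=2))  # ready to feed into a D3 hierarchy
```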