
Weighted Decision Trees using Entropy

一个人想着一个人 submitted on 2019-12-12 07:56:55
Question: I'm building a binary classification tree using mutual information gain as the splitting function. But since the training data is skewed toward a few classes, it is advisable to weight each training example by the inverse class frequency. How do I weight the training data? When calculating the probabilities to estimate the entropy, do I take weighted averages? EDIT: I'd like an expression for entropy with the weights.

Answer 1: State-value weighted entropy as a measure of investment risk. http:/
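To make the weighting concrete, here is a minimal sketch of one common convention (my own illustration, not taken from the answer): each example contributes its weight instead of a count when estimating the class probabilities, and the weighted probabilities go into the usual entropy formula.

```python
from collections import defaultdict
from math import log2

def weighted_entropy(labels, weights):
    """Entropy where each example contributes its weight rather than
    a count: p(c) = (sum of weights in class c) / (total weight)."""
    total = sum(weights)
    class_w = defaultdict(float)
    for y, w in zip(labels, weights):
        class_w[y] += w
    return -sum((w / total) * log2(w / total) for w in class_w.values())

# Inverse-class-frequency weights for a skewed binary sample
labels = [0, 0, 0, 0, 1]
freq = {c: labels.count(c) for c in set(labels)}
weights = [1.0 / freq[y] for y in labels]

print(weighted_entropy(labels, weights))  # ~1.0: the classes are now balanced
```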

Does feature order impact Decision tree algorithm in sklearn?

╄→гoц情女王★ submitted on 2019-12-12 06:59:37
Question: I read some material: the decision tree documentation in sklearn, and a Quora answer. However, I cannot find whether changing the feature order in the data (e.g. the set of feature names [a, b, c] becoming [b, a, c]) actually affects the decision tree result. Does it?

Answer 1: Not really. Sklearn generally uses CART trees, where the best split is decided by picking the feature that minimizes a cost function, so the order of the columns doesn't really matter.

Source: https://stackoverflow.com/questions/43941163/does-feature-order-impact
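A quick empirical check of this claim (my own sketch, not part of the answer): train the same tree on the original data and on a column-permuted copy, then compare predictions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

perm = [1, 0, 2, 3]  # swap the first two feature columns

clf_a = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
clf_b = DecisionTreeClassifier(random_state=0).fit(X_tr[:, perm], y_tr)

# The permuted model sees exactly the same information; any rare
# disagreement comes from tie-breaking between equally good splits,
# not from the column order itself.
agree = (clf_a.predict(X_te) == clf_b.predict(X_te[:, perm])).mean()
print(f"fraction of identical test predictions: {agree:.3f}")
```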

Code with 10-fold cross validation in machine learning

元气小坏坏 submitted on 2019-12-12 03:55:23
Question: I am just starting to work with machine learning. I tried to run 10-fold cross-validation using a C5.0 model and asked the code to return the kappa value:

```r
library(caret)  # createFolds() comes from caret, so load it before calling it
library(C50)    # note: the package is named C50 (there is no C5.0 package)
library(irr)    # kappa2()

set.seed(123)
folds <- createFolds(mdd.cohort1$edmsemmancomprej, k = 10)
str(folds)

cv_results <- lapply(folds, function(x) {
  # use x (the indices of the current fold) rather than hard-coding folds$Fold01
  mdd.cohort1_train <- mdd.cohort1[-x, ]
  mdd.cohort1_test  <- mdd.cohort1[x, ]
  # the snippet was cut off here; the usual pattern for returning kappa per fold:
  model <- C5.0(edmsemmancomprej ~ ., data = mdd.cohort1_train)
  pred  <- predict(model, mdd.cohort1_test)
  kappa2(data.frame(mdd.cohort1_test$edmsemmancomprej, pred))$value
})
```

Mathematica: part assignment

拟墨画扇 submitted on 2019-12-12 03:52:41
Question: I'm trying to implement an algorithm to build a decision tree from a dataset. I wrote a function to calculate the information gain between a subset and a particular partition; then I try all the possible partitions and want to choose the "best" one, in the sense that it has the lowest entropy. The procedure must be recursive: after the first iteration, it needs to work on every subset of the partition obtained in the previous step. These are the data: X = {{1, 0, 1, 1}, {1, 1,
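Independently of the Mathematica specifics, here is a small Python sketch (my own illustration, separate from the asker's code) of the quantity being minimized: the weighted average entropy of the subsets produced by a candidate partition.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def partition_entropy(subsets):
    """Weighted average entropy of a candidate partition; the 'best'
    partition is the one that minimizes this value."""
    total = sum(len(s) for s in subsets)
    return sum(len(s) / total * entropy(s) for s in subsets)

# Splitting on some attribute yields two subsets of class labels:
print(partition_entropy([[1, 1, 0], [0, 0, 0, 1]]))
```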

Equivalent of mllib.DecisionTreeModel.toDebugString() in ml.DecisionTreeClassificationModel

爱⌒轻易说出口 submitted on 2019-12-12 03:22:41
Question: As the title says, is there any equivalent of Spark's org.apache.spark.mllib.tree.model.DecisionTreeModel.toDebugString() in org.apache.spark.ml.classification.DecisionTreeClassificationModel? I have gone through the API doc of the latter and found the method rootNode(), which returns an org.apache.spark.ml.tree.Node object that appears to be recursive, so should I use this class to build the tree structure myself? Thanks in anticipation.

Answer 1: org.apache.spark.ml.classification.DecisionTreeClassificationModel also exposes toDebugString (inherited from the org.apache.spark.ml.tree.DecisionTreeModel trait), so you can call it directly rather than traversing rootNode() yourself.
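For the record, a small PySpark sketch of the direct call (my own illustration; assumes Spark 2.x or later, where the ml model exposes toDebugString):

```python
from pyspark.ml.classification import DecisionTreeClassifier
from pyspark.ml.feature import VectorAssembler
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(0.0, 1.0, 0), (1.0, 0.0, 1), (0.5, 0.5, 0)],
    ["f1", "f2", "label"],
)
df = VectorAssembler(inputCols=["f1", "f2"], outputCol="features").transform(df)

model = DecisionTreeClassifier().fit(df)
print(model.toDebugString)  # same kind of text dump as the old mllib API
```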

Python decision tree classification of complex objects

久未见 submitted on 2019-12-12 02:06:06
Question: I have a collection of clothing/accessory products (represented by Python objects) with various attributes. These products are generated by a combination of querying an external API and scraping the merchant websites to obtain various attributes. My goal is to develop a classifier that uses these attributes to correctly categorise the products (i.e. into categories such as trousers, t-shirts, dresses, etc.). I have both a training and a test data set, which are a subset of the entire data
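As a possible starting point, a hedged sketch of one standard approach (the attribute names here are hypothetical, not from the question): one-hot encode the scraped categorical attributes with DictVectorizer so a decision tree has numeric features to split on.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Hypothetical attribute dicts; real ones would come from the API/scraper
train = [
    {"colour": "blue", "sleeve": "long",  "fabric": "cotton"},
    {"colour": "red",  "sleeve": "none",  "fabric": "silk"},
    {"colour": "blue", "sleeve": "short", "fabric": "cotton"},
]
labels = ["trousers", "dress", "t-shirt"]

# DictVectorizer one-hot encodes the string attributes into a numeric matrix
clf = make_pipeline(DictVectorizer(sparse=False),
                    DecisionTreeClassifier(random_state=0))
clf.fit(train, labels)
print(clf.predict([{"colour": "red", "sleeve": "none", "fabric": "silk"}]))
```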

R foreach parallel processing with unexported functions (with C50 example)

邮差的信 submitted on 2019-12-12 00:36:05
Question: I am trying to extract rules from a C50 model while processing in parallel. This answer helped me extract the rules from the model object. However, as I need the models to be processed in parallel, I am using foreach. This seems to have a problem with the unexported function, as it does not see the data object. Here is some reproducible code:

```r
library(foreach)
library(doMC)
registerDoMC(2)

j <- c(1, 2)
result <- foreach(i = j) %dopar% {
  library(C50)
  d <- iris
  model <- C5.0(Species ~ ., data = d)
  # the snippet is cut off here; presumably it continues with the unexported
  # rule-extraction helper from the linked answer, e.g. C50:::as.party.C5.0(model),
  # which is where the worker fails to find the data object
}
```

Building a decision tree from two lists

五迷三道 submitted on 2019-12-11 16:45:05
Question: I'm trying to build this decision tree from two lists that I have.

Input:

```python
records  = ['dead', 'healthy', 'cold', 'influenza']
symptoms = ['cough', 'sneezing', 'fever']
```

(it doesn't always have to be these exact lists; they can have different lengths, etc.). The records list represents the leaves in the tree.

Output (each internal node asks about a symptom, with a Yes and a No branch):

```
cough
|-- Yes: sneezing
|   |-- Yes: fever
|   |   |-- Yes: dead
|   |   `-- No:  cold
|   `-- No:  fever
|       |-- Yes: influenza
|       `-- No:  cold
`-- No:  sneezing
    |-- Yes: fever
    |   |-- Yes: dead
    |   `-- No:  influenza
    `-- No:  fever
        |-- Yes: cold
        `-- No:  healthy
```

My code: def buildtree
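For what it's worth, a minimal sketch of one way to build such a tree (my own assumption: the leaf outcomes are supplied left-to-right, so there must be 2**len(symptoms) of them; here the eight leaves are read off the diagram above):

```python
def build_tree(symptoms, leaves):
    # Base case: no symptoms left to ask about, a single leaf label remains
    if not symptoms:
        return leaves[0]
    # The first symptom splits the leaves in half: Yes gets the left half,
    # No gets the right half, and each half is built recursively
    half = len(leaves) // 2
    return {
        "question": symptoms[0],
        "yes": build_tree(symptoms[1:], leaves[:half]),
        "no":  build_tree(symptoms[1:], leaves[half:]),
    }

symptoms = ["cough", "sneezing", "fever"]
leaves = ["dead", "cold", "influenza", "cold",
          "dead", "influenza", "cold", "healthy"]
print(build_tree(symptoms, leaves))
```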

Finding contribution by each feature into making particular prediction by h2o ensemble model

て烟熏妆下的殇ゞ submitted on 2019-12-11 16:41:54
Question: I am trying to explain the decisions taken by an h2o GBM model, based on this idea: https://medium.com/applied-data-science/new-r-package-the-xgboost-explainer-51dd7d1aa211. I want to calculate the contribution of each feature to a particular prediction at test time. Is it possible to get each individual tree from the ensemble along with the log-odds at every node? I would also need the path traversed through each tree by the model while making the prediction.

Answer 1: H2O doesn't have an equivalent
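That said, h2o does let you pull individual trees out of the ensemble for inspection. A hedged sketch (assumes h2o >= 3.22, where the h2o.tree.H2OTree API was added; treat the exact attribute names as assumptions from memory):

```python
import h2o
from h2o.tree import H2OTree
from h2o.estimators import H2OGradientBoostingEstimator

h2o.init()
iris = h2o.import_file(
    "https://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv"
)
gbm = H2OGradientBoostingEstimator(ntrees=5)
gbm.train(y="class", training_frame=iris)

# One tree of the ensemble, for one class; for classification GBMs the
# node prediction values are on the margin (log-odds) scale.
tree = H2OTree(model=gbm, tree_number=0, tree_class="Iris-setosa")
print(tree.features)     # split feature at each node (None at leaves)
print(tree.thresholds)   # numeric split threshold at each node
print(tree.predictions)  # per-node prediction values
```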

I can't visualize the decision tree for my ID3 classifier in Weka. What should I do?

混江龙づ霸主 submitted on 2019-12-11 14:28:59
Question: I can't visualize the decision tree for my ID3 classifier in Weka. For mushroom.arff, I preprocessed the data by deleting attributes containing empty instances, then applied the ID3 classifier, but I am unable to visualize the tree.

Answer 1: According to Technobium.com: "For the moment, the platform does not allow the visualization of the ID3 generated trees."

Source: https://stackoverflow.com/questions/50099119/i-cant-visualize-decision-tree-for-my-id3-classifier-in-weka-what-should-i-do