decision-tree

Weighted Decision Trees using Entropy

孤街醉人 submitted on 2019-12-03 13:01:41

I'm building a binary classification tree using mutual information gain as the splitting function. Since the training data is skewed toward a few classes, it is advisable to weight each training example by its inverse class frequency. How do I weight the training data? When calculating the probabilities used to estimate the entropy, do I take weighted averages?

EDIT: I'd like an expression for entropy that incorporates the weights.

State-value weighted entropy has been proposed as a measure of investment risk: http://www56.homepage.villanova.edu/david.nawrocki/State%20Weighted%20Entropy%20Nawrocki%20Harding.pdf (Robert Harvey)
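A minimal sketch of one common convention (an assumption, not necessarily what the linked paper uses): replace each class count with the sum of the weights of its examples, so the class "probability" in the entropy formula becomes a weighted fraction.

```python
import numpy as np

def weighted_entropy(labels, weights):
    """Entropy where each example contributes its weight instead of a count:
    p_c = (sum of weights of class c) / (total weight)."""
    labels = np.asarray(labels)
    weights = np.asarray(weights, dtype=float)
    total = weights.sum()
    entropy = 0.0
    for c in np.unique(labels):
        p = weights[labels == c].sum() / total
        if p > 0:
            entropy -= p * np.log2(p)
    return entropy

# Inverse-class-frequency weighting: w_i = N / (n_classes * count(class of i)).
# With these weights, a skewed sample behaves as if the classes were balanced.
y = np.array([0, 0, 0, 0, 1])
counts = np.bincount(y)
w = len(y) / (len(counts) * counts[y])
print(weighted_entropy(y, w))  # → 1.0, the entropy of a balanced binary split
```

With uniform weights this reduces to the ordinary empirical entropy, so the same splitting code can serve both cases.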

Python, PyDot and DecisionTree

余生颓废 submitted on 2019-12-03 12:25:56

Question: I'm trying to visualize my DecisionTree, but I get an error. The code is:

    X = [i[1:] for i in dataset]  # attributes
    y = [i[0] for i in dataset]
    clf = tree.DecisionTreeClassifier()
    dot_data = StringIO()
    tree.export_graphviz(clf.fit(train_X, train_y), out_file=dot_data)
    graph = pydot.graph_from_dot_data(dot_data.getvalue())
    graph.write_pdf("tree.pdf")

And the error is:

    Traceback (most recent call last):
      ...
      if data.startswith(codecs.BOM_UTF8):
    TypeError: startswith first arg must be str or a tuple of str, not bytes

scikit learn - feature importance calculation in decision trees

一曲冷凌霜 submitted on 2019-12-03 09:27:23

Question: I'm trying to understand how feature importance is calculated for decision trees in scikit-learn. This question has been asked before, but I am unable to reproduce the results the algorithm provides. For example:

    from StringIO import StringIO
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.tree.export import export_graphviz
    from sklearn.feature_selection import mutual_info_classif

    X = [[1,0,0], [0,0,0], [0,0,1], [0,1,0]]
    y = [1,0,1,1]
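For reference, scikit-learn's `feature_importances_` is the normalized total impurity decrease attributed to each feature (often called Gini importance). A sketch that recomputes it from the fitted `tree_` arrays, reusing the question's toy data:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = [[1, 0, 0], [0, 0, 0], [0, 0, 1], [0, 1, 0]]
y = [1, 0, 1, 1]
clf = DecisionTreeClassifier(random_state=0).fit(X, y)

t = clf.tree_
importances = np.zeros(len(X[0]))
for node in range(t.node_count):
    if t.children_left[node] == -1:          # leaf: contributes nothing
        continue
    left, right = t.children_left[node], t.children_right[node]
    n, nl, nr = (t.weighted_n_node_samples[i] for i in (node, left, right))
    # weighted impurity decrease of this split, credited to the split feature
    decrease = n * t.impurity[node] - nl * t.impurity[left] - nr * t.impurity[right]
    importances[t.feature[node]] += decrease
importances /= importances.sum()             # normalize to sum to 1

print(np.allclose(importances, clf.feature_importances_))  # expected: True
```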

Multivariate Decision Tree learner

五迷三道 submitted on 2019-12-03 08:56:35

Many univariate decision tree learner implementations exist (C4.5 etc.), but does anyone actually know of multivariate decision tree learner algorithms? Bennett and Blue's "A Support Vector Machine Approach to Decision Trees" performs multivariate splits by embedding an SVM at each decision node in the tree. Similarly, in "Multicategory classification via discrete support vector machines" (2009), Orsenigo and Vercellis embed a multicategory variant of discrete support vector machines (DSVM) into the decision tree nodes. The CART algorithm for decision trees can also be made multivariate. CART is a
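An illustration of the distinction (not any specific paper's algorithm): a univariate split tests one feature against a threshold, while a multivariate (oblique) split tests a linear combination of features, such as a hyperplane learned by a linear SVM at the node.

```python
import numpy as np

def univariate_split(x, feature, threshold):
    # axis-parallel test: route left when a single feature clears the threshold
    return x[feature] <= threshold

def multivariate_split(x, w, b):
    # oblique test: route left when the hyperplane test w . x + b <= 0 holds
    return np.dot(w, x) + b <= 0

x = np.array([1.0, 2.0])
print(univariate_split(x, 0, 1.5))                         # True: x[0] = 1.0 <= 1.5
print(multivariate_split(x, np.array([1.0, -1.0]), 0.0))   # True: 1.0 - 2.0 <= 0
```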

Decision trees and rule engines (Drools)

不羁的心 submitted on 2019-12-03 06:24:49

In the application I'm working on right now, I need to periodically check the eligibility of tens of thousands of objects for some kind of service. The decision diagram itself has the following form, just much larger: In each of the end nodes (circles), I need to run an action (change an object's field, log information, etc.). I tried using the Drools Expert framework, but in that case I'd need to write a long rule for every path in the diagram leading to an end node. Drools Flow doesn't seem to be built for such a use case either; I take an object and then, depending on the decisions along the
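One way to avoid writing a rule per path is to encode the diagram directly as nested nodes with an action callable at each end node, then walk it once per object. A minimal sketch (hypothetical `age`/`active` fields, and Python rather than Drools syntax, purely for illustration):

```python
def make_node(predicate, yes, no):
    return ("node", predicate, yes, no)

def make_leaf(action):
    return ("leaf", action)

def evaluate(tree, obj):
    if tree[0] == "leaf":
        return tree[1](obj)                  # run the end-node action
    _, predicate, yes, no = tree
    return evaluate(yes if predicate(obj) else no, obj)

eligibility = make_node(lambda o: o["age"] >= 18,
                        make_node(lambda o: o["active"],
                                  make_leaf(lambda o: "eligible"),
                                  make_leaf(lambda o: "inactive")),
                        make_leaf(lambda o: "rejected"))

print(evaluate(eligibility, {"age": 30, "active": True}))  # eligible
```

The number of rules then grows with the number of diagram nodes, not with the number of root-to-leaf paths.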

TicTacToe AI Making Incorrect Decisions

烈酒焚心 submitted on 2019-12-03 06:14:49

A little background: as a way to learn multi-node trees in C++, I decided to generate all possible TicTacToe boards and store them in a tree such that the branch beginning at a node contains all boards that can follow from that node, and the children of a node are the boards that follow in one move. After that, I thought it would be fun to write an AI that plays TicTacToe using that tree as a decision tree. TTT is a solved problem in which a perfect player never loses, so it seemed an easy AI to code for my first attempt at an AI. When I first implemented the AI, I went back and added two fields to
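The board-tree construction described above can be sketched in a few lines (a hedged illustration of the idea, not the asker's C++ code): the children of a board are all boards reachable in one move by the player to move.

```python
def children(board, player):
    """All boards one move after `board`, where `player` ('X' or 'O') moves next.
    A board is a 9-character string over 'X', 'O', and ' '."""
    return [board[:i] + player + board[i + 1:]
            for i, c in enumerate(board) if c == " "]

empty = " " * 9
first_moves = children(empty, "X")
print(len(first_moves))  # 9 possible opening moves
```

Expanding recursively with alternating players (and stopping at won or full boards) yields the full game tree the question describes.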

How to implement decision tree with c# (visual studio 2008) - Help

时光总嘲笑我的痴心妄想 submitted on 2019-12-03 04:22:51

Question: I have a decision tree that I need to turn into C# code. The simple way of doing it is using if-else statements, but with that solution I would need to create 4-5 nested conditions. I am looking for a better way to do it, and so far I have read a little about rule engines. Do you have anything else to suggest for an efficient way to implement a decision tree with 4-5 nested conditions?

Answer 1: I implemented a simple decision tree as a sample in my book. The code is available online here, so perhaps you
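One common alternative to 4-5 levels of nested if/else is flattening the conditions into an ordered rule table evaluated first-match-wins. A sketch in Python (the `income`/`debt` fields and outcomes are hypothetical, and this is not the book sample mentioned in the answer); the same shape translates directly to C# with a `List<(Func<T, bool>, string)>`:

```python
# Ordered rules: the first condition that matches decides the outcome.
rules = [
    (lambda o: o["income"] > 50000 and o["debt"] < 10000, "approve"),
    (lambda o: o["income"] > 50000,                       "review"),
    (lambda o: True,                                      "reject"),  # default
]

def decide(obj):
    for condition, outcome in rules:
        if condition(obj):
            return outcome

print(decide({"income": 60000, "debt": 5000}))  # approve
```

Adding a branch then means inserting a row rather than re-nesting conditionals.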

Decision tree vs. Naive Bayes classifier [closed]

淺唱寂寞╮ submitted on 2019-12-03 01:34:24

Question: Closed. This question is off-topic and is not currently accepting answers. Closed 4 years ago.

I am doing some research on different data mining techniques and came across something I could not figure out. If anyone has any idea, that would be great. In which cases is it better to use a decision tree, and in which a Naive Bayes classifier? Why use one of them in certain cases? And the other in

Need guidance towards evaluative boolean logic tree

拟墨画扇 submitted on 2019-12-03 00:14:43

I can't seem to find a pointer in the right direction; I am not even sure what terms I should be researching, and countless hours of googling seem to be spinning me in circles, so hopefully the collective hive intelligence of Stack Overflow can help. The problem is this: I need a way to filter data in what I can only call a compound logic tree. Currently the system implements a simple AND filtering system. For example, let's say we have a dataset of people. You add a bunch of filters such that it shows all the people where (Sex = Female) AND (Age > 23) AND (Age < 30) AND (Status =
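The "compound logic tree" the question is groping toward is usually modeled as a tree of AND/OR nodes over leaf predicates, evaluated recursively so that OR groups can nest inside ANDs. A minimal sketch (the `sex`/`age` fields echo the question's example; the structure is the point):

```python
def AND(*children): return ("and", children)
def OR(*children):  return ("or", children)

def matches(node, person):
    if callable(node):                       # leaf predicate
        return node(person)
    op, children = node
    results = (matches(c, person) for c in children)
    return all(results) if op == "and" else any(results)

# (Sex = Female) AND (23 < Age < 30), as in the question's example
filt = AND(lambda p: p["sex"] == "F",
           lambda p: 23 < p["age"] < 30)

people = [{"sex": "F", "age": 25}, {"sex": "M", "age": 25}]
print([p for p in people if matches(filt, p)])  # only the 25-year-old female passes
```

Because nodes and leaves share one interface, arbitrarily deep combinations like `AND(a, OR(b, AND(c, d)))` come for free, which is exactly the generalization beyond the current flat-AND system.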