random-forest

Random forest bootstrap training and forest generation

Submitted by 隐身守侯 on 2019-12-25 09:17:02
Question: I have huge training data for a random forest (dim: 47600811 × 9). I want to take multiple (say 1000) bootstrapped samples of dimension 10000 × 9 (taking 9000 negative-class and 1000 positive-class data points in each run), iteratively grow trees for each sample, and then combine all those trees into one forest. A rough idea of the required code is given below. Can anybody guide me on how to generate random samples with replacement from my actual trainData and optimally grow trees for them?
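
The question's own code sketch is not included in the excerpt. As one possible illustration, here is a minimal sketch in Python/scikit-learn, on synthetic stand-in data (the real 47M-row dataset and its labels are assumptions here), of stratified sampling with replacement followed by pooling all fitted trees into a single forest:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the 47,600,811 x 9 training data described above.
X, y = make_classification(n_samples=20000, n_features=9, weights=[0.9, 0.1],
                           random_state=0)

rng = np.random.default_rng(0)
neg_idx, pos_idx = np.where(y == 0)[0], np.where(y == 1)[0]

forests = []
for i in range(10):  # the question asks for 1000 runs; 10 keeps the demo fast
    # Sample with replacement: 9000 negatives and 1000 positives per run.
    idx = np.concatenate([rng.choice(neg_idx, 9000, replace=True),
                          rng.choice(pos_idx, 1000, replace=True)])
    # bootstrap=False because the resampling was already done by hand above.
    rf = RandomForestClassifier(n_estimators=1, bootstrap=False, random_state=i)
    forests.append(rf.fit(X[idx], y[idx]))

# Pool every fitted tree into one forest. Overwriting estimators_ and
# n_estimators on a fitted model is an informal but commonly used trick,
# not an official scikit-learn API for merging forests.
combined = forests[0]
combined.estimators_ = [t for f in forests for t in f.estimators_]
combined.n_estimators = len(combined.estimators_)
print(combined.predict_proba(X[:3]))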

Multiclass Decision Forest vs Random Forest

Submitted by ◇◆丶佛笑我妖孽 on 2019-12-25 04:12:24
Question: How does Multiclass Decision Forest differ from Random Forest? What factors do they have in common? There appears to be no clear answer on the web regarding this matter. Answer 1: Random forests, or random decision forests, are an extension of decision forests (ensembles of decision trees) that combine bagging with random selection of features to construct a collection of decision trees with controlled variance. There is a very good paper from Microsoft Research you may want to look at. Source: https:/
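
As a brief illustration of the answer above, both ingredients — bagging and per-split random feature selection — are exposed as parameters in scikit-learn's RandomForestClassifier (a minimal sketch on synthetic data):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=12, n_informative=6,
                           n_classes=3, random_state=0)

# bootstrap=True gives the bagging component (each tree trains on a resampled
# copy of the data); max_features="sqrt" gives the random feature selection at
# each split. Together they yield the variance-controlled ensemble.
rf = RandomForestClassifier(n_estimators=200, bootstrap=True,
                            max_features="sqrt", random_state=0).fit(X, y)
print(rf.score(X, y))
```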

RandomForestRegressor model evaluation?

Submitted by 僤鯓⒐⒋嵵緔 on 2019-12-25 02:47:15
Question: I am new to machine learning and am trying to understand the correct and suitable evaluation for a RandomForestRegressor. I have listed the regression metrics below and understood the concepts, but I am not sure which metrics I can use for a RandomForestRegressor's evaluation. Can I use r2_score all the time after prediction? I am using sklearn packages. Regression metrics (see the Regression metrics section of the user guide for further details): metrics.explained_variance_score(y_true, y_pred
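
A minimal sketch of how these metrics are typically applied to a fitted RandomForestRegressor (synthetic data; whether r2_score alone suffices depends on the use case, so several metrics are shown side by side):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=9, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
pred = model.predict(X_test)

# Report several complementary metrics on held-out data rather than one alone.
print("R^2:", r2_score(y_test, pred))
print("MAE:", mean_absolute_error(y_test, pred))
print("MSE:", mean_squared_error(y_test, pred))
```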

How to label special cases in RandomForestRegressor in sklearn in python

Submitted by 岁酱吖の on 2019-12-25 00:15:20
Question: I have a set of numerical features (f1, f2, f3, f4, f5) for each user in my dataset, as follows.

        f1   f2   f3   f4   f5
user1   0.1  1.1  0    1.7  1
user2   1.1  0.3  1    1.3  3
user3   0.8  0.3  0    1.1  2
user4   1.5  1.2  1    0.8  3
user5   1.6  1.3  3    0.3  0

My target output is a prioritised user list, as shown in the example below.

        f1   f2   f3   f4   f5   target_priority
user1   0.1  1.1  0    1.7  1    2
user2   1.1  0.3  1    1.3  3    1
user3   0.8  0.3  0    1.1  2    5
user4   1.5  1.2  1    0.8  3    3
user5   1.6  1.3  3    0.3  0    4

I want to use these features in a way that
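
A possible starting point (a sketch only, using the toy table above; it assumes a lower target_priority means higher priority and simply ranks users by the regressor's predicted score):

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Data copied from the question; target_priority is the known ranking.
df = pd.DataFrame(
    {"f1": [0.1, 1.1, 0.8, 1.5, 1.6],
     "f2": [1.1, 0.3, 0.3, 1.2, 1.3],
     "f3": [0, 1, 0, 1, 3],
     "f4": [1.7, 1.3, 1.1, 0.8, 0.3],
     "f5": [1, 3, 2, 3, 0],
     "target_priority": [2, 1, 5, 3, 4]},
    index=["user1", "user2", "user3", "user4", "user5"],
)

X = df[["f1", "f2", "f3", "f4", "f5"]]
reg = RandomForestRegressor(n_estimators=100, random_state=0)
reg.fit(X, df["target_priority"])

# Rank users by the predicted priority score (ascending = highest priority
# first under the lower-is-more-urgent assumption above).
df["predicted"] = reg.predict(X)
print(df.sort_values("predicted"))
```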

Grid search on parameters inside the parameters of a BaggingClassifier

Submitted by 别说谁变了你拦得住时间么 on 2019-12-24 20:32:13
Question: This is a follow-up to a question answered here, but I believe it deserves its own thread. In the previous question, we were dealing with "an ensemble of ensemble classifiers, where each has its own parameters." Let's start with the example provided by MaximeKan in his answer:

```python
my_est = BaggingClassifier(
    RandomForestClassifier(n_estimators=100, bootstrap=True, max_features=0.5),
    n_estimators=5,
    bootstrap_features=False,
    bootstrap=False,
    max_features=1.0,
    max_samples=0.6,
)
```

Now
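
A sketch of how such nested parameters can be addressed in a grid search: scikit-learn uses double-underscore naming, where the prefix for the inner model depends on the library version (an assumption to verify against your installation):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

my_est = BaggingClassifier(
    RandomForestClassifier(n_estimators=100, bootstrap=True, max_features=0.5),
    n_estimators=5, bootstrap_features=False, bootstrap=False,
    max_features=1.0, max_samples=0.6,
)

# Nested parameters use double underscores; the inner-model prefix is
# "estimator" in scikit-learn >= 1.2 ("base_estimator" in older releases).
param_grid = {
    "n_estimators": [3, 5],
    "estimator__max_features": [0.5, 0.8],
    "estimator__n_estimators": [50, 100],
}
search = GridSearchCV(my_est, param_grid, cv=3).fit(X, y)
print(search.best_params_)
```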

Create a TensorFlow graph with multiple random forest (RandomForestGraphs)

Submitted by a 夏天 on 2019-12-24 18:50:14
Question: Is it possible to create a graph in TensorFlow containing multiple RandomForestGraphs? Instead of one random forest with num_classes=3, I want to have three random forests: one classifying only classes 1 and 2, the second classes 2 and 3, and the third classes 3 and 1. In front of these classifiers is an arbitrator element deciding which forest to train or infer from, based on the current class (i.e., class 1 -> forest 1, class 2 -> forest 2, ...). This way I hope to restrict the possible outcomes in
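
The pairwise routing described here resembles one-vs-one classification. The sketch below uses scikit-learn's OneVsOneClassifier on synthetic data purely to illustrate the scheme; it is not the TensorFlow RandomForestGraphs API the question asks about:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.multiclass import OneVsOneClassifier

X, y = make_classification(n_samples=300, n_features=8, n_classes=3,
                           n_informative=5, random_state=0)

# One forest per pair of classes (0 vs 1, 1 vs 2, 0 vs 2); at inference the
# pairwise predictions are combined by voting, which plays the role of the
# arbitrator element described in the question.
ovo = OneVsOneClassifier(RandomForestClassifier(n_estimators=100, random_state=0))
ovo.fit(X, y)
print(ovo.predict(X[:5]))
```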

What splitting criterion does Random Tree in Weka 3.7.11 use for numerical attributes?

Submitted by 坚强是说给别人听的谎言 on 2019-12-24 17:38:10
Question: I'm using RandomForest from Weka 3.7.11, which in turn bags Weka's RandomTree. My input attributes are numerical, and the output attribute (label) is also numerical. When training the RandomTree, K attributes are chosen at random for each node of the tree. Several splits based on those attributes are attempted and the "best" one is chosen. How does Weka determine which split is best in this (numerical) case? For nominal attributes I believe Weka uses the information gain criterion, which
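
For numeric targets, regression trees commonly choose the split that maximises the reduction in weighted target variance. The sketch below illustrates that criterion in general terms; whether Weka's RandomTree implements exactly this is the question being asked, so treat it as background rather than an answer:

```python
import numpy as np

def variance_reduction(y, x, threshold):
    """Weighted variance reduction of target y when splitting feature x at threshold."""
    left, right = y[x <= threshold], y[x > threshold]
    if len(left) == 0 or len(right) == 0:
        return 0.0
    weighted = (len(left) * left.var() + len(right) * right.var()) / len(y)
    return y.var() - weighted

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(scale=0.5, size=200)

# Try each midpoint between consecutive sorted values and keep the best split.
xs = np.sort(x)
candidates = (xs[:-1] + xs[1:]) / 2
best = max(candidates, key=lambda t: variance_reduction(y, x, t))
print("best threshold:", best)
```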

Handling unseen categorical variables and MaxBins calculation in Spark Multiclass-classification

Submitted by 末鹿安然 on 2019-12-24 17:24:42
Question: Below is the code I have for a RandomForest multiclass-classification model. I am reading from a CSV file and applying various transformations, as seen in the code. I calculate the maximum number of categories and then pass it as a parameter to RF. This takes a lot of time! Is there a parameter to set, or an easier way, to make the model infer the maximum number of categories automatically? It can go above 1000 and I cannot omit those columns. How do I handle unseen labels in new data at prediction time, since
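
One commonly suggested approach for the unseen-labels part is StringIndexer's handleInvalid="keep" option (available in Spark >= 2.2), which buckets labels unseen at fit time instead of failing at prediction time. A sketch with hypothetical column names, requiring a local Spark installation:

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.feature import StringIndexer, VectorAssembler

spark = SparkSession.builder.master("local[*]").getOrCreate()
train_df = spark.createDataFrame(
    [("a", 1.0, 0.0), ("b", 2.0, 1.0), ("a", 3.0, 0.0), ("c", 4.0, 1.0)],
    ["category", "num_feature", "label"],
)

# handleInvalid="keep" maps categories unseen at fit time to an extra index
# instead of throwing an error when the fitted pipeline is applied to new data.
indexer = StringIndexer(inputCol="category", outputCol="category_idx",
                        handleInvalid="keep")
assembler = VectorAssembler(inputCols=["category_idx", "num_feature"],
                            outputCol="features")
# maxBins must be at least the category count of the largest categorical
# column; the default 32 suffices for this toy data, but a real dataset with
# 1000+ categories would need it raised accordingly.
rf = RandomForestClassifier(labelCol="label", featuresCol="features", maxBins=32)

model = Pipeline(stages=[indexer, assembler, rf]).fit(train_df)
```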

Confusing probabilities from scikit-learn randomforest

Submitted by 末鹿安然 on 2019-12-24 11:58:00
Question: I have a time series of integer values which I'm trying to predict. I do this with a sliding window, where the model learns to associate 99 values with the next one. The values are between 0 and 128. The representation of X is a cube of n sliding windows, each 99 values long, with each integer one-hot encoded as a vector of 128 elements. The shape of this array is (n, 99, 128). The shape of Y is (n, 128). I see it as a multi-class problem, as Y can take precisely one outcome. This works fine
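
One way such "confusing probabilities" typically arise is from passing the one-hot matrix as y, which scikit-learn treats as a multilabel problem (predict_proba then returns a list of per-output arrays). A sketch on synthetic stand-in data of flattening X and using integer class labels instead:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the data described above: n windows of 99 integers
# (128 possible values each), one-hot encoded and flattened for the forest.
n = 200
windows = np.random.default_rng(0).integers(0, 128, size=(n, 99))
X = np.eye(128)[windows].reshape(n, -1)   # (n, 99, 128) -> (n, 99*128)
y = np.random.default_rng(1).integers(0, 128, size=n)

# Passing 1-D integer class labels (not the one-hot matrix) makes this a
# single multi-class problem, so predict_proba returns one (n, n_classes)
# array whose columns line up with clf.classes_.
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
proba = clf.predict_proba(X[:3])
print(proba.shape)   # (3, number of distinct classes seen in y)
```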