random-forest

Random Forest Black Box with CleverHans

牧云@^-^@ submitted on 2019-12-06 15:33:19
I am new to this and am trying to attack a Random Forest with a black-box FGSM attack (from CleverHans), but I'm not sure how to implement it. They have a black-box example for the MNIST data, but I don't understand where I should plug in my random forest and where the attack happens. Any help would be appreciated. In the current tutorial, the black-box model is a neural network implemented with TensorFlow, and its predictions (the labels) are used to train a substitute model (a copy of the black-box model). The substitute model is then used to craft adversarial examples that transfer to the black-box model. In your …
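A minimal sketch of the substitute-model idea described above, with the CleverHans specifics swapped out: the random forest is the query-only oracle, a logistic regression stands in for the differentiable substitute, and the FGSM step is hand-rolled from the substitute's coefficients. The dataset and model choices are illustrative, not the tutorial's exact setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_oracle, X_sub = X[:1500], X[1500:]
y_oracle = y[:1500]

# 1. The black box: a random forest the attacker can only query.
black_box = RandomForestClassifier(n_estimators=100, random_state=0)
black_box.fit(X_oracle, y_oracle)

# 2. Train the substitute on the black box's *predicted* labels,
#    never on the true ones -- that is the black-box assumption.
sub_labels = black_box.predict(X_sub)
substitute = LogisticRegression(max_iter=1000).fit(X_sub, sub_labels)

# 3. FGSM on the substitute: for logistic regression the input gradient
#    of the loss is proportional to the coefficient vector, so the sign
#    of the perturbation flips with the predicted class.
eps = 0.5
w = substitute.coef_[0]
signs = np.where(sub_labels == 1, -1.0, 1.0)  # push across the boundary
X_adv = X_sub + eps * signs[:, None] * np.sign(w)

# 4. Transferability check: examples crafted on the substitute should
#    also degrade the black box's agreement with its own clean labels.
print(black_box.score(X_sub, sub_labels), black_box.score(X_adv, sub_labels))
```

The point is where each piece of the tutorial plugs in: the "where should I put my random forest" answer is step 1 (the oracle that only ever gets queried), and the attack itself always runs against the substitute in step 3.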

Bizarre Behavior of randomForest Package When Dropping One Prediction Class

a 夏天 submitted on 2019-12-06 11:41:59
Question: I am running a random forest model that produces results that make absolutely no sense to me from a statistical perspective, so I'm convinced something must be going wrong code-wise with the randomForest package. The predicted (left-hand-side) variable is, in at least this iteration of the model, a party ID with three possible outcomes: Democrat, Independent, Republican. I run the model and get results, fine. At this point I'm not so concerned with the results per se, but rather what …

Tagging columns as Categorical in Spark

て烟熏妆下的殇ゞ submitted on 2019-12-06 09:59:30
I am currently using StringIndexer to convert a lot of columns into unique integers for classification with RandomForestModel. I am also using a pipeline for the ML process. Some questions: How does the RandomForestModel know which columns are categorical? StringIndexer converts non-numerical values to numerical ones, but does it add metadata of some sort to indicate that a column is categorical? In mllib.tree.RF there was a parameter called categoricalFeaturesInfo that indicated which columns are categorical. How does ml.tree.RF know, since that parameter is not present? Also, StringIndexer maps categories to …
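For context on why the metadata question matters: Spark's StringIndexer does attach nominal-attribute metadata to its output column, which the ml trees read; scikit-learn trees carry no such metadata, so the encoding alone determines what the model sees. This hedged sklearn-only sketch (toy data) shows the difference between a plain indexed column and a one-hot layout:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

colors = np.array([["red"], ["green"], ["blue"], ["green"]])

# StringIndexer-style: one column of integers, categories sorted as
# blue=0, green=1, red=2 -- which silently imposes blue < green < red.
ordinal = OrdinalEncoder().fit_transform(colors)

# One column per level: no spurious ordering for a tree to exploit.
onehot = OneHotEncoder().fit_transform(colors).toarray()

print(ordinal.ravel())  # [2. 1. 0. 1.]
print(onehot.shape)     # (4, 3)
```

In Spark the indexed column is usually safe to feed to ml trees precisely because of the attached metadata; in libraries without that metadata, one-hot encoding is the conservative choice.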

Handling categorical features using scikit-learn

£可爱£侵袭症+ submitted on 2019-12-06 06:01:25
What am I doing? I am solving a classification problem using Random Forests. I have a set of fixed-length strings (10 characters long) that represent DNA sequences. The DNA alphabet consists of four letters: A, C, G, T. Here's a sample of my raw data: ATGCTACTGA ACGTACTGAT AGCTATTGTA CGTGACTAGT TGACTATGAT Each DNA sequence comes with experimental data describing a real biological response; the molecule was seen to elicit a biological response (1), or not (0). Problem: The training set consists of both categorical (nominal) and numerical features. It has the following structure: …
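A minimal sketch of the usual approach for the sequence part: treat each of the 10 positions as its own 4-level categorical feature and one-hot encode it, so the forest never sees a fake ordering A < C < G < T. The sequences are taken from the sample above; the labels are invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OneHotEncoder

seqs = ["ATGCTACTGA", "ACGTACTGAT", "AGCTATTGTA", "CGTGACTAGT", "TGACTATGAT"]
labels = [1, 0, 1, 0, 1]  # hypothetical biological responses

# One row per sequence, one column per position (shape (5, 10));
# OneHotEncoder then expands each position independently.
chars = np.array([list(s) for s in seqs])
X = OneHotEncoder().fit_transform(chars).toarray()  # at most 40 columns

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, labels)
print(X.shape)
```

Numerical features can then simply be stacked alongside the one-hot block (e.g. with `np.hstack` or a `ColumnTransformer`), since random forests are indifferent to feature scaling.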

Spark RandomForest training StackOverflow error

别说谁变了你拦得住时间么 submitted on 2019-12-06 05:59:25
I am running a training of my model and I get a StackOverflowError whenever I increase maxDepth above 12. Everything works correctly for 5, 10, and 11. I am using Spark 2.0.2 (and I cannot upgrade it for the next couple of weeks). I have over 3M rows, 200 features, and 2500 trees, and I would like to improve accuracy by increasing the max depth. Is there a way to overcome this problem? Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 92 in stage 789.0 failed 4 times, most recent failure: Lost task 92.3 in stage 789.0 (TID 66903, 10.0.0.11): java …
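The usual workaround for stack overflows in deep Spark tree training is to enlarge the JVM thread stack, since deeper trees deepen the recursion inside Spark's tree code. A hedged config sketch; the 16m value and the jar name are examples, not recommendations:

```shell
spark-submit \
  --conf "spark.driver.extraJavaOptions=-Xss16m" \
  --conf "spark.executor.extraJavaOptions=-Xss16m" \
  your_training_job.jar
```

Setting a checkpoint directory (`sc.setCheckpointDir(...)`) together with the estimator's `checkpointInterval` parameter is another commonly suggested mitigation, since it truncates the growing RDD lineage across iterations.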

How to handle categorical features for Decision Tree, Random Forest in spark ml?

倖福魔咒の submitted on 2019-12-06 05:35:51
Question: I am trying to build decision tree and random forest classifiers on the UCI bank marketing data -> https://archive.ics.uci.edu/ml/datasets/bank+marketing. There are many categorical features (with string values) in the data set. The Spark ML documentation mentions that categorical variables can be converted to numeric by indexing with either StringIndexer or VectorIndexer. I chose StringIndexer (VectorIndexer requires a vector feature and a VectorAssembler, which converts features …
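A hedged scikit-learn analog of the indexing-plus-assembly pipeline the question describes: encode the string columns, pass the numeric ones through, then fit the forest. The column names mimic the bank-marketing schema, but the rows are invented.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({
    "job": ["admin.", "technician", "admin.", "services"],
    "marital": ["married", "single", "single", "married"],
    "age": [35, 41, 28, 52],
    "y": [1, 0, 0, 1],
})

# Categorical columns get one-hot encoded; 'age' passes through untouched.
pre = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), ["job", "marital"])],
    remainder="passthrough",
)

pipe = Pipeline([("pre", pre), ("rf", RandomForestClassifier(random_state=0))])
pipe.fit(df[["job", "marital", "age"]], df["y"])
print(pipe.predict(df[["job", "marital", "age"]]))
```

The structure mirrors the Spark version: the ColumnTransformer plays the StringIndexer/OneHotEncoder-plus-VectorAssembler role, and bundling it in a Pipeline keeps the encoding fitted only on training data.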

Extract and Visualize Model Trees from Sparklyr

泪湿孤枕 submitted on 2019-12-06 03:27:34
Question: Does anyone have advice on how to convert the tree information from sparklyr's ml_decision_tree_classifier, ml_gbt_classifier, or ml_random_forest_classifier models into (a) a format that can be understood by other R tree-related libraries and, ultimately, (b) a visualization of the trees for non-technical consumption? This would include the ability to map back to the actual feature names from the substituted string-indexing values produced by the vector assembler. The …
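The sparklyr-side extraction is the hard part; as a hedged illustration of the target format, this is what a readable tree dump looks like once splits are mapped back to feature names. It uses scikit-learn's `export_text` on the iris data purely as a stand-in for the Spark model.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# export_text prints each split as 'feature <= threshold', indented by depth,
# using the human-readable names instead of indexed column positions.
print(export_text(tree, feature_names=list(iris.feature_names)))
```

Whatever the extraction route, the end state is the same: a node table of (feature name, threshold, left/right child) that generic tree-plotting libraries can consume.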

Neural Network - Working with an imbalanced dataset

孤人 submitted on 2019-12-05 22:26:00
I am working on a classification problem with two labels: 0 and 1. My training dataset is very imbalanced (and so will be the test set, given my problem). The proportion is 1000:4, with label '0' appearing 250 times more often than label '1'. However, I have a lot of training samples: around 23 million, so I should get around 100,000 samples for label '1'. Given the large number of training samples, I didn't consider SVM. I also read about SMOTE for Random Forests. However, I was wondering whether a NN could be efficient at handling this kind of …
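With ~23M samples at a 250:1 ratio, per-class loss weights are a common first step before resampling schemes like SMOTE, and they apply to neural networks and random forests alike. A small sketch of computing the 'balanced' weights most libraries accept (the 250:1 toy vector stands in for the real data):

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Miniature 250:1 stand-in for the real label distribution.
y = np.array([0] * 250 + [1] * 1)

weights = compute_class_weight("balanced", classes=np.array([0, 1]), y=y)
print(dict(zip([0, 1], weights)))  # minority class weighted 250x the majority
```

The resulting dict can be passed as `class_weight` to sklearn estimators, or as per-sample weights to most NN training loops, so each minority example contributes 250 times as much to the loss.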

something similar to permutation accuracy importance in h2o package

假如想象 submitted on 2019-12-05 20:26:02
I fitted a random forest for my multinomial target with the randomForest package in R. Looking at variable importance, I found permutation accuracy importance, which is what I was looking for in my analysis. I fitted a random forest with the h2o package too, but the only measures it shows me are relative_importance, scaled_importance, and percentage. My question is: can I extract a measure that shows me which level of the target each variable best classifies? Is permutation accuracy importance the best measure I can use in this case? For example: I have a 3-level …
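For comparison across ecosystems: the same measure the R randomForest package reports is available in scikit-learn as `sklearn.inspection.permutation_importance`. A hedged sketch with a made-up 3-class problem, just to show the shape of the output (it does not reproduce the h2o measures):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=500, n_features=6, n_informative=3,
                           n_classes=3, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Shuffle each feature n_repeats times and record the accuracy drop:
# that drop is the permutation accuracy importance of the feature.
result = permutation_importance(rf, X, y, scoring="accuracy",
                                n_repeats=10, random_state=0)
print(result.importances_mean)
```

Per-level importance (which target level a variable best separates) is not built into this call; the usual trick is to rerun it with a one-vs-rest relabeling of `y` for each level of interest.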

Python vectorization for classification [duplicate]

我怕爱的太早我们不能终老 submitted on 2019-12-05 19:37:18
This question already has an answer here: Scikit learn - fit_transform on the test set. I am currently trying to build a text classification model (document classification) with roughly 80 classes. When I build and train the model using a random forest (after vectorizing the text into a TF-IDF matrix), the model works well. However, when I introduce new data, the words are not necessarily identical to those in my training set. This is a problem because I end up with a different number of features in my training set than in my test set (so the dimensions for the …
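A minimal sketch of the standard fix the linked duplicate describes: fit the TF-IDF vocabulary on the training corpus once, then only transform the test corpus, so both matrices share the same columns. The documents are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

train_docs = ["spark random forest", "forest fire report"]
test_docs = ["random fire and unseen words"]

vec = TfidfVectorizer()
X_train = vec.fit_transform(train_docs)  # learns the vocabulary here, once
X_test = vec.transform(test_docs)        # reuses it; unseen words are dropped

print(X_train.shape[1], X_test.shape[1])  # identical feature counts
```

Calling `fit_transform` on the test set is exactly what produces the dimension mismatch: it builds a second, different vocabulary. Pickling the fitted vectorizer alongside the forest keeps the mapping stable for future data.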