classification

Unbalanced classification using RandomForestClassifier in sklearn

天涯浪子 · submitted on 2019-12-29 02:43:32
Question: I have a dataset where the classes are unbalanced: the labels are either '1' or '0', with a 5:1 ratio of class '1' to class '0'. How do you calculate the prediction error for each class, and then rebalance the weights accordingly, in sklearn with Random Forest, along the lines of http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#balance ? Answer 1: You can pass a sample-weights argument to the Random Forest fit method: sample_weight : array-like, shape = [n_samples] or None. Sample …
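The answer can be sketched as follows. This is a minimal illustration with synthetic data (the dataset and the 5:1 upweighting factor are assumptions); it shows both the `class_weight="balanced"` constructor option and an explicit `sample_weight` passed to `fit()`, with `classification_report` giving the per-class error rates the question asks about.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
n = 600
X = rng.normal(size=(n, 4))
y = (rng.random(n) < 5 / 6).astype(int)  # roughly 5:1 ratio of 1s to 0s

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Option 1: let sklearn derive weights inversely proportional to class frequency.
clf = RandomForestClassifier(class_weight="balanced", random_state=0)
clf.fit(X_tr, y_tr)

# Option 2: pass explicit per-sample weights to fit(); here the rare
# class '0' is upweighted by the (assumed) imbalance factor of 5.
weights = np.where(y_tr == 0, 5.0, 1.0)
clf2 = RandomForestClassifier(random_state=0)
clf2.fit(X_tr, y_tr, sample_weight=weights)

# Per-class precision/recall in the report reveals the per-class error.
print(classification_report(y_te, clf.predict(X_te)))
```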

Cost function in logistic regression gives NaN as a result

被刻印的时光 ゝ · submitted on 2019-12-28 02:50:27
Question: I am implementing logistic regression using batch gradient descent. The input samples are to be classified into two classes, 1 and 0. While training, I am using the following sigmoid function: t = 1 ./ (1 + exp(-z)); where z = x*theta. And I am using the following cost function to calculate the cost and determine when to stop training: function cost = computeCost(x, y, theta) htheta = sigmoid(x*theta); cost = sum(-y .* log(htheta) - (1-y) .* log(1-htheta)); …
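The usual cause of the NaN here is numerical: when the sigmoid saturates to exactly 0 or 1 in floating point, the cost evaluates `0 * log(0) = 0 * (-inf) = NaN`. A common fix is to clip the sigmoid output away from 0 and 1. A sketch in Python (the question's code is Octave, but the arithmetic is identical; the clipping epsilon is a conventional choice, not from the question):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def compute_cost(X, y, theta, eps=1e-12):
    h = sigmoid(X @ theta)
    h = np.clip(h, eps, 1 - eps)  # keep both log() terms finite
    return -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

# z = 40 saturates: sigmoid(40) rounds to exactly 1.0 in float64, so the
# unclipped cost would contain 0 * log(0) = NaN.
X = np.array([[1.0, 40.0], [1.0, -40.0]])
y = np.array([1.0, 0.0])
theta = np.array([0.0, 1.0])

cost = compute_cost(X, y, theta)
print(np.isfinite(cost))  # True: the clipped cost stays finite
```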

Multi-class classification in libsvm [closed]

試著忘記壹切 · submitted on 2019-12-27 17:06:09
Question: (Closed 7 years ago: this question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form.) I'm working with libsvm and I must implement one-versus-all classification for multiple classes. How can I do it? Does the 2011 version of libsvm support this? I think my question is not very clear: if libsvm don …
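For context: libsvm's built-in multiclass strategy is one-versus-one, so one-versus-all has to be assembled on top of binary SVMs. A sketch using scikit-learn (whose `SVC` wraps libsvm) rather than the libsvm command-line tools the asker may be using; `OneVsRestClassifier` trains one binary SVM per class:

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# One binary SVC (libsvm under the hood) is trained per class,
# each separating that class from the rest.
ovr = OneVsRestClassifier(SVC(kernel="linear"))
ovr.fit(X, y)
print(len(ovr.estimators_))  # 3 classes -> 3 binary classifiers
```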

How to create Training data for Text classification on 4 categories

♀尐吖头ヾ · submitted on 2019-12-25 07:26:59
Question: My machine-learning goal is to find potential risks (which will cost more money) and opportunities (which will save money) in a project-requirements document. My idea is to classify each sentence of the data into one of these categories: Risk, Opportunity, and Irrelevant (no risk, no opportunity; the default category). I will use a multinomial Bayes classifier with tf-idf for this. Now I need data for my training set and test set. The way I will do this is to label every sentence from the requirement …
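The described pipeline can be sketched as below. The example sentences and labels are invented for illustration (the asker has not shared their data); the pipeline itself, tf-idf features feeding a multinomial Naive Bayes classifier, matches the approach in the question.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical labeled sentences standing in for the requirements document.
sentences = [
    "The vendor may fail to deliver on time, incurring penalty costs.",
    "Delays in hardware procurement could increase the budget.",
    "Reusing the existing module would save two months of work.",
    "Bulk licensing could reduce software costs significantly.",
    "The document is structured into ten chapters.",
    "All meetings take place in building B.",
]
labels = ["Risk", "Risk", "Opportunity", "Opportunity",
          "Irrelevant", "Irrelevant"]

# tf-idf vectorization feeding a multinomial Naive Bayes classifier.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(sentences, labels)
print(model.predict(["Late delivery would cost extra money."]))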

Why should we compute the image mean when we train CNNs?

风流意气都作罢 提交于 2019-12-25 05:44:22
问题 When I use caffe for image classification, it often computes the image mean. Why is that the case? Someone said that it can improve the accuracy, but I don't understand why this should be the case. 回答1: Neural networks (including CNNs) are models with thousands of parameters which we try to optimize with gradient descent. Those models are able to fit a lot of different functions by having a non-linearity φ at their nodes. Without a non-linear activation function, the network collapses to a

Why should we compute the image mean when we train CNNs?

陌路散爱 提交于 2019-12-25 05:44:08
问题 When I use caffe for image classification, it often computes the image mean. Why is that the case? Someone said that it can improve the accuracy, but I don't understand why this should be the case. 回答1: Neural networks (including CNNs) are models with thousands of parameters which we try to optimize with gradient descent. Those models are able to fit a lot of different functions by having a non-linearity φ at their nodes. Without a non-linear activation function, the network collapses to a

R Caret Package error imputing data with Pre-Process function

狂风中的少年 提交于 2019-12-25 05:31:46
问题 I have a dataset (training - testing) with missing data and I would like to impute data before the classification. I tried using the caret package and the function preProcess, I want to impute data using the predictor variable for the training set and impute data on the testing set only using the knowledge of the trainingset without using the predictor of the testing set (that I should not know). p = preProcess(x = training, method = "knnImpute", k = 10) pred = predict(object = p, newdata =

caffe: Confused about regression

。_饼干妹妹 提交于 2019-12-25 04:33:27
问题 I have a really weird problem I want to explain to you. I am not sure if this is a topic for SO but I hope it will be in the end. My general problem task is depth estimation, i.e. I have an image as input and its corresponding ground_truth (depth image). Then I have my net (which should be considered as black box) and my last layers. First of all depth estimation is rather a regression task than a classification task. Therefore I decided to use a EuclideanLoss layer where my num_output of my

Multiclass Decision Forest vs Random Forest

◇◆丶佛笑我妖孽 提交于 2019-12-25 04:12:24
问题 How does Multiclass Decision Forest differ from Random Forest? What factors do they have in common? It appears there is not a clear answer on the web regarding this matter. 回答1: Random forests or random decision forests is an extension of the decision forests (ensemble of decision trees) combining bagging and random selection of features to construct a collection of decision trees with controlled variance. A very good paper from Microsoft research you may consider to look at. 来源: https:/

how to force scikit-learn DictVectorizer not to discard features?

删除回忆录丶 提交于 2019-12-25 02:18:08
问题 Im trying to use scikit-learn for a classification task. My code extracts features from the data, and stores them in a dictionary like so: feature_dict['feature_name_1'] = feature_1 feature_dict['feature_name_2'] = feature_2 when I split the data in order to test it using sklearn.cross_validation everything works as it should. The problem Im having is when the test data is a new set, not part of the learning set (although it has the same exact features for each sample). after I fit the