classification

Load classified data from CSV to Scikit-Learn for machine learning

最后都变了 - Submitted on 2020-02-27 03:29:12
Question: I'm learning Scikit-Learn to do some classification of tweets. I have a CSV with the tweets in one column and their class (0-11) in the next column. I went through this tutorial from the Scikit-Learn site, and I think I understand how the actual classification is done, but I don't think I really understood the data format. In the tutorial the material was in files inside folders, where the folder names acted as classification tags. In my case I need to load that data from a CSV file, and apparently I need to construct the…
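
A minimal sketch of one way to go from a CSV file to the (texts, labels) arrays that the scikit-learn text tutorial works with; the file name and column names here are assumptions, not the questioner's actual data:

```python
# A sketch, not the questioner's code: file name and column names are assumptions.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

df = pd.read_csv("tweets.csv")             # hypothetical file
texts = df["tweet"].astype(str).values     # hypothetical column holding the tweet text
labels = df["label"].values                # hypothetical column holding classes 0-11

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)

# The folder-per-class layout in the tutorial ultimately just yields (texts, labels);
# a CSV gives the same two arrays, so the rest of the pipeline is unchanged.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```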

100% classifier accuracy when shuffling data rows

二次信任 - Submitted on 2020-02-25 06:23:52
Question: I'm working on the mushroom classification data set (found here: https://www.kaggle.com/uciml/mushroom-classification). I've done some pre-processing on the data (removed redundant attributes, converted categorical data to numerical) and I'm trying to use it to train classifiers. Whenever I shuffle my data, either manually or by using train_test_split, all of the models I use (XGB, MLP, LinearSVC, Decision Tree) reach 100% accuracy. Whenever I test the models on unshuffled data, the…
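
One way to sanity-check a suspicious 100% score is to evaluate only on rows the model never saw during fitting, e.g. a shuffled, stratified hold-out plus cross-validation. A diagnostic sketch follows; the file name and target column are assumptions:

```python
# A diagnostic sketch, not the questioner's code: file and column names are assumptions.
import pandas as pd
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("mushrooms.csv")             # hypothetical pre-processed file
X = df.drop(columns=["class"]).values         # hypothetical target column "class"
y = df["class"].values

# Hold-out score: the test rows are never seen during fit.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, shuffle=True, stratify=y, random_state=0
)
model = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
print("hold-out accuracy:", model.score(X_te, y_te))

# Cross-validated score with shuffled folds, averaged over 5 splits.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
print("cv accuracy:", cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=cv).mean())
```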

How to calculate class weights for Random forests

青春壹個敷衍的年華 - Submitted on 2020-02-24 12:22:08
Question: I have datasets for 2 classes on which I have to perform binary classification. I chose Random Forest as the classifier, as it gives me the best accuracy among the models I tried. Dataset-1 has 462 data points and dataset-2 has 735. I noticed that my data has a minor class imbalance, so I tried to optimise my training and retrained the model with class weights. I provided the following class weights: cwt <- c(0.385,0.614) # Class weights ss <- c(300…
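
The excerpt is in R, but as an analogous illustration of how class weights are commonly derived, one frequent choice is to weight each class inversely to its frequency. A sketch in scikit-learn rather than the R randomForest package; only the class counts (462 vs 735) come from the question:

```python
# An analogous scikit-learn sketch, not the questioner's R code.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils.class_weight import compute_class_weight

# Class sizes from the question: 462 vs 735 data points.
y = np.array([0] * 462 + [1] * 735)

# "balanced" weights are inversely proportional to class frequency:
# n_samples / (n_classes * count_per_class)
weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y)
print(dict(zip([0, 1], weights)))   # roughly {0: 1.30, 1: 0.81}

# The same weighting can also be handed to the forest directly.
clf = RandomForestClassifier(class_weight="balanced", random_state=0)
```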

How to fix RuntimeError “Expected object of scalar type Float but got scalar type Double for argument”?

折月煮酒 - Submitted on 2020-02-19 09:32:07
Question: I'm trying to train a classifier with PyTorch, but I run into a problem as soon as I feed the model the training data. I get this error on y_pred = model(X_trainTensor): RuntimeError: Expected object of scalar type Float but got scalar type Double for argument #4 'mat1'. Here are the key parts of my code: # Hyper-parameters D_in = 47 # there are 47 parameters I investigate H = 33 D_out = 2 # output should be either 1 or 0 # Format and load the data y = np.array(df['target'])…
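
The usual cause is that NumPy arrays default to float64 (Double) while PyTorch's linear layers expect float32 (Float). A minimal sketch of the common fix; the tensor names follow the question, the surrounding model and data are assumptions:

```python
# A sketch of the usual dtype fix, not the questioner's full code.
import numpy as np
import torch

X_train = np.random.rand(10, 47)                  # NumPy defaults to float64
y_train = np.random.randint(0, 2, size=10)

# Either cast the array before converting ...
X_trainTensor = torch.from_numpy(X_train.astype(np.float32))
# ... or cast the resulting tensor itself to float32.
X_trainTensor = torch.from_numpy(X_train).float()
y_trainTensor = torch.from_numpy(y_train).long()  # integer class labels for CrossEntropyLoss

model = torch.nn.Sequential(
    torch.nn.Linear(47, 33),   # D_in=47, H=33 as in the question
    torch.nn.ReLU(),
    torch.nn.Linear(33, 2),    # D_out=2
)
y_pred = model(X_trainTensor)  # no dtype mismatch: both sides are float32
```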

Refitting clusters around fixed centroids

雨燕双飞 - Submitted on 2020-02-02 19:35:30
Question: Clustering/classification problem: I used k-means clustering to generate these clusters and centroids. This is the dataset with the cluster attribute added from the initial run: > dput(sampledata) structure(list(Player = structure(1:5, .Label = c("A", "B", "C", "D", "E"), class = "factor"), Metric.1 = c(0.3938961, 0.28062338, 0.32532626, 0.29239642, 0.25622558), Metric.2 = c(0.00763359, 0.01172354, 0.40550867, 0.04026846, 0.05976367), Metric.3 = c(0.50766075, 0.20345662, 0.06267444, 0.08661417,…
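
The excerpt is R, but as an analogous Python illustration of "refitting" around fixed centroids, one option is to assign each row to its nearest existing centroid instead of re-running k-means. The data and centroid values below are placeholders:

```python
# An analogous Python sketch, not the questioner's R code; all values are placeholders.
import numpy as np
from scipy.spatial.distance import cdist

# Centroids kept fixed from the initial k-means run (shape: k x n_metrics).
centroids = np.array([[0.30, 0.05, 0.40],
                      [0.28, 0.20, 0.10]])

# New (or updated) player metrics to classify against the fixed centroids.
new_points = np.array([[0.39, 0.01, 0.51],
                       [0.26, 0.06, 0.09]])

# Nearest-centroid assignment: Euclidean distance to every centroid, take the argmin.
labels = cdist(new_points, centroids).argmin(axis=1)
print(labels)   # cluster index for each new point; the centroids themselves never move
```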

Is it reasonable for l1/l2 regularization to cause all feature weights to be zero in vowpal wabbit?

|▌冷眼眸甩不掉的悲伤 - Submitted on 2020-02-01 08:28:37
Question: I got a weird result from vw, which uses an online learning scheme for logistic regression. When I add --l1 or --l2 regularization, all predictions come out at 0.5 (which means all feature weights are 0). Here's my command: vw -d training_data.txt --loss_function logistic -f model_l1 --invert_hash model_readable_l1 --l1 0.05 --link logistic ...and here's the learning process info: using l1 regularization = 0.05 final_regressor = model_l1 Num weight bits = 18 learning rate = 0.5 initial_t = 0 power_t =…
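
As an analogous illustration (in scikit-learn, not vw itself) of why an overly strong L1 penalty can legitimately drive every weight to zero and push all predictions toward 0.5 on balanced data:

```python
# Not vowpal wabbit: an analogous scikit-learn sketch of over-strong L1 regularization.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# In scikit-learn a *small* C means a *strong* penalty.
strong_l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.001)
strong_l1.fit(X, y)

print(np.count_nonzero(strong_l1.coef_))     # typically 0: every weight pruned away
print(strong_l1.predict_proba(X[:3])[:, 1])  # probabilities collapse toward the base rate
```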

Binary semi-supervised classification with positive only and unlabeled data set

情到浓时终转凉″ - Submitted on 2020-01-28 06:19:55
Question: My data consist of comments (saved in files) and a few of them are labelled as positive. I would like to use semi-supervised and PU classification to classify these comments into positive and negative classes. Is there any public implementation of semi-supervised and PU learning in Python (scikit-learn)? Answer 1: You could try to train a one-class SVM and see what kind of results that gives you. I haven't heard of the PU paper. I think for all practical purposes…
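
A minimal sketch of the answer's one-class SVM suggestion: fit on the positive comments only, then score the unlabeled ones. The example texts and the nu/kernel parameters are assumptions:

```python
# A sketch of the answer's one-class SVM idea; texts and parameters are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import OneClassSVM

positive_comments = ["great product, loved it", "works perfectly, thank you"]
unlabeled_comments = ["terrible, broke after a day", "really happy with this"]

vec = TfidfVectorizer()
X_pos = vec.fit_transform(positive_comments)   # learn the vocabulary from positives only
X_unl = vec.transform(unlabeled_comments)

# Train on the positive class only; nu bounds the fraction treated as outliers.
ocsvm = OneClassSVM(kernel="linear", nu=0.1).fit(X_pos)

# +1 = looks like the positive class, -1 = does not.
print(ocsvm.predict(X_unl))
```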

SVM classification with always high precision

一曲冷凌霜 - Submitted on 2020-01-25 13:27:30
Question: I have a binary classification problem and I'm trying to get a precision-recall curve for my classifier. I use libsvm with an RBF kernel and the probability-estimate option. To get the curve I change the decision threshold from 0 to 1 in steps of 0.1, but on every run I get high precision even as recall decreases with increasing threshold. My false positive rate always seems low compared to the true positives. My results are these: Threshold: 0.1 TOTAL TP: 393, FP: 1, FN: 49 Precision: 0.997462, Recall: 0…
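
Rather than stepping the threshold by hand, the curve can be computed directly from the probability estimates. A sketch with scikit-learn; the label and score arrays are placeholders standing in for the questioner's labels and libsvm probability outputs:

```python
# A sketch using scikit-learn's precision_recall_curve; y_true / y_scores are placeholders.
import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([1, 1, 0, 1, 0, 1, 0, 0, 1, 1])
y_scores = np.array([0.95, 0.9, 0.8, 0.7, 0.45, 0.6, 0.2, 0.1, 0.85, 0.3])

# One precision/recall pair per distinct score used as a threshold.
precision, recall, thresholds = precision_recall_curve(y_true, y_scores)
for p, r, t in zip(precision, recall, thresholds):
    print(f"threshold={t:.2f}  precision={p:.3f}  recall={r:.3f}")
```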