classification

Load classified data from CSV to Scikit-Learn for machine learning

最后都变了 - Submitted on 2020-02-27 03:29:12
Question: I'm learning Scikit-Learn to do some classification of tweets. I have a CSV with the tweets in one column and their class (0-11) in the next column. I went through this tutorial from the Scikit-Learn site, and I think I understand how the actual classification is done, but I don't think I really understood the data format. In the tutorial the material was in files inside folders, where the folder names acted as classification tags. In my case I need to load that data from a CSV file, and apparently I need to construct the…
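
A minimal sketch of one way to go from a CSV file to the (texts, labels) arrays that the scikit-learn text tutorial works with; the file name and column names here are assumptions, not the questioner's actual data:

```python
# A sketch, not the questioner's code: file name and column names are assumptions.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

df = pd.read_csv("tweets.csv")             # hypothetical file
texts = df["tweet"].astype(str).values     # hypothetical column holding the tweet text
labels = df["label"].values                # hypothetical column holding classes 0-11

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)

# The folder-per-class layout in the tutorial ultimately just yields (texts, labels);
# a CSV gives the same two arrays, so the rest of the pipeline is unchanged.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```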

100% classifier accuracy when shuffling data rows

二次信任 - Submitted on 2020-02-25 06:23:52
Question: I'm working on the mushroom classification data set (found here: https://www.kaggle.com/uciml/mushroom-classification). I've done some pre-processing on the data (removed redundant attributes, converted categorical data to numerical) and I'm trying to use it to train classifiers. Whenever I shuffle my data, either manually or by using train_test_split, all of the models I use (XGB, MLP, LinearSVC, Decision Tree) reach 100% accuracy. Whenever I test the models on unshuffled data, the…
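
One way to sanity-check a suspicious 100% score is to evaluate only on rows the model never saw during fitting, e.g. a shuffled, stratified hold-out plus cross-validation. A diagnostic sketch follows; the file name and target column are assumptions:

```python
# A diagnostic sketch, not the questioner's code: file and column names are assumptions.
import pandas as pd
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("mushrooms.csv")             # hypothetical pre-processed file
X = df.drop(columns=["class"]).values         # hypothetical target column "class"
y = df["class"].values

# Hold-out score: the test rows are never seen during fit.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, shuffle=True, stratify=y, random_state=0
)
model = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
print("hold-out accuracy:", model.score(X_te, y_te))

# Cross-validated score with shuffled folds, averaged over 5 splits.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
print("cv accuracy:", cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=cv).mean())
```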

How to calculate class weights for Random forests

青春壹個敷衍的年華 - Submitted on 2020-02-24 12:22:08
Question: I have datasets for 2 classes on which I have to perform binary classification. I chose Random Forest as the classifier, as it gives me the best accuracy among the models I tried. Dataset-1 has 462 data points and dataset-2 has 735. I noticed that my data has a minor class imbalance, so I tried to optimise my training and retrained the model with class weights. I provided the following class weights: cwt <- c(0.385,0.614) # Class weights ss <- c(300…
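
The excerpt is in R, but as an analogous illustration of how class weights are commonly derived, one frequent choice is to weight each class inversely to its frequency. A sketch in scikit-learn rather than the R randomForest package; only the class counts (462 vs 735) come from the question:

```python
# An analogous scikit-learn sketch, not the questioner's R code.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils.class_weight import compute_class_weight

# Class sizes from the question: 462 vs 735 data points.
y = np.array([0] * 462 + [1] * 735)

# "balanced" weights are inversely proportional to class frequency:
# n_samples / (n_classes * count_per_class)
weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y)
print(dict(zip([0, 1], weights)))   # roughly {0: 1.30, 1: 0.81}

# The same weighting can also be handed to the forest directly.
clf = RandomForestClassifier(class_weight="balanced", random_state=0)
```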

How to fix RuntimeError “Expected object of scalar type Float but got scalar type Double for argument”?

折月煮酒 - Submitted on 2020-02-19 09:32:07
Question: I'm trying to train a classifier with PyTorch, but I run into a problem as soon as I feed the model the training data. I get this error on y_pred = model(X_trainTensor): RuntimeError: Expected object of scalar type Float but got scalar type Double for argument #4 'mat1'. Here are the key parts of my code: # Hyper-parameters D_in = 47 # there are 47 parameters I investigate H = 33 D_out = 2 # output should be either 1 or 0 # Format and load the data y = np.array(df['target'])…
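
The usual cause is that NumPy arrays default to float64 (Double) while PyTorch's linear layers expect float32 (Float). A minimal sketch of the common fix; the tensor names follow the question, the surrounding model and data are assumptions:

```python
# A sketch of the usual dtype fix, not the questioner's full code.
import numpy as np
import torch

X_train = np.random.rand(10, 47)                  # NumPy defaults to float64
y_train = np.random.randint(0, 2, size=10)

# Either cast the array before converting ...
X_trainTensor = torch.from_numpy(X_train.astype(np.float32))
# ... or cast the resulting tensor itself to float32.
X_trainTensor = torch.from_numpy(X_train).float()
y_trainTensor = torch.from_numpy(y_train).long()  # integer class labels for CrossEntropyLoss

model = torch.nn.Sequential(
    torch.nn.Linear(47, 33),   # D_in=47, H=33 as in the question
    torch.nn.ReLU(),
    torch.nn.Linear(33, 2),    # D_out=2
)
y_pred = model(X_trainTensor)  # no dtype mismatch: both sides are float32
```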

Refitting clusters around fixed centroids

雨燕双飞 - Submitted on 2020-02-02 19:35:30
Question: Clustering/classification problem: I used k-means clustering to generate these clusters and centroids. This is the dataset with the cluster attribute added from the initial run: > dput(sampledata) structure(list(Player = structure(1:5, .Label = c("A", "B", "C", "D", "E"), class = "factor"), Metric.1 = c(0.3938961, 0.28062338, 0.32532626, 0.29239642, 0.25622558), Metric.2 = c(0.00763359, 0.01172354, 0.40550867, 0.04026846, 0.05976367), Metric.3 = c(0.50766075, 0.20345662, 0.06267444, 0.08661417,…
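
The excerpt is R, but as an analogous Python illustration of "refitting" around fixed centroids, one option is to assign each row to its nearest existing centroid instead of re-running k-means. The data and centroid values below are placeholders:

```python
# An analogous Python sketch, not the questioner's R code; all values are placeholders.
import numpy as np
from scipy.spatial.distance import cdist

# Centroids kept fixed from the initial k-means run (shape: k x n_metrics).
centroids = np.array([[0.30, 0.05, 0.40],
                      [0.28, 0.20, 0.10]])

# New (or updated) player metrics to classify against the fixed centroids.
new_points = np.array([[0.39, 0.01, 0.51],
                       [0.26, 0.06, 0.09]])

# Nearest-centroid assignment: Euclidean distance to every centroid, take the argmin.
labels = cdist(new_points, centroids).argmin(axis=1)
print(labels)   # cluster index for each new point; the centroids themselves never move
```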

Is it reasonable for l1/l2 regularization to cause all feature weights to be zero in vowpal wabbit?

|▌冷眼眸甩不掉的悲伤 - Submitted on 2020-02-01 08:28:37
Question: I got a weird result from vw, which uses an online learning scheme for logistic regression. When I add --l1 or --l2 regularization, all predictions come out at 0.5 (which means all feature weights are 0). Here's my command: vw -d training_data.txt --loss_function logistic -f model_l1 --invert_hash model_readable_l1 --l1 0.05 --link logistic ...and here's the learning process info: using l1 regularization = 0.05 final_regressor = model_l1 Num weight bits = 18 learning rate = 0.5 initial_t = 0 power_t =…
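
As an analogous illustration (in scikit-learn, not vw itself) of why an overly strong L1 penalty can legitimately drive every weight to zero and push all predictions toward 0.5 on balanced data:

```python
# Not vowpal wabbit: an analogous scikit-learn sketch of over-strong L1 regularization.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# In scikit-learn a *small* C means a *strong* penalty.
strong_l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.001)
strong_l1.fit(X, y)

print(np.count_nonzero(strong_l1.coef_))     # typically 0: every weight pruned away
print(strong_l1.predict_proba(X[:3])[:, 1])  # probabilities collapse toward the base rate
```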

Binary semi-supervised classification with positive only and unlabeled data set

情到浓时终转凉″ - Submitted on 2020-01-28 06:19:55
Question: My data consist of comments (saved in files) and a few of them are labelled as positive. I would like to use semi-supervised and PU classification to classify these comments into positive and negative classes. Is there any public implementation of semi-supervised and PU learning in Python (scikit-learn)? Answer 1: You could try to train a one-class SVM and see what kind of results that gives you. I haven't heard of the PU paper. I think for all practical purposes…
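
A minimal sketch of the answer's one-class SVM suggestion: fit on the positive comments only, then score the unlabeled ones. The example texts and the nu/kernel parameters are assumptions:

```python
# A sketch of the answer's one-class SVM idea; texts and parameters are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import OneClassSVM

positive_comments = ["great product, loved it", "works perfectly, thank you"]
unlabeled_comments = ["terrible, broke after a day", "really happy with this"]

vec = TfidfVectorizer()
X_pos = vec.fit_transform(positive_comments)   # learn the vocabulary from positives only
X_unl = vec.transform(unlabeled_comments)

# Train on the positive class only; nu bounds the fraction treated as outliers.
ocsvm = OneClassSVM(kernel="linear", nu=0.1).fit(X_pos)

# +1 = looks like the positive class, -1 = does not.
print(ocsvm.predict(X_unl))
```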

SVM classification with always high precision

一曲冷凌霜 - Submitted on 2020-01-25 13:27:30
Question: I have a binary classification problem and I'm trying to get a precision-recall curve for my classifier. I use libsvm with an RBF kernel and the probability-estimate option. To get the curve I change the decision threshold from 0 to 1 in steps of 0.1, but on every run I get high precision even as recall decreases with increasing threshold. My false positive rate always seems low compared to the true positives. My results are these: Threshold: 0.1 TOTAL TP: 393, FP: 1, FN: 49 Precision: 0.997462, Recall: 0…
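
Rather than stepping the threshold by hand, the curve can be computed directly from the probability estimates. A sketch with scikit-learn; the label and score arrays are placeholders standing in for the questioner's labels and libsvm probability outputs:

```python
# A sketch using scikit-learn's precision_recall_curve; y_true / y_scores are placeholders.
import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([1, 1, 0, 1, 0, 1, 0, 0, 1, 1])
y_scores = np.array([0.95, 0.9, 0.8, 0.7, 0.45, 0.6, 0.2, 0.1, 0.85, 0.3])

# One precision/recall pair per distinct score used as a threshold.
precision, recall, thresholds = precision_recall_curve(y_true, y_scores)
for p, r, t in zip(precision, recall, thresholds):
    print(f"threshold={t:.2f}  precision={p:.3f}  recall={r:.3f}")
```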