classification

Over-Sampling Class Imbalance Train/Test Split “Found input variables with inconsistent numbers of samples” Solution?

旧城冷巷雨未停 submitted on 2020-01-06 02:25:46
Question: I am trying to follow this article to perform over-sampling for imbalanced classification: https://www.kaggle.com/rafjaa/resampling-strategies-for-imbalanced-datasets/notebook My class ratio is about 8:1. I am confused about the pipeline and coding structure. Should you over-sample after the train/test split? If so, how do you deal with the fact that the target label is dropped from X? I tried keeping it, then performed the over-sampling, then dropped the labels from X_train/X_test and replaced the new
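The ordering asked about here can be sketched as follows: split first, then over-sample only the training portion, resampling features and labels together so their lengths always match. This is a minimal standard-library sketch on toy data with an 8:1 ratio, not the code from the linked notebook; `stratified_split` and `random_oversample` are hypothetical helper names, and plain random duplication stands in for whatever over-sampler is actually used.

```python
import random
from collections import Counter

def stratified_split(y, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx), splitting each class separately."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        n_test = int(len(idx) * test_fraction)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return train_idx, test_idx

def random_oversample(X, y, seed=0):
    """Duplicate random minority rows until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        pool = [row for row, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            # Features and labels are appended together, so lengths stay equal.
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out

# Toy data (assumption): 80 majority rows, 10 minority rows.
X = [[float(i)] for i in range(90)]
y = [0] * 80 + [1] * 10

train_idx, test_idx = stratified_split(y)
X_train = [X[i] for i in train_idx]
y_train = [y[i] for i in train_idx]
X_test = [X[i] for i in test_idx]
y_test = [y[i] for i in test_idx]

# Only the training portion is resampled; the test set keeps its original ratio.
X_train_res, y_train_res = random_oversample(X_train, y_train)

print("train before:", Counter(y_train))
print("train after: ", Counter(y_train_res))
print("test:        ", Counter(y_test))
```

Keeping the test set untouched means the evaluation still reflects the original class ratio, and no duplicated row can appear in both the training and test sets.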

How to Classify data frame Based on a Columns in R? [duplicate]

只谈情不闲聊 submitted on 2020-01-05 08:29:52
Question: This question already has answers here: Assign unique ID based on two columns [duplicate] (2 answers). Closed 4 months ago. I have a data frame with columns like this:
    gene    col1  col2  type
    ------------------------------
    gene_1  a     b     1
    gene_2  aa    bb    2
    gene_3  a     b     1
    gene_4  aa    bb    2
I want to derive the column "type" from the columns "col1" and "col2", so I need a classification based on "col1" and "col2". How should I do this in R? Thanks a lot. Answer 1: Based on the output, an option is to create group
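The grouping idea can be illustrated with a short sketch: every distinct (col1, col2) pair receives the next unused integer, and rows sharing a pair share an ID. The question asks about R and the answer above is truncated, so this is only a Python illustration of the logic, not the answer's code; `assign_group_ids` is a hypothetical helper name.

```python
def assign_group_ids(rows, keys=("col1", "col2")):
    """Add a 'type' field: one integer ID per distinct combination of keys."""
    ids = {}
    labelled = []
    for row in rows:
        key = tuple(row[k] for k in keys)
        # First time a combination is seen it gets the next ID (1, 2, ...).
        group = ids.setdefault(key, len(ids) + 1)
        labelled.append({**row, "type": group})
    return labelled

rows = [
    {"gene": "gene_1", "col1": "a", "col2": "b"},
    {"gene": "gene_2", "col1": "aa", "col2": "bb"},
    {"gene": "gene_3", "col1": "a", "col2": "b"},
    {"gene": "gene_4", "col1": "aa", "col2": "bb"},
]

labelled = assign_group_ids(rows)
for row in labelled:
    print(row["gene"], row["col1"], row["col2"], row["type"])
```

IDs are handed out in order of first appearance, which matches the 1, 2, 1, 2 pattern in the table from the question.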

How to Classify data frame Based on a Columns in R? [duplicate]

丶灬走出姿态 submitted on 2020-01-05 08:29:41
Question: This question already has answers here: Assign unique ID based on two columns [duplicate] (2 answers). Closed 4 months ago. I have a data frame with columns like this:
    gene    col1  col2  type
    ------------------------------
    gene_1  a     b     1
    gene_2  aa    bb    2
    gene_3  a     b     1
    gene_4  aa    bb    2
I want to derive the column "type" from the columns "col1" and "col2", so I need a classification based on "col1" and "col2". How should I do this in R? Thanks a lot. Answer 1: Based on the output, an option is to create group

Object recognition vs detection vs classification? What's the difference?

旧街凉风 submitted on 2020-01-05 05:23:08
Question: I don't know if this is the right Stack Exchange forum for this question; please let me know if it is not. I'm developing an application that, given an input image containing a painting, is able to tell you the title of the painting. An analogous case: given an input image containing a building, the returned result is the name of the building. What kind of application is this? At first glance, I would say something like "image classification". I'm not an expert

Training a classifier using images of different dimensions but same number of HoG features

元气小坏坏 submitted on 2020-01-04 14:04:06
Question: I want to train my classifier with some images, some of which have different dimensions. They all have one of the following dimensions: 100x50, 50x100, 64x72, 72x64. However, with 9 orientation bins and 8 pixels per cell, each of these generates 648 HoG features. I deliberately chose all images to be one of these sizes so that they would all yield the same number of HoG features and training would be uniform. The reason I opted for this is because the object of interest in the training images
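The figure of 648 features can be checked with a little arithmetic. The sketch below assumes one 9-bin histogram per 8x8 cell, no overlapping block normalization, and leftover pixels that do not fill a whole cell being discarded; these assumptions are not stated in the question but they reproduce its number. `hog_feature_count` is a hypothetical helper, not a library function.

```python
def hog_feature_count(width, height, pixels_per_cell=8, orientations=9):
    """Count HoG features as (whole cells across) * (whole cells down) * bins."""
    cells_x = width // pixels_per_cell   # partial cells at the edge are dropped
    cells_y = height // pixels_per_cell
    return cells_x * cells_y * orientations

sizes = [(100, 50), (50, 100), (64, 72), (72, 64)]
counts = {size: hog_feature_count(*size) for size in sizes}

for (width, height), count in counts.items():
    print(f"{width}x{height}: {count} features")
```

Under these assumptions, 100x50 gives 12 x 6 = 72 cells and 64x72 gives 8 x 9 = 72 cells, so every size yields 72 x 9 = 648 features. The vectors have equal length, but the cells are laid out on different grids (12x6, 6x12, 8x9, 9x8), so the same feature index covers a different image region for each size.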

How to perform SMOTE with cross validation in sklearn in python

孤人 submitted on 2020-01-04 08:03:25
Question: I have a highly imbalanced dataset and would like to apply SMOTE to balance it and perform cross-validation to measure the accuracy. However, most of the existing tutorials apply SMOTE with only a single training and testing iteration. Therefore, I would like to know the correct procedure for performing SMOTE with cross-validation. My current code is as follows. However, as mentioned above, it only uses a single iteration. from imblearn.over_sampling import SMOTE from sklearn
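A common way to combine the two is to resample inside the cross-validation loop: for each fold, synthetic samples are generated from that fold's training portion only, and the held-out portion is scored untouched. The sketch below shows that structure with the standard library alone. It is not the imblearn implementation: `smote` here is a simplified interpolation between minority neighbours, the nearest-centroid classifier is a stand-in for a real model, and the toy data and all helper names are assumptions.

```python
import math
import random

def smote(X_min, n_new, k=3, rng=None):
    """Simplified SMOTE: interpolate between a minority point and a near neighbour."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(X_min)
        neighbours = sorted((p for p in X_min if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic

def stratified_folds(y, n_splits, rng):
    """Deal the indices of each class round-robin into n_splits folds."""
    folds = [[] for _ in range(n_splits)]
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_splits].append(i)
    return folds

def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]

def predict(centroids, x):
    return min(centroids, key=lambda label: math.dist(centroids[label], x))

rng = random.Random(42)
# Toy data (assumption): 80 majority points near (0, 0), 10 minority near (3, 3).
X = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(80)]
     + [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(10)])
y = [0] * 80 + [1] * 10

scores, train_class_counts = [], []
for fold in stratified_folds(y, 5, rng):
    held_out = set(fold)
    X_tr = [X[i] for i in range(len(X)) if i not in held_out]
    y_tr = [y[i] for i in range(len(X)) if i not in held_out]

    # Resample the training portion only; the held-out fold is never touched.
    minority = [x for x, lab in zip(X_tr, y_tr) if lab == 1]
    n_majority = len(y_tr) - len(minority)
    new_points = smote(minority, n_majority - len(minority), rng=rng)
    X_res = X_tr + new_points
    y_res = y_tr + [1] * len(new_points)
    train_class_counts.append((y_res.count(0), y_res.count(1)))

    centroids = {lab: centroid([x for x, l in zip(X_res, y_res) if l == lab])
                 for lab in (0, 1)}
    correct = sum(predict(centroids, X[i]) == y[i] for i in fold)
    scores.append(correct / len(fold))

print("per-fold accuracy:", scores)
print("mean accuracy:", sum(scores) / len(scores))
```

Resampling before the split would let synthetic points derived from held-out samples leak into training, which tends to inflate the measured score. With the real libraries, the same ordering is usually obtained by placing the sampler and the classifier in an imblearn `Pipeline` and passing that to the cross-validation routine, since the pipeline applies samplers only while fitting.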

Building a classification model in RStudio with keras

痴心易碎 submitted on 2020-01-04 02:34:19
Question: I am trying to build a classification model with Keras and TensorFlow in RStudio, but I am getting the error below. Does anyone have a clue? This is my first time using Keras or deep learning. Thanks.
    > set.seed(10)
    > ind <- sample(2, nrow(stdk), replace=TRUE, prob=c(0.80, 0.2))
    > stdk.train <- stdk[ind==1, ]
    > stdk.test <- stdk[ind==2, ]
    > change.train <- stdk[ind==1, 5]
    > change.test <- stdk[ind==2, 5]
    > stdk.trainLabels <- to_categorical(change.train)
    > stdk.testLabels <- to_categorical

How to use StringToWordVector (weka) in java?

核能气质少年 submitted on 2020-01-03 20:04:58
Question: This is my ARFF file:
    @relation hamspam
    @attribute text string
    @attribute class {ham,spam}
    @data
    'good',ham
    'very good',ham
    'bad',spam
    'very bad',spam
    'very bad, very bad',spam
What I want to do is classify it with a Weka classifier in my Java program, but I don't know how to use StringToWordVector and then classify it. This is my code:
    Classifier j48tree = new J48();
    Instances train = new Instances(new BufferedReader(new FileReader("data.arff")));
    StringToWordVector filter = new

How can I know training data is enough for machine learning

风流意气都作罢 submitted on 2020-01-03 08:53:23
Question: For example, if I want to train a classifier (maybe an SVM), how many samples do I need to collect? Is there a way to measure this? Answer 1: It is not easy to know how many samples you need to collect. However, you can follow these steps for solving a typical ML problem: Build a dataset with a few samples. How many? It depends on the kind of problem you have; don't spend a lot of time on this now. Split your dataset into train, cross-validation, and test sets and build your model. Now that you've built the ML model,
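The splitting step mentioned in the answer can be sketched as below. The 60/20/20 proportions, the toy sample list, and the helper name `three_way_split` are assumptions for illustration; the answer itself does not specify ratios.

```python
import random

def three_way_split(samples, train=0.6, cross=0.2, seed=0):
    """Shuffle a copy of samples and cut it into train, cross-validation, test."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_cross = int(len(shuffled) * cross)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_cross],
            shuffled[n_train + n_cross:])  # whatever remains is the test set

samples = list(range(50))  # stand-in for a small initial dataset
train_set, cross_set, test_set = three_way_split(samples)

print(len(train_set), len(cross_set), len(test_set))
```

The model is fitted on the training set, tuned against the cross-validation set, and the test set is kept aside for a final check.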

How can I know training data is enough for machine learning

纵然是瞬间 submitted on 2020-01-03 08:52:35
Question: For example, if I want to train a classifier (maybe an SVM), how many samples do I need to collect? Is there a way to measure this? Answer 1: It is not easy to know how many samples you need to collect. However, you can follow these steps for solving a typical ML problem: Build a dataset with a few samples. How many? It depends on the kind of problem you have; don't spend a lot of time on this now. Split your dataset into train, cross-validation, and test sets and build your model. Now that you've built the ML model,