classification

Do the training and testing sets have to be different from the prediction set?

主宰稳场 submitted on 2020-06-29 04:01:05
Question: I know the general rule that we should test a trained classifier only on the testing set. But now the question arises: when I have an already trained and tested classifier ready, can I apply it to the same dataset that the training and testing sets were drawn from? Or do I have to apply it to a new prediction set that is different from the training+testing set? And what if I predict a label column of a time series (edited later: I do not mean to create a classical time series analysis here, but
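A minimal scikit-learn sketch of the workflow the question describes (the data, feature count, and classifier choice below are illustrative assumptions, not taken from the post): fit on the training split, evaluate once on the held-out test split, and then apply the fitted model to any rows with the same feature columns, whether genuinely new or from the original data.

```python
# Hypothetical illustration: train/test evaluation vs. applying the model to new rows.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))              # stand-in feature matrix
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # stand-in labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))

# The fitted model can score any rows with the same features, e.g. brand-new data ...
X_new = rng.normal(size=(10, 5))
print(clf.predict(X_new))

# ... or the original rows; predictions there are fine to produce, but metrics
# computed on data the model was trained on are optimistically biased.
print(clf.predict(X)[:10])
```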

Classification accuracy is too low (Word2Vec)

久未见 submitted on 2020-06-29 03:37:06
Question: I'm working on a multi-label emotion classification problem to be solved with word2vec. This is my code, which I've put together from a couple of tutorials. The accuracy is now very low, about 0.02, which tells me something is wrong in my code, but I cannot find it. I tried this code with TF-IDF and BOW (obviously except for the word2vec part) and got much better accuracy scores, such as 0.28, but this one seems to be wrong somehow: np.set_printoptions(threshold=sys.maxsize) wv = gensim.models
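One common pattern for this setup, sketched below under assumptions about the data (tiny toy corpus, multi-hot labels, gensim 4 API), is to average the word2vec vectors of each document's tokens and feed the resulting document vectors to a one-vs-rest classifier.

```python
# Hypothetical sketch: document vectors as the mean of word2vec word vectors,
# fed to a one-vs-rest classifier for multi-label emotion tags.
import numpy as np
from gensim.models import Word2Vec
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression

docs = [["i", "feel", "happy"], ["this", "is", "sad"], ["so", "angry", "now"]]
y = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])   # stand-in multi-hot labels

wv = Word2Vec(sentences=docs, vector_size=50, min_count=1, seed=0).wv

def doc_vector(tokens, wv):
    """Average the vectors of in-vocabulary tokens; zeros if none are known."""
    vecs = [wv[t] for t in tokens if t in wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(wv.vector_size)

X = np.vstack([doc_vector(d, wv) for d in docs])
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
print(clf.predict(X))
```

Note that with multi-label targets, scikit-learn's default accuracy is exact-match (subset) accuracy, which is often very low even for a reasonable model; per-label F1 or Hamming loss is usually more informative.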

Optimize F-score in e1071 package

…衆ロ難τιáo~ submitted on 2020-06-27 23:03:13
Question: I'm trying to implement a one-class SVM using the e1071 package in R. Can somebody give me pointers on how to optimize the F-score using a grid search? I have tried the tune.svm function, but it has only resulted in high sensitivity or high specificity. The positive class I'm trying to predict makes up about 1-2% of the general population. The results I get have high accuracy but a very low F-score:
            Reference
Prediction    members Not members
  members          1           4
  Not members     12         983
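The post is about R's e1071, but the underlying idea, searching a grid of SVM hyper-parameters with the F-score rather than accuracy as the selection metric, can be illustrated with a scikit-learn analogue; the supervised SVC, the parameter grid, and the synthetic imbalanced data below are assumptions for illustration only.

```python
# Illustrative scikit-learn analogue (the original question uses R's e1071):
# grid-search SVM hyper-parameters with F1, not accuracy, as the selection metric.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic data with roughly a 2% positive class, mimicking the imbalance described.
X, y = make_classification(n_samples=2000, weights=[0.98, 0.02], random_state=0)

param_grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": [0.001, 0.01, 0.1, 1],
    "class_weight": [None, "balanced"],   # re-weighting helps with rare positives
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, scoring="f1", cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```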

Scaling production data

て烟熏妆下的殇ゞ submitted on 2020-06-26 12:51:22
Question: I have a dataset, say Data, which consists of categorical and numerical variables. After cleaning them, I have scaled only the numerical variables (I assume categorical variables must not be scaled) using
Data <- Data %>% dplyr::mutate_if(is.numeric, ~scale(.) %>% as.vector)
I then split it randomly into a 70-30 split using
set.seed(123)
sample_size = floor(0.70*nrow(Data))
xyz <- sample(seq_len(nrow(Data)), size = sample_size)
Train_Set <- Join[xyz,]
Test_Set <- Join[-xyz,]
I have built a classification
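The post itself uses R/dplyr, but the usual recommendation for this situation, illustrated below with a scikit-learn sketch on assumed data, is to fit the scaler on the training split only and reuse the same fitted parameters for the test split and later for production rows, rather than scaling the whole dataset before splitting.

```python
# Illustrative sketch (Python analogue of the R workflow in the post): fit the scaler
# on the training split only, then reuse the fitted mean/sd for test and production data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.random.default_rng(0).normal(loc=50, scale=10, size=(1000, 3))  # numeric columns
X_train, X_test = train_test_split(X, test_size=0.3, random_state=123)

scaler = StandardScaler().fit(X_train)      # learn mean/sd from training data only
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)    # same transformation, no re-fitting

# At prediction time, production rows go through the already-fitted scaler:
X_prod_scaled = scaler.transform(np.array([[52.0, 47.5, 61.2]]))
```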

Error with Spring Batch ClassifierCompositeItemWriter

三世轮回 submitted on 2020-06-17 12:58:59
Question: I basically have to produce multiple XML files per currency (i.e. usd, zar etc.) for each file_id; these transactions are all in one DB table. Do I create a composite writer for each currency and filter each currency I read from the DB in my ItemProcessor, or can I use multiple steps for each currency per file_id? I have been struggling to find a Spring Batch solution for this. The filename resource will be different for each file and currency. For example I can receive

How to perform multiclass multioutput classification using an LSTM

﹥>﹥吖頭↗ submitted on 2020-06-15 07:09:08
Question: I have a multiclass multioutput classification problem (see https://scikit-learn.org/stable/modules/multiclass.html for details). In other words, my dataset looks as follows:
node_name, timeseries_1, timeseries_2, label_1, label_2
node1, [1.2, ...], [1.8, ...], 0, 2
node2, [1.0, ...], [1.1, ...], 1, 1
node3, [1.9, ...], [1.2, ...], 0, 3
...
So my label_1 could be either 0 or 1, whereas my label_2 could be either 0, 1, or 2. My current code is as follows. def create_network(): model =
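A minimal Keras sketch of one way to do this, assuming the two time series are stacked as two input channels and that label_1 has 2 classes and label_2 has 3 (the sizes and layer names below are illustrative): a shared LSTM encoder with one softmax output head per label.

```python
# Assumed input shape: (timesteps, 2), i.e. timeseries_1 and timeseries_2 as two features.
import numpy as np
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

timesteps = 20
inputs = Input(shape=(timesteps, 2))
x = LSTM(32)(inputs)                                        # shared recurrent encoder
out1 = Dense(2, activation="softmax", name="label_1")(x)    # label_1: 2 classes (assumed)
out2 = Dense(3, activation="softmax", name="label_2")(x)    # label_2: 3 classes (assumed)

model = Model(inputs, [out1, out2])
model.compile(optimizer="adam",
              loss={"label_1": "sparse_categorical_crossentropy",
                    "label_2": "sparse_categorical_crossentropy"},
              metrics=["accuracy"])

# Dummy data with the assumed shapes: X is (samples, timesteps, 2),
# each label is a 1-D vector of integer class ids.
X = np.random.rand(100, timesteps, 2)
y1 = np.random.randint(0, 2, size=(100,))
y2 = np.random.randint(0, 3, size=(100,))
model.fit(X, {"label_1": y1, "label_2": y2}, epochs=2, batch_size=16, verbose=0)
```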

Time series classification - Preparing data

♀尐吖头ヾ submitted on 2020-06-13 10:46:06
Question: Looking for help on preparing input data for time series classification. The data comes from a bunch of users who need to be classified. I want to use LSTMs (planning to implement via Keras, with a TensorFlow backend). I have data in two formats. Which is the right way to feed it to RNNs for classification? Any help regarding the input shape would be of great help. Format 1:
UserID TimeStamp Duration Label
1 2020:03:01:00:00 10 0
1 2020:03:01:01:00 0 0
1 2020:03:01:02:00 100 0
1 2020:03:01:03:00 15 0
1
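For Format 1, one common approach, sketched below under the assumptions that each user carries a single label and that sequences are padded to a fixed length, is to group the rows by UserID and build a 3-D array of shape (n_users, timesteps, n_features) for the LSTM; the toy DataFrame and the padding length are illustrative.

```python
# Rough sketch: turn per-user rows (Format 1) into a padded 3-D array for an LSTM.
import numpy as np
import pandas as pd
from tensorflow.keras.preprocessing.sequence import pad_sequences

df = pd.DataFrame({
    "UserID":    [1, 1, 1, 2, 2],
    "TimeStamp": ["2020:03:01:00:00", "2020:03:01:01:00", "2020:03:01:02:00",
                  "2020:03:01:00:00", "2020:03:01:01:00"],
    "Duration":  [10, 0, 100, 15, 30],
    "Label":     [0, 0, 0, 1, 1],
})

sequences, labels = [], []
for _, g in df.sort_values("TimeStamp").groupby("UserID"):
    sequences.append(g["Duration"].to_numpy())   # one feature per timestep
    labels.append(g["Label"].iloc[0])            # assumes the label is constant per user

maxlen = 24                                      # assumed fixed sequence length
X = pad_sequences(sequences, maxlen=maxlen, dtype="float32", padding="post")
X = X[..., np.newaxis]                           # add feature axis -> (n_users, maxlen, 1)
y = np.array(labels)
print(X.shape)   # ready for an LSTM with input_shape=(maxlen, 1)
```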
