classification

Do the training and testing sets have to be different from the prediction set?

主宰稳场 submitted on 2020-06-29 04:01:05
Question: I know the general rule that we should test a trained classifier only on the testing set. But now the question arises: when I have an already trained and tested classifier ready, can I apply it to the same dataset that the training and testing sets were drawn from? Or do I have to apply it to a new prediction set that is different from the training+testing set? And what if I predict a label column of a time series (edited later: I do not mean to create a classical time series analysis here, but
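A minimal scikit-learn sketch of the workflow the question describes (the data, feature count, and classifier choice below are illustrative assumptions, not taken from the post): fit on the training split, evaluate once on the held-out test split, and then apply the fitted model to any rows with the same feature columns, whether genuinely new or from the original data.

```python
# Hypothetical illustration: train/test evaluation vs. applying the model to new rows.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))              # stand-in feature matrix
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # stand-in labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))

# The fitted model can score any rows with the same features, e.g. brand-new data ...
X_new = rng.normal(size=(10, 5))
print(clf.predict(X_new))

# ... or the original rows; predictions there are fine to produce, but metrics
# computed on data the model was trained on are optimistically biased.
print(clf.predict(X)[:10])
```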

Classification accuracy is too low (Word2Vec)

久未见 submitted on 2020-06-29 03:37:06
Question: I'm working on a multi-label emotion classification problem to be solved with word2vec. This is my code, which I've put together from a couple of tutorials. The accuracy is now very low, about 0.02, which tells me something is wrong in my code, but I cannot find it. I tried this code with TF-IDF and BOW (obviously except for the word2vec part) and got much better accuracy scores, such as 0.28, but this one seems to be wrong somehow: np.set_printoptions(threshold=sys.maxsize) wv = gensim.models
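One common pattern for this setup, sketched below under assumptions about the data (tiny toy corpus, multi-hot labels, gensim 4 API), is to average the word2vec vectors of each document's tokens and feed the resulting document vectors to a one-vs-rest classifier.

```python
# Hypothetical sketch: document vectors as the mean of word2vec word vectors,
# fed to a one-vs-rest classifier for multi-label emotion tags.
import numpy as np
from gensim.models import Word2Vec
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression

docs = [["i", "feel", "happy"], ["this", "is", "sad"], ["so", "angry", "now"]]
y = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])   # stand-in multi-hot labels

wv = Word2Vec(sentences=docs, vector_size=50, min_count=1, seed=0).wv

def doc_vector(tokens, wv):
    """Average the vectors of in-vocabulary tokens; zeros if none are known."""
    vecs = [wv[t] for t in tokens if t in wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(wv.vector_size)

X = np.vstack([doc_vector(d, wv) for d in docs])
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
print(clf.predict(X))
```

Note that with multi-label targets, scikit-learn's default accuracy is exact-match (subset) accuracy, which is often very low even for a reasonable model; per-label F1 or Hamming loss is usually more informative.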

Optimize F-score in e1071 package

…衆ロ難τιáo~ submitted on 2020-06-27 23:03:13
Question: I'm trying to implement a one-class SVM using the e1071 package in R. Can somebody give me pointers on how to optimize the F-score using a grid search? I have tried the tune.svm function, but it has only resulted in high sensitivity or high specificity. The positive class I'm trying to predict makes up about 1-2% of the general population. The results I get have high accuracy but a very low F-score:
            Reference
Prediction    members Not members
  members          1           4
  Not members     12         983
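The post is about R's e1071, but the underlying idea, searching a grid of SVM hyper-parameters with the F-score rather than accuracy as the selection metric, can be illustrated with a scikit-learn analogue; the supervised SVC, the parameter grid, and the synthetic imbalanced data below are assumptions for illustration only.

```python
# Illustrative scikit-learn analogue (the original question uses R's e1071):
# grid-search SVM hyper-parameters with F1, not accuracy, as the selection metric.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic data with roughly a 2% positive class, mimicking the imbalance described.
X, y = make_classification(n_samples=2000, weights=[0.98, 0.02], random_state=0)

param_grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": [0.001, 0.01, 0.1, 1],
    "class_weight": [None, "balanced"],   # re-weighting helps with rare positives
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, scoring="f1", cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```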

Scaling production data

て烟熏妆下的殇ゞ submitted on 2020-06-26 12:51:22
Question: I have a dataset, say Data, which consists of categorical and numerical variables. After cleaning them, I have scaled only the numerical variables (I assume categorical variables must not be scaled) using
Data <- Data %>% dplyr::mutate_if(is.numeric, ~scale(.) %>% as.vector)
I then split it randomly into a 70-30 split using
set.seed(123)
sample_size = floor(0.70*nrow(Data))
xyz <- sample(seq_len(nrow(Data)), size = sample_size)
Train_Set <- Join[xyz,]
Test_Set <- Join[-xyz,]
I have built a classification
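The post itself uses R/dplyr, but the usual recommendation for this situation, illustrated below with a scikit-learn sketch on assumed data, is to fit the scaler on the training split only and reuse the same fitted parameters for the test split and later for production rows, rather than scaling the whole dataset before splitting.

```python
# Illustrative sketch (Python analogue of the R workflow in the post): fit the scaler
# on the training split only, then reuse the fitted mean/sd for test and production data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.random.default_rng(0).normal(loc=50, scale=10, size=(1000, 3))  # numeric columns
X_train, X_test = train_test_split(X, test_size=0.3, random_state=123)

scaler = StandardScaler().fit(X_train)      # learn mean/sd from training data only
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)    # same transformation, no re-fitting

# At prediction time, production rows go through the already-fitted scaler:
X_prod_scaled = scaler.transform(np.array([[52.0, 47.5, 61.2]]))
```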

Error with Spring Batch ClassifierCompositeItemWriter

三世轮回 submitted on 2020-06-17 12:58:59
Question: I basically have to produce multiple XML files per currency (i.e. usd, zar etc.) for each file_id; these transactions are all in one DB table. Do I create a composite writer for each currency and filter each currency I read from the DB in my ItemProcessor, or can I use multiple steps for each currency per file_id? I have been struggling to find a Spring Batch solution for this. The filename resource will be different for each file and currency. For example I can receive

How to perform multiclass multioutput classification using an LSTM

﹥>﹥吖頭↗ submitted on 2020-06-15 07:09:08
Question: I have a multiclass multioutput classification problem (see https://scikit-learn.org/stable/modules/multiclass.html for details). In other words, my dataset looks as follows:
node_name, timeseries_1, timeseries_2, label_1, label_2
node1, [1.2, ...], [1.8, ...], 0, 2
node2, [1.0, ...], [1.1, ...], 1, 1
node3, [1.9, ...], [1.2, ...], 0, 3
...
So my label_1 could be either 0 or 1, whereas my label_2 could be either 0, 1, or 2. My current code is as follows. def create_network(): model =
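A minimal Keras sketch of one way to do this, assuming the two time series are stacked as two input channels and that label_1 has 2 classes and label_2 has 3 (the sizes and layer names below are illustrative): a shared LSTM encoder with one softmax output head per label.

```python
# Assumed input shape: (timesteps, 2), i.e. timeseries_1 and timeseries_2 as two features.
import numpy as np
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

timesteps = 20
inputs = Input(shape=(timesteps, 2))
x = LSTM(32)(inputs)                                        # shared recurrent encoder
out1 = Dense(2, activation="softmax", name="label_1")(x)    # label_1: 2 classes (assumed)
out2 = Dense(3, activation="softmax", name="label_2")(x)    # label_2: 3 classes (assumed)

model = Model(inputs, [out1, out2])
model.compile(optimizer="adam",
              loss={"label_1": "sparse_categorical_crossentropy",
                    "label_2": "sparse_categorical_crossentropy"},
              metrics=["accuracy"])

# Dummy data with the assumed shapes: X is (samples, timesteps, 2),
# each label is a 1-D vector of integer class ids.
X = np.random.rand(100, timesteps, 2)
y1 = np.random.randint(0, 2, size=(100,))
y2 = np.random.randint(0, 3, size=(100,))
model.fit(X, {"label_1": y1, "label_2": y2}, epochs=2, batch_size=16, verbose=0)
```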

Time series classification - Preparing data

♀尐吖头ヾ submitted on 2020-06-13 10:46:06
Question: Looking for help on preparing input data for time series classification. The data comes from a bunch of users who need to be classified. I want to use LSTMs (planning to implement via Keras, with a TensorFlow backend). I have data in two formats. Which is the right way to feed it to RNNs for classification? Any help regarding the input shape would be of great help. Format 1:
UserID TimeStamp Duration Label
1 2020:03:01:00:00 10 0
1 2020:03:01:01:00 0 0
1 2020:03:01:02:00 100 0
1 2020:03:01:03:00 15 0
1
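For Format 1, one common approach, sketched below under the assumptions that each user carries a single label and that sequences are padded to a fixed length, is to group the rows by UserID and build a 3-D array of shape (n_users, timesteps, n_features) for the LSTM; the toy DataFrame and the padding length are illustrative.

```python
# Rough sketch: turn per-user rows (Format 1) into a padded 3-D array for an LSTM.
import numpy as np
import pandas as pd
from tensorflow.keras.preprocessing.sequence import pad_sequences

df = pd.DataFrame({
    "UserID":    [1, 1, 1, 2, 2],
    "TimeStamp": ["2020:03:01:00:00", "2020:03:01:01:00", "2020:03:01:02:00",
                  "2020:03:01:00:00", "2020:03:01:01:00"],
    "Duration":  [10, 0, 100, 15, 30],
    "Label":     [0, 0, 0, 1, 1],
})

sequences, labels = [], []
for _, g in df.sort_values("TimeStamp").groupby("UserID"):
    sequences.append(g["Duration"].to_numpy())   # one feature per timestep
    labels.append(g["Label"].iloc[0])            # assumes the label is constant per user

maxlen = 24                                      # assumed fixed sequence length
X = pad_sequences(sequences, maxlen=maxlen, dtype="float32", padding="post")
X = X[..., np.newaxis]                           # add feature axis -> (n_users, maxlen, 1)
y = np.array(labels)
print(X.shape)   # ready for an LSTM with input_shape=(maxlen, 1)
```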
