classification | 易学教程

Multiple classification models in a scikit pipeline python

Question: I am solving a binary classification problem over some text documents using Python and the scikit-learn library, and I wish to try different models to compare and contrast results, mainly a Naive Bayes classifier, an SVM with K-fold CV, and CV=5. I am having difficulty combining all of the methods into one pipeline, given that the latter two models use GridSearchCV(). I cannot have multiple Pipelines running during a single implementation due to concurrency issues,

Plotting learning curve in keras gives KeyError: 'val_acc'

Question: I was trying to plot train and test learning curves in Keras; however, the following code produces KeyError: 'val_acc'. The official documentation <https://keras.io/callbacks/> states that in order to use 'val_acc' I need to enable validation and accuracy monitoring, which I don't understand and don't know how to use in my code. Any help would be much appreciated. Thanks. seed = 7
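
As a hedged illustration of the KeyError above: Keras stores per-epoch metrics in a plain dictionary (`history.history`), the `val_` entries appear only when validation data is supplied to `fit`, and the accuracy key is spelled `acc` in some Keras versions and `accuracy` in others. The sketch below uses a hypothetical helper, `pick_metric`, and a hand-written dictionary standing in for `history.history`; neither comes from the question's code.

```python
def pick_metric(history, candidates):
    """Return (key, values) for the first candidate key found in a history dict."""
    for key in candidates:
        if key in history:
            return key, history[key]
    # Listing the available keys makes a missing-validation setup easy to spot.
    raise KeyError(
        "none of %r found; available keys: %r" % (list(candidates), sorted(history))
    )


# Hypothetical stand-in for `history.history`; the numbers are invented.
example_history = {
    "loss": [0.69, 0.52, 0.41],
    "accuracy": [0.55, 0.74, 0.82],
    "val_loss": [0.68, 0.56, 0.49],
    "val_accuracy": [0.57, 0.70, 0.77],
}

# Try both spellings so the lookup does not depend on the Keras version.
train_key, train_curve = pick_metric(example_history, ("acc", "accuracy"))
val_key, val_curve = pick_metric(example_history, ("val_acc", "val_accuracy"))
print(train_key, train_curve)
print(val_key, val_curve)
```

If neither validation key is present, the raised error lists the keys that were recorded, which usually shows whether validation data or an accuracy metric was left out.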

Load pickled classifier data : Vocabulary not fitted Error

Question: I have read all related questions here but couldn't find a working solution. My classifier creation:
class StemmedTfidfVectorizer(TfidfVectorizer):
    def build_analyzer(self):
        analyzer = super(TfidfVectorizer, self).build_analyzer()
        return lambda doc: english_stemmer.stemWords(analyzer(doc))
tf = StemmedTfidfVectorizer(analyzer='word', ngram_range=(1,2), min_df = 0, max_features=200000, stop_words = 'english')
def create_tfidf(f):
    docs = []
    targets = []
    with open(f, "r") as sentences_file:

ValueError: The number of classes has to be greater than one (python)

Question: When passing X, y to fit, I am getting the following error:
Traceback (most recent call last):
  File "C:/Classify/classifier.py", line 95, in <module>
    train_avg, test_avg, cms = train_model(X, y, "ceps", plot=True)
  File "C:/Classify/classifier.py", line 47, in train_model
    clf.fit(X_train, y_train)
  File "C:\Python27\lib\site-packages\sklearn\svm\base.py", line 676, in fit
    raise ValueError("The number of classes has to be greater than"
ValueError: The number of classes has to be greater than one.
Below
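
The message points at the labels: the training labels handed to `fit` contain only one distinct class. A minimal, library-free sketch for checking this before fitting follows; `check_classes` and the sample label lists are hypothetical, and the actual cause in the question's data cannot be determined from the excerpt.

```python
from collections import Counter


def check_classes(labels, minimum=2):
    """Count labels and fail early if fewer than `minimum` classes are present."""
    counts = Counter(labels)
    if len(counts) < minimum:
        raise ValueError(
            "expected at least %d classes, found %d: %r"
            % (minimum, len(counts), dict(counts))
        )
    return counts


# A label vector with two classes passes the check.
ok_counts = check_classes([0, 1, 1, 0, 1])
print(ok_counts)

# A training split that happens to contain a single class is rejected.
try:
    check_classes([1, 1, 1])
except ValueError as err:
    message = str(err)
    print(message)
```

Running such a check on `y_train` right after splitting shows whether the split itself, rather than the classifier, removed one of the classes.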

Does sklearn support a cost matrix?

Question: Is it possible to train classifiers in sklearn with a cost matrix that assigns different costs to different mistakes? For example, in a 2-class problem the cost matrix would be a 2 by 2 square matrix, with A_ij = cost of classifying i as j. The main classifier I am using is a Random Forest. Thanks.

Answer 1: The cost-sensitive framework you describe is not supported in scikit-learn, in any of the classifiers we have.

Answer 2: One way to circumvent this limitation is to use under or oversampling. E.g.,
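
Another classifier-agnostic option, shown here only as a sketch and not as a scikit-learn feature, is to apply the cost matrix at prediction time: given class probabilities p_i, choose the class j with the smallest expected cost sum_i p_i * A_ij. The function name `min_expected_cost_class` and all numbers are made up for illustration; in practice the probabilities could come from a classifier's `predict_proba` output.

```python
def min_expected_cost_class(probs, cost):
    """Return (best_class, expected_costs) where cost[i][j] is the cost of
    classifying true class i as class j."""
    n_classes = len(probs)
    expected = [
        sum(probs[i] * cost[i][j] for i in range(n_classes))
        for j in range(n_classes)
    ]
    best = min(range(n_classes), key=expected.__getitem__)
    return best, expected


# Hypothetical 2-class cost matrix: mistaking class 1 for class 0 costs 10,
# mistaking class 0 for class 1 costs 1, correct decisions cost nothing.
cost = [[0.0, 1.0],
        [10.0, 0.0]]
probs = [0.8, 0.2]  # invented probabilities for classes 0 and 1

best, expected = min_expected_cost_class(probs, cost)
print(best, expected)
```

With these invented numbers class 0 is the more probable one, yet predicting class 1 has the lower expected cost, which is the behaviour a cost matrix is meant to produce.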

Retrieve final hidden activation layer output from sklearn's MLPClassifier

Question: I would like to do some tests with the outputs of a neural network's final hidden activation layer, using sklearn's MLPClassifier after fitting the data. For example, if I create a classifier, assuming data X_train with labels y_train and two hidden layers of sizes (300,100):
clf = MLPClassifier(hidden_layer_sizes=(300,100))
clf.fit(X_train,y_train)
I would like to be able to call a function somehow to retrieve the final hidden activation layer vector of length 100 for use in additional tests. Assuming a

MultiClass using LIBSVM

Question: I have a multiclass SVM classification problem (6 classes) that I would like to solve using LIBSVM. The following are the approaches I have tried, and I have some questions regarding them.
Method 1 (one vs one):
model = svmtrain(TrainLabel, TrainVec, '-c 1 -g 0.00154 -b 0.9');
[predict_label, accuracy, dec_values] = svmpredict(TestLabel, TestVec, model);
Two questions about this method: 1) Is that all I need to do for a multiclass problem? 2) What value should n be in '-b n'? I'm not sure. Method 2(

Convert predicted probabilities after downsampling to actual probabilities in classification (using mlr)

Question: If I use undersampling on an unbalanced binary target variable to train a model, the prediction method calculates probabilities under the assumption of a balanced data set. How can I convert these probabilities to actual probabilities for the unbalanced data? Is there a conversion argument/function implemented in the mlr package or another package? For example:
a <- data.frame(y=factor(sample(0:1, prob = c(0.1,0.9), replace=T, size=100)))
a$x <- as.numeric(a$y)+rnorm(n=100, sd=1)
task <
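
A common correction, sketched here in Python rather than R and not taken from the mlr package, rescales the predicted probability by the ratio of the true class prior to the prior seen in the resampled training set. It assumes that resampling changed only the class proportions and that the model's probabilities are well calibrated on the resampled data. The function name `correct_probability` and the example priors are assumptions for illustration.

```python
def correct_probability(p_sampled, prior_true, prior_sampled):
    """Map a positive-class probability from a resampled training
    distribution back to the original class balance.

    p_sampled     : probability predicted by the model trained on resampled data
    prior_true    : positive-class fraction in the original data
    prior_sampled : positive-class fraction in the resampled training data
    """
    pos = p_sampled * prior_true / prior_sampled
    neg = (1.0 - p_sampled) * (1.0 - prior_true) / (1.0 - prior_sampled)
    return pos / (pos + neg)


# Hypothetical setting: positives are 10% of the original data, and the
# training set was undersampled to a 50/50 balance.
corrected = correct_probability(0.5, prior_true=0.1, prior_sampled=0.5)
unchanged = correct_probability(0.3, prior_true=0.5, prior_sampled=0.5)
print(corrected, unchanged)
```

In this setting a score of 0.5 from the balanced model maps back to roughly the original base rate of 0.1, and when the two priors are equal the probability is left as it was.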

UserWarning: Label not :NUMBER: is present in all training examples

Question: I am doing multilabel classification, where I try to predict the correct labels for each document. Here is my code:
mlb = MultiLabelBinarizer()
X = dataframe['body'].values
y = mlb.fit_transform(dataframe['tag'].values)
classifier = Pipeline([
    ('vectorizer', CountVectorizer(lowercase=True, stop_words='english', max_df = 0.8, min_df = 10)),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(LinearSVC()))])
predicted = cross_val_predict(classifier, X, y)
When running my code I get

What subjects, topics does a computer science graduate need to learn to apply available machine learning frameworks, esp. SVMs

Question: I want to teach myself enough machine learning that I can, to begin with, put to use the available open source ML frameworks to do things like:
- Go through the HTML source of pages from a certain site and "understand" which sections form the content, which the advertisements, and which the metadata (neither the content nor the ads, e.g. TOC, author bio, etc.)
- Go through the HTML source of pages from disparate sites and "classify" whether the