Python : How to find Accuracy Result in SVM Text Classifier Algorithm for Multilabel Class

心已入冬 提交于 2019-12-31 22:26:29

问题


I have used following set of code: And I need to check accuracy of X_train and X_test

The following code works for me in my classification problem over multi-labeled class

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier

X_train = np.array(["new york is a hell of a town",
                    "new york was originally dutch",
                    "the big apple is great",
                    "new york is also called the big apple",
                    "nyc is nice",
                    "people abbreviate new york city as nyc",
                    "the capital of great britain is london",
                    "london is in the uk",
                    "london is in england",
                    "london is in great britain",
                    "it rains a lot in london",
                    "london hosts the british museum",
                    "new york is great and so is london",
                    "i like london better than new york"])
y_train = [[0],[0],[0],[0]
            ,[0],[0],[1],[1]
            ,[1],[1],[1],[1]
            ,[2],[2]]
X_test = np.array(['nice day in nyc',
                   'the capital of great britain is london',
                   'i like london better than new york',
                   ])   
target_names = ['Class 1', 'Class 2','Class 3']

classifier = Pipeline([
    ('vectorizer', CountVectorizer(min_df=1,max_df=2)),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(LinearSVC()))])
classifier.fit(X_train, y_train)
predicted = classifier.predict(X_test)
for item, labels in zip(X_test, predicted):
    print '%s => %s' % (item, ', '.join(target_names[x] for x in labels))

OUTPUT

nice day in nyc => Class 1
the capital of great britain is london => Class 2
i like london better than new york => Class 3

I would like to check the accuracy between Training and Test Dataset. Score Function doesn't work for me, it shows an error stating that multilabel value can't accepted

>>> classifier.score(X_train, X_test)

NotImplementedError: score is not supported for multilabel classifiers

Kindly help me get accuracy results for training and test data and choose an algorithm for our classification case.


回答1:


If you want to get an accuracy score for your test set, you'll need to create an answer key, which you can call y_test. You can't know if your predictions are correct unless you know the correct answers.

Once you have an answer key, you can get the accuracy. The method you want is sklearn.metrics.accuracy_score.

I've written it out below:

from sklearn.metrics import accuracy_score

# ... everything else the same ...

# create an answer key
# I hope this is correct!
y_test = [[1], [2], [3]]

# same as yours...
classifier.fit(X_train, y_train)
predicted = classifier.predict(X_test)

# get the accuracy
print accuracy_score(y_test, predicted)

Also, sklearn has several other metrics besides accuracy. See them here: sklearn.metrics



来源:https://stackoverflow.com/questions/19629331/python-how-to-find-accuracy-result-in-svm-text-classifier-algorithm-for-multil

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!