What is the difference between LinearSVC and SVC(kernel=“linear”)?

半城伤御伤魂 提交于 2021-02-17 10:44:55

问题


I found sklearn.svm.LinearSVC and sklearn.svm.SVC(kernel='linear') and they seem very similar to me, but I get very different results on Reuters.

sklearn.svm.LinearSVC: 81.05% in   28.87s train /    9.71s test
sklearn.svm.SVC      : 33.55% in 6536.53s train / 2418.62s test

Both have a linear kernel. The tolerance of the LinearSVC is higher than the one of SVC:

LinearSVC(C=1.0, tol=0.0001, max_iter=1000, penalty='l2', loss='squared_hinge', dual=True, multi_class='ovr', fit_intercept=True, intercept_scaling=1)
SVC      (C=1.0, tol=0.001,    max_iter=-1, shrinking=True, probability=False, cache_size=200, decision_function_shape=None)

How do both functions differ otherwise? Even if I set kernel='linear, tol=0.0001, max_iter=1000 anddecision_function_shape='ovr'theSVCtakes much longer thanLinearSVC`. Why?

I use sklearn 0.18 and both are wrapped in the OneVsRestClassifier. I'm not sure if this makes the same as multi_class='ovr' / decision_function_shape='ovr'.


回答1:


Truly, LinearSVC and SVC(kernel='linear') yield different results, i. e. metrics score and decision boundaries, because they use different approaches. The toy example below proves it:

from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC, SVC

X, y = load_iris(return_X_y=True)

clf_1 = LinearSVC().fit(X, y)  # possible to state loss='hinge'
clf_2 = SVC(kernel='linear').fit(X, y)

score_1 = clf_1.score(X, y)
score_2 = clf_2.score(X, y)

print('LinearSVC score %s' % score_1)
print('SVC score %s' % score_2)

--------------------------
>>>    0.96666666666666667
>>>    0.98666666666666669

The key principles of that difference are the following:

  • By default scaling, LinearSVC minimizes the squared hinge loss while SVC minimizes the regular hinge loss. It is possible to manually define a 'hinge' string for loss parameter in LinearSVC.
  • LinearSVC uses the One-vs-All (also known as One-vs-Rest) multiclass reduction while SVC uses the One-vs-One multiclass reduction. It is also noted here. Also, for multi-class classification problem SVC fits N * (N - 1) / 2 models where N is the amount of classes. LinearSVC, by contrast, simply fits N models. If the classification problem is binary, then only one model is fit in both scenarios. multi_class and decision_function_shape parameters have nothing in common. The second one is an aggregator that transforms the results of the decision function in a convenient shape of (n_features, n_samples). multi_class is an algorithmic approach to establish a solution.
  • The underlying estimators for LinearSVC are liblinear, that do in fact penalize the intercept. SVC uses libsvm estimators that do not. liblinear estimators are optimized for a linear (special) case and thus converge faster on big amounts of data than libsvm. That is why LinearSVC takes less time to solve the problem.

In fact, LinearSVC is not actually linear after the intercept scaling as it was stated in the comments section.



来源:https://stackoverflow.com/questions/45384185/what-is-the-difference-between-linearsvc-and-svckernel-linear

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!