SVM-OVO vs SVM-OVA in a very basic example

Submitted by ♀尐吖头ヾ on 2021-02-10 22:40:11

Question


Trying to understand how SVM-OVR (One-Vs-Rest) works, I was testing the following code:

import matplotlib.pyplot as plt
import numpy as np
from sklearn.svm import SVC
x = np.array([[1,1.1],[1,2],[2,1]])
y = np.array([0,100,250])
classifier = SVC(kernel='linear', decision_function_shape='ovr')
classifier.fit(x,y)
print(classifier.predict([[1,2]]))
print(classifier.decision_function([[1,2]]))

The outputs are:

[100]
[[ 1.05322128  2.1947332  -0.20488118]]

This means the sample [1,2] is correctly predicted as class 100 (hardly surprising, since [1,2] was also used for training).

But let's take a look at the decision function values. SVM-OVA should generate three classifiers, i.e., three lines: the first separating class1 from class2 U class3, the second separating class2 from class1 U class3, and the third separating class3 from class1 U class2. My original goal was exactly to understand what the decision function values mean. I knew that a positive value means the sample lies on the correct side of the hyperplane (and vice versa), and that the larger the value, the larger the distance of the sample from the hyperplane (a line in this case), and hence the greater the confidence that the sample belongs to that class.

However, something was clearly wrong: two of the decision function values are positive, whereas only the correct class was supposed to report a positive value (especially since the predicted point is also a training sample). For this reason, I tried to plot the separating lines.

fig, ax = plt.subplots()
ax.scatter(x[:, 0], x[:, 1], c=y, cmap=plt.cm.winter, s=25)
# create a mesh to plot in
x_min, x_max = x[:, 0].min() - 1, x[:, 0].max() + 1
y_min, y_max = x[:, 1].min() - 1, x[:, 1].max() + 1
xx2, yy2 = np.meshgrid(np.arange(x_min, x_max, .2),np.arange(y_min, y_max, .2))
Z = classifier.predict(np.c_[xx2.ravel(), yy2.ravel()])
Z = Z.reshape(xx2.shape)
ax.contourf(xx2, yy2, Z, cmap=plt.cm.winter, alpha=0.3)

# plot each separating line stored by the classifier (w.x + b = 0)
xx = np.linspace(-5, 5)
for w, b in zip(classifier.coef_, classifier.intercept_):
    a = -w[0] / w[1]
    yy = a * xx - b / w[1]
    ax.plot(xx, yy)

ax.axis([x_min, x_max,y_min, y_max])
plt.show()

This is what I obtained (a plot of the three training points and the three separating lines):

Surprise: those separating lines actually represent the hyperplanes computed by the OVO (One-Vs-One) strategy: you can see that they separate class1 from class2, class2 from class3, and class1 from class3.
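A quick way to check this (a small sketch, reusing the classifier fitted in the first snippet above) is to count the separating hyperplanes scikit-learn actually stores: with an OvO decomposition there is one row of coef_ per pair of classes.

# classifier is the 3-class SVC fitted above
n_classes = len(classifier.classes_)
print(classifier.coef_.shape)            # (3, 2): one hyperplane per pair of classes
print(n_classes * (n_classes - 1) // 2)  # 3 pairs: (0,100), (0,250), (100,250)

With three classes this count is ambiguous, since n = n(n-1)/2 = 3; the four-class example below is what makes the OvO pairing unmistakable (6 rows instead of 4).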

I've also tried to add a class:

import matplotlib.pyplot as plt
import numpy as np
from sklearn.svm import SVC
x = np.array([[1,1.1],[1,2],[2,1],[3,3]])
y = np.array([0,100,250, 500])
classifier = SVC(kernel='linear', decision_function_shape='ovr')
classifier.fit(x,y)

and what happens is that the vector of decision function values has length 4 (consistent with the OVA strategy), but 6 lines are again generated (as if the OVO strategy had been implemented).

classifier.decision_function([[1,2]])
[[ 2.14182753  3.23543808  0.83375105 -0.22753309]]

classifier.coef_
array([[ 0.        , -0.9       ],
       [-1.        ,  0.1       ],
       [-0.52562421, -0.49934299],
       [-1.        ,  1.        ],
       [-0.8       , -0.4       ],
       [-0.4       , -0.8       ]])

My final questions: what do the decision function values represent? And why, even when the OVA strategy is applied, are n(n-1)/2 hyperplanes generated instead of n?


Answer 1:


The point is that, by default, scikit-learn's SVC implements an OvO strategy (see the scikit-learn documentation for reference):

SVC and NuSVC implement the “one-versus-one” approach for multi-class classification.

At the same time, decision_function_shape is set to 'ovr' by default (even though in your case you have made it explicit):

"To provide a consistent interface with other classifiers, the decision_function_shape option allows to monotonically transform the results of the “one-versus-one” classifiers to a “one-vs-rest” decision function of shape (n_samples, n_classes).

The reason why an OvO strategy is implemented is that SVM algorithms scale poorly with the size of the training set (and with the OvO strategy each classifier is only trained on the portion of the training set corresponding to the two classes it has to distinguish). In principle, you can force an SVM classifier to implement an OvA strategy via an instance of OneVsRestClassifier, e.g.:

from sklearn.multiclass import OneVsRestClassifier

ovr_svc = OneVsRestClassifier(SVC(kernel='linear'))
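Fitting that wrapper on the 4-class data from the question (again a sketch, not output from the original post) yields exactly n binary estimators, and therefore n separating hyperplanes:

from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC
import numpy as np

x = np.array([[1, 1.1], [1, 2], [2, 1], [3, 3]])
y = np.array([0, 100, 250, 500])

ovr_svc = OneVsRestClassifier(SVC(kernel='linear')).fit(x, y)
print(len(ovr_svc.estimators_))                   # 4: one binary SVC per class
print(ovr_svc.decision_function([[1, 2]]).shape)  # (1, 4)

# each binary estimator (class k vs the rest) contributes one separating hyperplane
for est in ovr_svc.estimators_:
    print(est.coef_, est.intercept_)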


Source: https://stackoverflow.com/questions/64992932/svm-ovo-vs-svm-ova-in-a-very-basic-example
