SKLearn: Getting distance of each point from decision boundary?

倾然丶 夕夏残阳落幕 submitted on 2019-12-08 08:07:31

Question


I am using SKLearn to run SVC on my data.

from sklearn import svm

svc = svm.SVC(kernel='linear', C=C).fit(X, y)

How can I get the distance of each data point in X from the decision boundary?


Answer 1:


For a linear kernel, the decision boundary is w·x + b = 0. For a point x, the decision function value is y = w·x + b, and its signed distance to the boundary is y / ||w||.

import numpy as np

y = svc.decision_function(X)         # signed decision values, one per sample
w_norm = np.linalg.norm(svc.coef_)   # ||w|| for the linear kernel
dist = y / w_norm                    # signed distance of each point to the boundary

For non-linear kernels, there is no direct way to get the absolute distance in the input space, but you can still use the value of decision_function as a relative (signed) distance.
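For example, a minimal sketch (assuming an RBF kernel and the same X and y as in the question; rbf_svc is just an illustrative name):

from sklearn import svm

rbf_svc = svm.SVC(kernel='rbf', C=1.0, gamma='scale').fit(X, y)

# Proportional to the feature-space distance from the boundary:
# the sign gives the side, larger magnitude means farther away.
relative_dist = rbf_svc.decision_function(X)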




Answer 2:


It happens that I am doing homework 1 of a course called Machine Learning Techniques, and it includes a problem about a point's distance to the hyperplane, even for the RBF kernel.

First, we know that the SVM finds an "optimal" w for the hyperplane w·x + b = 0.

And the fact is that

w = \sum_{i} \alpha_i \phi(x_i)

where the x_i are the support vectors and the \alpha_i are their coefficients. Note the \phi() wrapping each x_i: it is the feature map that transforms x into some high-dimensional space (for RBF, an infinite-dimensional one). We also know that

\phi(x_1) \cdot \phi(x_2) = K(x_1, x_2)

so we can compute

\|w\|^2 = w \cdot w = \sum_{i} \sum_{j} \alpha_i \alpha_j \phi(x_i) \cdot \phi(x_j) = \sum_{i} \sum_{j} \alpha_i \alpha_j K(x_i, x_j)

and take the square root to get \|w\|. So, the distance you want is

svc.decision_function(X) / w_norm

where w_norm is the norm \|w\| computed above.
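For instance, here is a minimal sketch of the computation above for an RBF kernel, using sklearn's dual_coef_ (which stores \alpha_i y_i for the support vectors) and rbf_kernel to evaluate K. The toy X, y and the explicit gamma value are assumptions for illustration:

import numpy as np
from sklearn import svm
from sklearn.metrics.pairwise import rbf_kernel

# Toy data standing in for the X, y of the question (hypothetical).
rng = np.random.RandomState(0)
X = rng.randn(40, 2)
y = (X[:, 0] * X[:, 1] > 0).astype(int)

gamma = 0.5                                   # explicit gamma so it can be reused below
svc = svm.SVC(kernel='rbf', gamma=gamma, C=1.0).fit(X, y)

# dual_coef_ stores alpha_i * y_i for each support vector (shape (1, n_SV) for binary problems)
alpha = svc.dual_coef_.ravel()
sv = svc.support_vectors_

# ||w||^2 = sum_i sum_j alpha_i alpha_j K(x_i, x_j), evaluated with the kernel trick
K = rbf_kernel(sv, sv, gamma=gamma)
w_norm = np.sqrt(alpha @ K @ alpha)

# Signed distance (in the implicit feature space) of every point to the boundary
dist = svc.decision_function(X) / w_norm

With a linear kernel, this kernel-trick computation of ||w|| reduces to the same value as np.linalg.norm(svc.coef_) used in Answer 1.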




Source: https://stackoverflow.com/questions/32074239/sklearn-getting-distance-of-each-point-from-decision-boundary
