Set the weights of decision functions through stdin in Sklearn

Question

Is there a way to pass the coefficients of an SVC clf into my script directly, and then call clf.score() or clf.predict() for further testing?

Currently I am using joblib.dump(clf,'file.plk') to save all the information of a trained clf, but this involves writing to and reading from disk. It would be helpful if I could define a clf directly from arrays representing the support vectors (clf.support_vectors_), the weights (clf.coef_/clf.dual_coef_), and the bias (clf.intercept_).

Answer 1:

The following line in the scikit-learn source calls the prediction function from libsvm. It looks like this (but please take a look at the whole function _dense_predict):

libsvm.predict(
        X, self.support_, self.support_vectors_, self.n_support_,
        self.dual_coef_, self._intercept_,
        self.probA_, self.probB_, svm_type=svm_type, kernel=kernel,
        degree=self.degree, coef0=self.coef0, gamma=self._gamma,
        cache_size=self.cache_size)

You can call this function yourself, passing all the relevant information directly, and you will obtain a raw prediction. To do this, you must import libsvm with from sklearn.svm import libsvm. If your fitted classifier is called svc, you can obtain all the relevant information from it by replacing every self in the call with svc and keeping the values. If svc._impl gives you "c_svc", then set svm_type=0.

Note that at the beginning of the _dense_predict function you have X = self._compute_kernel(X). If your data is X, you need to transform it with K = svc._compute_kernel(X) and call libsvm.predict with K as the first argument.
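As a complement to the private libsvm.predict call, here is a minimal sketch that rebuilds the decision function from the public attributes named in the question. It assumes a binary SVC with an RBF kernel and an explicitly set gamma; the toy data and the helper names manual_decision and manual_predict are illustrative and not part of scikit-learn. Multi-class SVC combines several one-vs-one classifiers and is not covered here.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

# Toy binary problem: two Gaussian blobs (illustrative data only).
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2) - 2.0, rng.randn(20, 2) + 2.0])
y = np.array([0] * 20 + [1] * 20)

gamma = 0.5  # set explicitly so the same value can be reused below
svc = SVC(kernel="rbf", gamma=gamma).fit(X, y)

# The arrays the question asks about.
support_vectors = svc.support_vectors_  # shape (n_SV, n_features)
dual_coef = svc.dual_coef_[0]           # shape (n_SV,) in the binary case
intercept = svc.intercept_[0]
classes = svc.classes_


def manual_decision(X_new):
    """f(x) = sum_i dual_coef[i] * K(sv_i, x) + intercept (binary case only)."""
    K = rbf_kernel(X_new, support_vectors, gamma=gamma)
    return K @ dual_coef + intercept


def manual_predict(X_new):
    # In scikit-learn's binary SVC, a positive decision value means classes_[1].
    return classes[(manual_decision(X_new) > 0).astype(int)]
```

Once support_vectors, dual_coef, intercept, classes and gamma are known, the fitted svc object is no longer needed to predict; only the kernel has to match the one used for training.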

Scoring is independent of all this. Take a look at sklearn.metrics, where you will find e.g. accuracy_score, which is the default score for SVC.

This is of course a somewhat suboptimal way of doing things, but in this specific case, if it is impossible to set the coefficients (I didn't check very hard), then going into the code, seeing what it does, and extracting the relevant part is surely an option.
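For instance, once you have predicted labels from any source, scoring could look like the short sketch below. The two label lists are made-up values for illustration.

```python
from sklearn.metrics import accuracy_score

y_true = [0, 1, 1, 0]  # ground-truth labels (illustrative)
y_pred = [0, 1, 0, 0]  # e.g. labels derived from the raw predictions

# Fraction of positions where the predicted label equals the true label.
acc = accuracy_score(y_true, y_pred)
```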

Answer 2:

Check out this blog post on memory usage of sklearn models using succinct tries to see if it is applicable.

If the other location does not have access to the sklearn package, you would need to write your own score and predict functions, because clf.score() and clf.predict() require clf to be an sklearn object.
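A dependency-free sketch of such functions might look like the following. It assumes a binary classifier with an RBF kernel, follows scikit-learn's convention that a positive decision value maps to the second class, and uses only the standard library. The function names and the parameter values at the bottom are hypothetical; in practice the values would be copied from a trained clf (support_vectors_, dual_coef_[0], intercept_[0], and the gamma used for training).

```python
import math


def rbf(u, v, gamma):
    """RBF kernel exp(-gamma * ||u - v||^2) for two equal-length sequences."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def decision_function(x, support_vectors, dual_coef, intercept, gamma):
    """Weighted sum of kernel values against each support vector, plus bias."""
    return sum(c * rbf(sv, x, gamma)
               for c, sv in zip(dual_coef, support_vectors)) + intercept


def predict(X, support_vectors, dual_coef, intercept, gamma, classes=(0, 1)):
    """Positive decision value -> classes[1], otherwise classes[0]."""
    return [
        classes[1]
        if decision_function(x, support_vectors, dual_coef, intercept, gamma) > 0
        else classes[0]
        for x in X
    ]


def score(X, y, support_vectors, dual_coef, intercept, gamma, classes=(0, 1)):
    """Accuracy: fraction of samples whose predicted label matches y."""
    preds = predict(X, support_vectors, dual_coef, intercept, gamma, classes)
    return sum(p == t for p, t in zip(preds, y)) / len(y)


# Hypothetical parameters standing in for values exported from a trained model.
support_vectors = [[1.0, 0.0], [-1.0, 0.0]]
dual_coef = [1.0, -1.0]
intercept = 0.0
gamma = 1.0

X_test = [[2.0, 0.0], [-2.0, 0.0]]
y_test = [1, 0]
labels = predict(X_test, support_vectors, dual_coef, intercept, gamma)
accuracy = score(X_test, y_test, support_vectors, dual_coef, intercept, gamma)
```

Because the parameters are plain lists and floats, they can be sent over stdin or any other channel (for example as JSON) instead of being pickled to disk.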

Source: https://stackoverflow.com/questions/22815536/set-the-weights-of-decision-functions-through-stdin-in-sklearn

Tags

python

scikit-learn

svm