Recommended values for OpenCV SVM parameters

Question

Are there recommended parameters for the OpenCV SVM? I'm experimenting with letter_recog.cpp from the OpenCV samples directory, but the SVM accuracy is very poor. In one run I got only about 62% test accuracy:

$ ./letter_recog_modified -data /home/cobalt/opencv/samples/data/letter-recognition.data  -save svm_letter_recog.xml -svm

The database /home/cobalt/opencv/samples/data/letter-recognition.data is loaded.
Training the classifier ...
data.size() = [16 x 20000]
responses.size() = [1 x 20000]

Recognition rate: train = 64.3%, test = 62.2%

The default parameters are:

model = SVM::create();
model->setType(SVM::C_SVC);
model->setKernel(SVM::LINEAR);
model->setC(1);
model->train(tdata);

Switching to trainAuto() didn't help; it gave a strange 0% accuracy on both the training and test sets:

model = SVM::create();
model->setType(SVM::C_SVC);
model->setKernel(SVM::LINEAR);
model->trainAuto(tdata);

Result:

Recognition rate: train = 0.0%, test = 0.0%

Update after applying yangjie's answer:

$ ./letter_recog_modified -data /home/cobalt/opencv/samples/data/letter-recognition.data  -save svm_letter_recog.xml -svm
The database /home/cobalt/opencv/samples/data/letter-recognition.data is loaded.
Training the classifier ...
data.size() = [16 x 20000]
responses.size() = [1 x 20000]

Recognition rate: train = 58.8%, test = 57.5%

The result is no longer 0%, but the accuracy is worse than the earlier 62%.

Using the RBF kernel with trainAuto() is even worse:

$ ./letter_recog_modified_rbf -data /home/cobalt/opencv/samples/data/letter-recognition.data  -save svm_letter_recog.xml -svm
The database /home/cobalt/opencv/samples/data/letter-recognition.data is loaded.
Training the classifier ...
data.size() = [16 x 20000]
responses.size() = [1 x 20000]

Recognition rate: train = 18.5%, test = 11.6%

Parameters:

    model = SVM::create();
    model->setType(SVM::C_SVC);
    model->setKernel(SVM::RBF);
    model->trainAuto(tdata);

Answer 1:

I debugged the sample code and found the reason.

responses is a Mat holding the ASCII codes of the letters.

However, the labels predicted by an SVM trained with SVM::trainAuto range from 0 to 25, one per class. You can also see this in the <class_labels>...</class_labels> element of the output file svm_letter_recog.xml.

Therefore, in test_and_save_classifier, r = model->predict( sample ) and responses.at<int>(i) never match, which produces the 0% recognition rate.

I also found that with SVM::train the class labels are 65-90 (the ASCII codes of 'A' to 'Z') instead, which is why you got a normal result at first.

Solution

I am not sure whether this is a bug. But if you want to use SVM::trainAuto in this sample for now, you can change

test_and_save_classifier(model, data, responses, ntrain_samples, 0, filename_to_save);

in build_svm_classifier to

test_and_save_classifier(model, data, responses, ntrain_samples, 'A', filename_to_save);
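
To see why the offset matters, here is a minimal sketch in plain C++ (no OpenCV) of the comparison that test_and_save_classifier appears to perform. It assumes the fifth argument is an offset added to each prediction before it is compared with the stored response. The names recognition_rate and rdelta are illustrative and are not claimed to match the sample.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fraction of samples for which (prediction + rdelta) equals the response.
// rdelta is an assumed name for the offset passed as the fifth argument.
double recognition_rate(const std::vector<int>& predictions,
                        const std::vector<int>& responses,
                        int rdelta)
{
    assert(predictions.size() == responses.size());
    if (predictions.empty())
        return 0.0;

    std::size_t hits = 0;
    for (std::size_t i = 0; i < predictions.size(); ++i) {
        // With zero-based class indices and ASCII responses,
        // this is only true when rdelta shifts the index to the letter code.
        if (predictions[i] + rdelta == responses[i])
            ++hits;
    }
    return static_cast<double>(hits) / static_cast<double>(predictions.size());
}
```

With predictions {0, 1, 2, 25} and responses {'A', 'B', 'C', 'Z'}, an offset of 0 gives a rate of 0.0 and an offset of 'A' gives 1.0, which mirrors the 0% result in the question.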

Update

trainAuto and train should produce the same class_labels. The discrepancy was introduced by an earlier bug fix, so I have opened a pull request against OpenCV to fix it.

Answer 2:

I suggest trying the RBF kernel instead of the linear one. In many cases it is the best choice.
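
For readers unfamiliar with the difference, the two kernel functions can be written out directly. This is a minimal sketch in plain C++ of the standard formulas, not OpenCV's implementation; linear_kernel and rbf_kernel are illustrative names, and gamma stands for the width parameter that SVM::setGamma controls.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Linear kernel: the plain dot product of the two feature vectors.
double linear_kernel(const std::vector<double>& x, const std::vector<double>& y)
{
    assert(x.size() == y.size());
    double dot = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i)
        dot += x[i] * y[i];
    return dot;
}

// RBF kernel: exp(-gamma * ||x - y||^2).
double rbf_kernel(const std::vector<double>& x, const std::vector<double>& y,
                  double gamma)
{
    assert(x.size() == y.size());
    double squared_distance = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double d = x[i] - y[i];
        squared_distance += d * d;
    }
    return std::exp(-gamma * squared_distance);
}
```

rbf_kernel(x, x, gamma) is always 1, and the value decays toward 0 as the two points move apart. How fast it decays depends on gamma, which is one of the parameters trainAuto searches over.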

Source: https://stackoverflow.com/questions/31178095/recommended-values-for-opencv-svm-parameters

Tags

c++

OpenCV

svm