Question
Any recommendations for OpenCV SVM parameters? I'm experimenting with letter_recog.cpp from the OpenCV samples directory, but the SVM accuracy is very poor. In one run I got only 62% test accuracy:
$ ./letter_recog_modified -data /home/cobalt/opencv/samples/data/letter-recognition.data -save svm_letter_recog.xml -svm
The database /home/cobalt/opencv/samples/data/letter-recognition.data is loaded.
Training the classifier ...
data.size() = [16 x 20000]
responses.size() = [1 x 20000]
Recognition rate: train = 64.3%, test = 62.2%
The default parameters are:
model = SVM::create();
model->setType(SVM::C_SVC);
model->setKernel(SVM::LINEAR);
model->setC(1);
model->train(tdata);
Switching to trainAuto() didn't help; it gave a strange 0% test accuracy:
model = SVM::create();
model->setType(SVM::C_SVC);
model->setKernel(SVM::LINEAR);
model->trainAuto(tdata);
Result:
Recognition rate: train = 0.0%, test = 0.0%
Update using yangjie's answer:
$ ./letter_recog_modified -data /home/cobalt/opencv/samples/data/letter-recognition.data -save svm_letter_recog.xml -svm
The database /home/cobalt/opencv/samples/data/letter-recognition.data is loaded.
Training the classifier ...
data.size() = [16 x 20000]
responses.size() = [1 x 20000]
Recognition rate: train = 58.8%, test = 57.5%
The result is no longer 0%, but the accuracy is worse than the earlier 62%.
Using the RBF kernel with trainAuto() is even worse:
$ ./letter_recog_modified_rbf -data /home/cobalt/opencv/samples/data/letter-recognition.data -save svm_letter_recog.xml -svm
The database /home/cobalt/opencv/samples/data/letter-recognition.data is loaded.
Training the classifier ...
data.size() = [16 x 20000]
responses.size() = [1 x 20000]
Recognition rate: train = 18.5%, test = 11.6%
Parameters:
model = SVM::create();
model->setType(SVM::C_SVC);
model->setKernel(SVM::RBF);
model->trainAuto(tdata);
Answer 1:
I debugged the sample code and found the cause.
responses is a Mat holding the ASCII codes of the letters.
However, the labels predicted by an SVM trained with SVM::trainAuto range from 0 to 25, one per class. You can also see this in <class_labels>...</class_labels> in the output file svm_letter_recog.xml.
Therefore, in test_and_save_classifier, r = model->predict( sample ) and responses.at<int>(i) are never equal, which is where the 0% rate comes from.
I also found that with SVM::train the class labels are the ASCII codes themselves (65-90 for 'A' to 'Z'), which is why you got a normal result at first.
Solution
I am not sure whether this is a bug, but if you want to use SVM::trainAuto in this sample for now, you can change
test_and_save_classifier(model, data, responses, ntrain_samples, 0, filename_to_save);
in build_svm_classifier to
test_and_save_classifier(model, data, responses, ntrain_samples, 'A', filename_to_save);
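To see why the offset matters, here is a minimal standalone sketch with no OpenCV dependency. It assumes the fifth argument of test_and_save_classifier is added to each prediction before the comparison with the ground truth. recognition_rate is a hypothetical helper written for illustration, not a function from the sample.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of letter_recog.cpp): returns the fraction of
// samples whose predicted label, shifted by rdelta, equals the ground truth.
// rdelta stands in for the assumed meaning of the fifth argument of
// test_and_save_classifier.
double recognition_rate(const std::vector<float>& predicted,
                        const std::vector<int>& responses,
                        int rdelta)
{
    if (predicted.empty() || predicted.size() != responses.size())
        return 0.0;

    std::size_t hits = 0;
    for (std::size_t i = 0; i < predicted.size(); ++i) {
        // predict() returns a float, so round it before comparing as an integer.
        const int label = static_cast<int>(std::lround(predicted[i])) + rdelta;
        if (label == responses[i])
            ++hits;
    }
    return static_cast<double>(hits) / static_cast<double>(predicted.size());
}
```

With predictions {0, 1, 25} and responses {'A', 'B', 'Z'}, an offset of 0 matches nothing, while an offset of 'A' matches every sample.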
Update
trainAuto and train should produce the same class_labels. The discrepancy was introduced by an earlier bug fix, so I have created a pull request to OpenCV to fix it.
Answer 2:
I suggest trying the RBF kernel instead of the linear one. In many, many cases it is the best choice...
Source: https://stackoverflow.com/questions/31178095/recommended-values-for-opencv-svm-parameters