What does the score of the Spark MLlib SVM output mean?

Submitted by 徘徊边缘 on 2019-12-01 11:54:28

Question


I do not understand the output of the SVM classifier in Spark MLlib. I want to convert the score into a probability, so that I get the probability of a data point belonging to a certain class (one of the classes the SVM is trained on, i.e. a multi-class problem; see also this thread). It is unclear what the score means. Is it the distance to the hyperplane? How do I get probabilities from it?


Answer 1:


import org.apache.spark.mllib.classification.{SVMModel, SVMWithSGD}
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.mllib.util.MLUtils

// Load training data in LIBSVM format (sc is the SparkContext, e.g. the one spark-shell provides).
val data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")

// Split data into training (60%) and test (40%).
val splits = data.randomSplit(Array(0.6, 0.4), seed = 11L)
val training = splits(0).cache()
val test = splits(1)

// Run training algorithm to build the model
val numIterations = 100
val model = SVMWithSGD.train(training, numIterations)

// Clear the default threshold (0.0) so that predict() returns the raw score instead of a 0/1 label.
model.clearThreshold()

// Compute raw scores on the test set.
val scoreAndLabels = test.map { point =>
  val score = model.predict(point.features)
  (score, point.label)
}

// Get evaluation metrics.
val metrics = new BinaryClassificationMetrics(scoreAndLabels)
val auROC = metrics.areaUnderROC()

println("Area under ROC = " + auROC)

// Save and load model
model.save(sc, "myModelPath")
val sameModel = SVMModel.load(sc, "myModelPath")
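For reference, here is a plain-Scala sketch (no Spark required) of what predict computes for an MLlib SVMModel: the dot product of the model's weights with the feature vector, plus the intercept. With the default threshold of 0.0 that score is turned into a 0/1 label; after clearThreshold() it is returned as-is. The object name and the numbers are hypothetical; in a real job the inputs would come from model.weights.toArray, model.intercept and point.features.toArray.

```scala
object SvmScoreSketch {
  // Raw score returned by SVMModel.predict after clearThreshold():
  // w · x + b, i.e. weights dotted with features, plus the intercept.
  def rawScore(weights: Array[Double], intercept: Double, features: Array[Double]): Double = {
    require(weights.length == features.length, "dimension mismatch")
    weights.zip(features).map { case (w, x) => w * x }.sum + intercept
  }

  // With a threshold set (0.0 by default), a score strictly above it maps to 1.0, otherwise 0.0.
  def label(score: Double, threshold: Double = 0.0): Double =
    if (score > threshold) 1.0 else 0.0

  def main(args: Array[String]): Unit = {
    val w = Array(0.5, -1.0) // hypothetical weights
    val b = 0.25             // hypothetical intercept
    val s = rawScore(w, b, Array(2.0, 0.5))
    println(s)        // 0.5*2.0 + (-1.0)*0.5 + 0.25 = 0.75
    println(label(s)) // 1.0
  }
}
```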

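If you are using the SVM module in MLlib, you can get the AUC (area under the ROC curve) from BinaryClassificationMetrics, as above. Note that AUC is not the same as accuracy: it measures how well the raw scores rank positive examples above negative ones, independent of any threshold, while accuracy depends on where the threshold is set. Calling clearThreshold() is what makes predict return the raw score instead of a 0/1 label. Hope it helps.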
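To illustrate why AUC and accuracy differ, here is a small plain-Scala sketch that computes AUC through its pairwise-ranking interpretation (the probability that a randomly chosen positive scores higher than a randomly chosen negative) alongside accuracy at the default threshold of 0.0. The object name and data are made up for illustration. Spark's BinaryClassificationMetrics computes AUC by integrating the ROC curve instead, which should give the same value on a toy set like this one.

```scala
object AucVsAccuracy {
  // AUC as a ranking statistic: the fraction of (positive, negative) pairs
  // in which the positive example has the higher score; ties count as 1/2.
  def auc(scoreAndLabels: Seq[(Double, Double)]): Double = {
    val pos = scoreAndLabels.filter(_._2 == 1.0).map(_._1)
    val neg = scoreAndLabels.filter(_._2 == 0.0).map(_._1)
    require(pos.nonEmpty && neg.nonEmpty, "need at least one example of each class")
    val credits = for (p <- pos; n <- neg) yield (if (p > n) 1.0 else if (p == n) 0.5 else 0.0)
    credits.sum / (pos.size * neg.size)
  }

  // Accuracy needs a threshold; 0.0 mirrors the SVMModel default.
  def accuracy(scoreAndLabels: Seq[(Double, Double)], threshold: Double = 0.0): Double = {
    val correct = scoreAndLabels.count { case (score, label) =>
      (if (score > threshold) 1.0 else 0.0) == label
    }
    correct.toDouble / scoreAndLabels.size
  }

  def main(args: Array[String]): Unit = {
    // Every positive outranks every negative, but all scores lie above 0.0.
    val data = Seq((2.0, 1.0), (3.0, 1.0), (0.5, 0.0), (1.0, 0.0))
    println(s"AUC = ${auc(data)}")           // AUC = 1.0
    println(s"accuracy = ${accuracy(data)}") // accuracy = 0.5
  }
}
```

Here the scores rank the two classes perfectly (AUC 1.0), yet half of the thresholded predictions are wrong because the threshold sits in the wrong place.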




Answer 2:


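The value is the raw margin, w·x + b, where w is the weight vector and b the intercept. Its sign tells you which side of the separating hyperplane the point lies on, and its magnitude is proportional to the distance to that hyperplane (the geometric distance is the margin divided by ||w||). It is not a probability, and SVMs do not in general give you one. However, as the comments by @cfh note, you can try to learn probabilities from this margin (for example with Platt scaling). But that is a separate step from the SVM itself.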
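A sketch of the two steps described above, in plain Scala without Spark: dividing the raw margin by ||w|| to get the signed geometric distance, and then fitting a sigmoid P(y = 1 | s) = 1 / (1 + exp(a·s + b)) to held-out (score, label) pairs, which is the idea behind Platt scaling. The fitting routine is a simplified stand-in (plain gradient descent on log-loss, not Platt's exact procedure), and the object name, learning rate, iteration count and data are all hypothetical. The sigmoid should be fit on data the SVM was not trained on; otherwise the probabilities tend to be overconfident.

```scala
object MarginToProbability {
  // Euclidean norm ||w|| of the weight vector.
  def norm(w: Array[Double]): Double = math.sqrt(w.map(x => x * x).sum)

  // Signed geometric distance to the hyperplane w·x + b = 0: raw margin / ||w||.
  def signedDistance(rawMargin: Double, w: Array[Double]): Double = rawMargin / norm(w)

  // Platt-style sigmoid: P(y = 1 | s) = 1 / (1 + exp(a * s + b)).
  def sigmoidProb(s: Double, a: Double, b: Double): Double =
    1.0 / (1.0 + math.exp(a * s + b))

  // Fits (a, b) by plain gradient descent on the mean log-loss of (score, label) pairs.
  // Simplified: Platt's original method also smooths the 0/1 targets and uses
  // a Newton-type solver.
  def fitSigmoid(data: Seq[(Double, Double)], lr: Double = 0.1, iters: Int = 5000): (Double, Double) = {
    var a = 0.0
    var b = 0.0
    val n = data.size.toDouble
    for (_ <- 1 to iters) {
      var gradA = 0.0
      var gradB = 0.0
      for ((s, y) <- data) {
        val p = sigmoidProb(s, a, b)
        // For z = a * s + b and p = 1 / (1 + exp(z)), d(log-loss)/dz = y - p.
        gradA += (y - p) * s
        gradB += (y - p)
      }
      a -= lr * gradA / n
      b -= lr * gradB / n
    }
    (a, b)
  }

  def main(args: Array[String]): Unit = {
    val w = Array(3.0, 4.0) // hypothetical weights, ||w|| = 5
    println(signedDistance(0.75, w))
    // Hypothetical held-out (raw score, label) pairs; the classes overlap, so the fit stays finite.
    val heldOut = Seq((-2.0, 0.0), (-1.0, 0.0), (-0.5, 1.0), (0.5, 0.0), (1.0, 1.0), (2.0, 1.0))
    val (a, b) = fitSigmoid(heldOut)
    for (s <- Seq(-2.0, 0.0, 2.0))
      println(f"score $s%5.1f -> P(y=1) = ${sigmoidProb(s, a, b)}%.3f")
  }
}
```

In a Spark job, the held-out pairs could be the (score, label) pairs computed in Answer 1 on a validation split, collected to the driver if they are small enough.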



Source: https://stackoverflow.com/questions/30029863/what-does-the-score-of-the-spark-mllib-svm-output-mean
