How to plot ROC curve and precision-recall curve from BinaryClassificationMetrics

Posted by 寵の児 on 2019-12-10 13:14:33

Question


I am trying to plot the ROC curve and the precision-recall curve. The points are generated by Spark MLlib's BinaryClassificationMetrics, following the Spark documentation at https://spark.apache.org/docs/latest/mllib-evaluation-metrics.html. This is the output I get:

[(1.0,1.0), (0.0,0.4444444444444444)] - Precision
[(1.0,1.0), (0.0,1.0)] - Recall
[(1.0,1.0), (0.0,0.6153846153846153)] - F1Measure
[(0.0,1.0), (1.0,1.0), (1.0,0.4444444444444444)] - Precision-Recall curve
[(0.0,0.0), (0.0,1.0), (1.0,1.0), (1.0,1.0)] - ROC curve

Answer 1:


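It looks like you have a problem similar to one I ran into. You need to either swap the order of the arguments you pass to the Metrics constructor, or pass in the probability instead of the prediction. A hard 0/1 prediction gives the metrics only two distinct score values to use as thresholds, which is why your curves contain nothing but endpoints. For example, if you are using BinaryClassificationMetrics with a RandomForestClassifier, then according to the classifier's documentation (under outputs) the model produces both a "prediction" and a "probability" column.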

Then initialize your Metrics thus:

    val metrics = new BinaryClassificationMetrics(predictionsWithResponse
      .select(col("probability"), col("myLabel"))
      .rdd.map(r => (r.getAs[DenseVector](0)(1), r.getDouble(1))))

Here the `(1)` applied to the DenseVector extracts the probability of class 1 (the positive class), so each record becomes a (score, label) pair.
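To see why the choice of column matters, here is a minimal plain-Scala sketch (no Spark required) that builds ROC points by using each distinct score as a threshold. The helper `rocPoints` and the six sample records are hypothetical and only for illustration. Spark's own implementation is distributed and can downsample via bins, so treat this as an approximation of the idea rather than the library's code.

```scala
object RocSketch {
  // Hypothetical helper, not part of the Spark API.
  // Uses each distinct score (descending) as a threshold and returns
  // (false positive rate, true positive rate), bracketed by (0,0) and (1,1).
  def rocPoints(scoreAndLabels: Seq[(Double, Double)]): Seq[(Double, Double)] = {
    val positives = scoreAndLabels.count(_._2 == 1.0).toDouble
    val negatives = scoreAndLabels.size - positives
    val thresholds = scoreAndLabels.map(_._1).distinct.sorted.reverse
    val inner = thresholds.map { t =>
      val predictedPositive = scoreAndLabels.filter(_._1 >= t)
      val tp = predictedPositive.count(_._2 == 1.0)
      val fp = predictedPositive.size - tp
      (fp / negatives, tp / positives)
    }
    (0.0, 0.0) +: inner :+ (1.0, 1.0)
  }

  // Made-up sample data: true labels and class-1 probabilities.
  val labels = Seq(1.0, 1.0, 0.0, 1.0, 0.0, 0.0)
  val probabilities = Seq(0.9, 0.8, 0.7, 0.4, 0.3, 0.1)
  // Hard predictions, as in a "prediction" column, thresholded at 0.5.
  val hardPredictions = probabilities.map(p => if (p >= 0.5) 1.0 else 0.0)

  val fromHard = rocPoints(hardPredictions.zip(labels)) // only two thresholds
  val fromProb = rocPoints(probabilities.zip(labels))   // one threshold per distinct score
}
```

With hard predictions there are only two distinct scores (0.0 and 1.0), so `fromHard` contains just the two bracketing points plus two threshold points, the same shape as the four-point ROC curve in the question. `fromProb` has one point per distinct probability in addition to the endpoints.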

As for the actual plotting, that's up to you (there are many fine tools for that), but at least you will get more than one point on your curve (besides the endpoints).

And in case it's not clear:

metrics.roc().collect() will give you the data for the ROC curve as tuples of (false positive rate, true positive rate).
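One way to hand the collected points to an external plotting tool is to write them out as CSV. The sketch below is plain Scala. The helper `toCsv` and the column names are hypothetical, and the ROC points from the question stand in for the result of `metrics.roc().collect()`.

```scala
object CurveCsv {
  // Hypothetical helper: renders curve points as CSV text with a header row.
  def toCsv(points: Seq[(Double, Double)], xName: String, yName: String): String =
    (s"$xName,$yName" +: points.map { case (x, y) => s"$x,$y" }).mkString("\n")

  // The ROC points from the question, standing in for metrics.roc().collect().
  val rocPoints = Seq((0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 1.0))
  val rocCsv = toCsv(rocPoints, "fpr", "tpr")
}
```

With a real metrics object, this would be called as `CurveCsv.toCsv(metrics.roc().collect().toSeq, "fpr", "tpr")`. The precision-recall curve works the same way through `metrics.pr().collect()`, which returns (recall, precision) tuples, so recall goes on the x axis.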



Source: https://stackoverflow.com/questions/38207990/how-to-plot-roc-curve-and-precision-recall-curve-from-binaryclassificationmetric
