How can I access computed metrics for each fold in a CrossValidatorModel

Posted by 懵懂的女人 on 2019-12-13 03:37:54

Question


How can I get the computed metrics for each fold from a CrossValidatorModel in spark.ml? I know I can get the average metrics using model.avgMetrics, but is it possible to get the raw results for each fold, e.g. to look at the variance of the results?

I am using Spark 2.0.0.


Answer 1:


Studying the Spark source code for CrossValidator.fit shows how the folds are evaluated.

For the folds, you can do the iteration yourself like this:

    val splits = MLUtils.kFold(dataset.toDF.rdd, $(numFolds), $(seed))
    // The k-fold loop starts here.
    // For each fold, one model is trained per ParamMap in the param grid (epm).
    splits.zipWithIndex.foreach { case ((training, validation), splitIndex) =>
      val trainingDataset = sparkSession.createDataFrame(training, schema).cache()
      val validationDataset = sparkSession.createDataFrame(validation, schema).cache()


      val models = est.fit(trainingDataset, epm).asInstanceOf[Seq[Model[_]]]
      trainingDataset.unpersist()
      var i = 0
      while (i < numModels) {
        val metric = eval.evaluate(models(i).transform(validationDataset, epm(i)))
        logDebug(s"Got metric $metric for model trained with ${epm(i)}.")
        metrics(i) += metric
        i += 1
      }
      validationDataset.unpersist()
    }

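This is in Scala, but the idea is clearly outlined: metrics(i) accumulates the metric of the i-th parameter setting across folds, so recording each metric before it is added gives you the per-fold results.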
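As a rough sketch of doing that iteration outside of CrossValidator, the helper below repeats the same loop but returns one array of metrics per fold instead of summing them. The names PerFoldMetrics and perFoldMetrics are made up for illustration and are not part of Spark; the calls inside (MLUtils.kFold, Estimator.fit with an array of ParamMaps, Evaluator.evaluate) are the ones used in the excerpt above. It assumes Spark 2.0 or later.

```scala
import org.apache.spark.ml.{Estimator, Model}
import org.apache.spark.ml.evaluation.Evaluator
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.sql.DataFrame

// Hypothetical helper, not part of Spark.
object PerFoldMetrics {

  /** Returns metrics(fold)(paramMapIndex) for a k-fold split of `dataset`. */
  def perFoldMetrics(
      dataset: DataFrame,
      est: Estimator[_],
      eval: Evaluator,
      epm: Array[ParamMap],
      numFolds: Int,
      seed: Long): Array[Array[Double]] = {
    val spark = dataset.sparkSession
    val schema = dataset.schema

    // Same split that CrossValidator uses: an array of (training, validation) RDD pairs.
    MLUtils.kFold(dataset.rdd, numFolds, seed).map { case (training, validation) =>
      val trainingDataset = spark.createDataFrame(training, schema).cache()
      val validationDataset = spark.createDataFrame(validation, schema).cache()

      // One fitted model per ParamMap, in the same order as `epm`.
      val models = est.fit(trainingDataset, epm).asInstanceOf[Seq[Model[_]]]
      trainingDataset.unpersist()

      // Keep each metric instead of adding it to a running sum.
      val foldMetrics = models.zip(epm).map { case (model, params) =>
        eval.evaluate(model.transform(validationDataset, params))
      }.toArray

      validationDataset.unpersist()
      foldMetrics
    }
  }
}
```

With the same estimator, evaluator, param grid, numFolds and seed that were given to the CrossValidator, the folds should match the ones it used, since the split comes from the same MLUtils.kFold call.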

Take a look at this answer that outlines results per fold. Hope this helps.
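
To look at the variance mentioned in the question, the per-fold values can be summarized per parameter setting. Here is a minimal sketch in plain Scala, assuming the metrics have already been collected as foldMetrics(fold)(paramIndex). FoldStats and meanAndVariance are hypothetical names, not Spark API.

```scala
// Hypothetical helper, not part of Spark.
object FoldStats {

  /**
   * foldMetrics(f)(i) is the metric of parameter setting i on fold f.
   * Returns, for each parameter setting, (mean, sample variance) across folds.
   */
  def meanAndVariance(foldMetrics: Array[Array[Double]]): Array[(Double, Double)] = {
    val numFolds = foldMetrics.length
    // Transpose so that each inner array holds one parameter setting across all folds.
    foldMetrics.transpose.map { perFold =>
      val mean = perFold.sum / numFolds
      val variance =
        if (numFolds > 1) perFold.map(m => (m - mean) * (m - mean)).sum / (numFolds - 1)
        else 0.0 // the variance is not meaningful with a single fold
      (mean, variance)
    }
  }
}
```

With invented numbers for three folds and two parameter settings, FoldStats.meanAndVariance(Array(Array(0.8, 0.6), Array(0.9, 0.6), Array(1.0, 0.6))) gives roughly (0.9, 0.01) for the first setting and (0.6, 0.0) for the second.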



Source: https://stackoverflow.com/questions/38992269/how-can-i-access-computed-metrics-for-each-fold-in-a-crossvalidatormodel
