问题
How can I get the computed metrics for each fold from a CrossValidatorModel
in spark.ml
? I know I can get the average metrics using model.avgMetrics
but is it possible to get the raw results on each fold to look at eg. the variance of the results?
I am using Spark 2.0.0.
回答1:
Studying the spark code here
For the folds, you can do the iteration yourself like this:
val splits = MLUtils.kFold(dataset.toDF.rdd, $(numFolds), $(seed))
//K-folding operation starting
//for each fold you have multiple models created cfm. the paramgrid
splits.zipWithIndex.foreach { case ((training, validation), splitIndex) =>
val trainingDataset = sparkSession.createDataFrame(training, schema).cache()
val validationDataset = sparkSession.createDataFrame(validation, schema).cache()
val models = est.fit(trainingDataset, epm).asInstanceOf[Seq[Model[_]]]
trainingDataset.unpersist()
var i = 0
while (i < numModels) {
val metric = eval.evaluate(models(i).transform(validationDataset, epm(i)))
logDebug(s"Got metric $metric for model trained with ${epm(i)}.")
metrics(i) += metric
i += 1
}
This is in scala, but the ideas are very clearly outlined.
Take a look at this answer that outlines results per fold. Hope this helps.
来源:https://stackoverflow.com/questions/38992269/how-can-i-access-computed-metrics-for-each-fold-in-a-crossvalidatormodel