How to extract best parameters from a CrossValidatorModel

轮回少年

I want to find the parameters of ParamGridBuilder that make the best model in CrossValidator in Spark 1.4.x,

In Pipeline Example in Spark documentation,

11 answers

再見小時候

2020-12-13 02:52
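This is how you get the chosen parameters. bestModel is typed as Model[_], so cast it to the concrete model class first (a LogisticRegressionModel is assumed here):
```
import org.apache.spark.ml.classification.LogisticRegressionModel
val best = cvModel.bestModel.asInstanceOf[LogisticRegressionModel] // bestModel is a Model[_]
println(best.getMaxIter)
println(best.getRegParam)
```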

伪装坚强ぢ

2020-12-13 02:53

One method to get a proper ParamMap object is to use CrossValidatorModel.avgMetrics: Array[Double] to find the argmax ParamMap:

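```
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.tuning.CrossValidatorModel

implicit class BestParamMapCrossValidatorModel(cvModel: CrossValidatorModel) {
  // Pair each ParamMap with its average metric and keep the highest-scoring one
  // (maxBy assumes that a larger metric is better)
  def bestEstimatorParamMap: ParamMap = {
    cvModel.getEstimatorParamMaps
           .zip(cvModel.avgMetrics)
           .maxBy(_._2)
           ._1
  }
}
```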

Running this on the CrossValidatorModel trained in the Pipeline Example you cited gives:

```
scala> println(cvModel.bestEstimatorParamMap)
{
   hashingTF_2b0b8ccaeeec-numFeatures: 100,
   logreg_950a13184247-regParam: 0.1
}
```
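
One caveat with the maxBy approach: it assumes that a larger metric is better. For a metric where smaller is better (an error metric, for instance), the best grid point is the one with the lowest average. Below is a minimal, Spark-free sketch of that selection. The helper name `bestIndex` and its `largerIsBetter` argument are hypothetical, and the strings and numbers are made-up stand-ins for the ParamMaps and avgMetrics of a real model.

```scala
object BestGridPoint {
  // Hypothetical helper: returns the index of the best average metric.
  // largerIsBetter selects between argmax and argmin.
  def bestIndex(avgMetrics: Seq[Double], largerIsBetter: Boolean): Int = {
    require(avgMetrics.nonEmpty, "avgMetrics must not be empty")
    val indexed = avgMetrics.zipWithIndex
    val (_, idx) =
      if (largerIsBetter) indexed.maxBy(_._1) else indexed.minBy(_._1)
    idx
  }

  def main(args: Array[String]): Unit = {
    // Made-up illustration data, not results from a real run
    val paramMaps = Seq("regParam=0.01", "regParam=0.1", "regParam=1.0")
    val avgMetrics = Seq(0.42, 0.35, 0.51)

    // Accuracy-style metric: pick the largest value
    println(paramMaps(bestIndex(avgMetrics, largerIsBetter = true)))  // regParam=1.0
    // Error-style metric: pick the smallest value
    println(paramMaps(bestIndex(avgMetrics, largerIsBetter = false))) // regParam=0.1
  }
}
```

With a fitted model, the call would look like bestIndex(cvModel.avgMetrics, cvModel.getEvaluator.isLargerBetter), assuming your Spark version exposes isLargerBetter on the evaluator.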


爱一瞬间的悲伤

2020-12-13 02:55

This Java code should work: cvModel.bestModel().parent().extractParamMap(). You can translate it to Scala. The parent() method returns the estimator that produced the model, and you can read the best parameters from it.

暖寄归人

2020-12-13 02:57
To print everything in paramMap, you actually don't have to call parent:
```
cvModel.bestModel.extractParamMap()
```
To answer OP's question, to get a single best parameter, for example regParam:
```
cvModel.bestModel.extractParamMap().apply(cvModel.bestModel.getParam("regParam"))
```

礼貌的吻别

2020-12-13 02:57

I am working with Spark 1.6.x in Scala. Here is a full example of how to set up and fit a CrossValidator and then return the value of the parameter used by the best model (assuming that training.toDF gives a DataFrame ready to be used):

```
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator

// Instantiate a LogisticRegression object
val lr = new LogisticRegression()

// Build a parameter grid with different values for the 'regParam' parameter of the logistic regression
val paramGrid = new ParamGridBuilder().addGrid(lr.regParam, Array(0.0001, 0.001, 0.01, 0.1, 0.25, 0.5, 0.75, 1.0)).build()

// Configure the CrossValidator with a MulticlassClassificationEvaluator and fit it on the training set
val crossVal = new CrossValidator().setEstimator(lr).setEvaluator(new MulticlassClassificationEvaluator).setEstimatorParamMaps(paramGrid)
val cvModel = crossVal.fit(training.toDF)

// Get the value of 'regParam' used by the best model
val bestModel = cvModel.bestModel                    // The best model
val paramReference = bestModel.getParam("regParam")  // A reference to the parameter (only the reference, not the value)
val paramValue = bestModel.get(paramReference)       // The value of this parameter, wrapped in an Option
print(paramValue)                                    // In my case the value was 0.001
```

You can do the same for any parameter or any other type of model.


独厮守ぢ

2020-12-13 02:57
Building on @macfeliga's solution, here is a one-liner that works for pipelines:
```
cvModel.bestModel.asInstanceOf[PipelineModel]
    .stages.foreach(stage => println(stage.extractParamMap))
```