I want to find the parameters from ParamGridBuilder that produce the best model in CrossValidator in Spark 1.4.x, as in the Pipeline Example in the Spark documentation.
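This is how you get the chosen parameters when the estimator passed to CrossValidator is a LogisticRegression rather than a Pipeline. Because bestModel is typed as Model[_], you first have to cast it:
import org.apache.spark.ml.classification.LogisticRegressionModel
val lrModel = cvModel.bestModel.asInstanceOf[LogisticRegressionModel]
println(lrModel.getMaxIter)
println(lrModel.getRegParam)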
One way to get the proper ParamMap object is to use CrossValidatorModel.avgMetrics: Array[Double] to find the argmax ParamMap:
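import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.tuning.CrossValidatorModel

// In Scala 2, define this inside an object (or paste it into the REPL): implicit classes cannot be top-level.
implicit class BestParamMapCrossValidatorModel(cvModel: CrossValidatorModel) {
  // Pair each candidate ParamMap with its average cross-validation metric
  // and return the ParamMap with the highest metric.
  def bestEstimatorParamMap: ParamMap = {
    cvModel.getEstimatorParamMaps
      .zip(cvModel.avgMetrics)
      .maxBy(_._2)
      ._1
  }
}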
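The maxBy above assumes that a larger metric is better, which holds for scores such as areaUnderROC or F1 but not for an error metric where smaller is better. Here is a minimal sketch of a direction-aware argmax in plain Scala, so it runs without Spark: the Maps stand in for ParamMaps, the metric values are hypothetical, and the largerIsBetter flag is an assumption you supply yourself (later Spark versions expose this direction as Evaluator.isLargerBetter).

```scala
object BestParamSketch {
  // Index of the best average metric. `largerIsBetter` plays the role of the
  // evaluator's direction flag and is passed in by hand in this sketch.
  def bestIndex(avgMetrics: Seq[Double], largerIsBetter: Boolean): Int = {
    require(avgMetrics.nonEmpty, "no metrics to compare")
    val indexed = avgMetrics.zipWithIndex
    if (largerIsBetter) indexed.maxBy(_._1)._2 else indexed.minBy(_._1)._2
  }

  // Pick the grid point whose metric is best, mirroring
  // getEstimatorParamMaps.zip(avgMetrics) in the implicit class.
  def bestParams[P](grid: Seq[P], avgMetrics: Seq[Double], largerIsBetter: Boolean): P = {
    require(grid.length == avgMetrics.length, "one metric per grid point")
    grid(bestIndex(avgMetrics, largerIsBetter))
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical grid: plain Maps stand in for Spark ParamMaps.
    val grid = Seq(Map("regParam" -> 0.1), Map("regParam" -> 0.01), Map("regParam" -> 0.001))
    val f1   = Seq(0.81, 0.86, 0.84) // hypothetical larger-is-better scores
    val rmse = Seq(1.9, 1.4, 1.6)    // hypothetical smaller-is-better errors
    println(bestParams(grid, f1, largerIsBetter = true))    // Map(regParam -> 0.01)
    println(bestParams(grid, rmse, largerIsBetter = false)) // Map(regParam -> 0.01)
  }
}
```

With the Spark types, the same branch would choose between maxBy and minBy on cvModel.getEstimatorParamMaps.zip(cvModel.avgMetrics).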
When run on the CrossValidatorModel trained in the Pipeline Example you cited, this gives:
scala> println(cvModel.bestEstimatorParamMap)
{
hashingTF_2b0b8ccaeeec-numFeatures: 100,
logreg_950a13184247-regParam: 0.1
}
This Java code should work (you can translate it to Scala):
cvModel.bestModel().parent().extractParamMap()
The parent() method returns the estimator that produced the best model, so you can get the best params from it.
To print everything in the paramMap, you actually don't have to call parent:
println(cvModel.bestModel.extractParamMap())
To answer the OP's question, to get a single best parameter, for example regParam:
cvModel.bestModel.extractParamMap().apply(cvModel.bestModel.getParam("regParam"))
I am working with Spark 1.6.x in Scala, and here is a full example of how I set up and fit a CrossValidator and then retrieve the value of the parameter used to get the best model (assuming that training.toDF gives a DataFrame ready to be used):
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
// Instantiate a LogisticRegression object
val lr = new LogisticRegression()
// Instantiate a ParamGrid with different values for the 'regParam' parameter of the logistic regression
val paramGrid = new ParamGridBuilder().addGrid(lr.regParam, Array(0.0001, 0.001, 0.01, 0.1, 0.25, 0.5, 0.75, 1)).build()
// Set up and fit the CrossValidator on the training set, using 'MulticlassClassificationEvaluator' as the evaluator
val crossVal = new CrossValidator().setEstimator(lr).setEvaluator(new MulticlassClassificationEvaluator).setEstimatorParamMaps(paramGrid)
val cvModel = crossVal.fit(training.toDF)
// Getting the value of the 'RegParam' used to get the best model
val bestModel = cvModel.bestModel // Getting the best model
val paramReference = bestModel.getParam("regParam") // Getting the reference of the parameter you want (only the reference, not the value)
val paramValue = bestModel.get(paramReference) // Getting the value of this parameter as an Option (None if it was never explicitly set)
paramValue.foreach(print) // In my case: 0.001
You can do the same for any parameter or any other type of model.
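One subtlety: get returns an Option that holds only an explicitly set value, so a parameter left at its default comes back as None, whereas extractParamMap() merges the defaults with the explicitly set values (explicit ones win). A minimal plain-Scala sketch of that lookup order, where the two Maps are hypothetical stand-ins for Spark's internal default and explicit param maps:

```scala
object ParamLookupSketch {
  // Hypothetical stand-ins: defaults declared by the estimator, and the values
  // that were explicitly set (e.g. by CrossValidator from the grid).
  val defaults: Map[String, Any] = Map("regParam" -> 0.0, "maxIter" -> 100)
  val explicit: Map[String, Any] = Map("regParam" -> 0.001)

  // Like `get`: only explicitly set values, wrapped in an Option.
  def getExplicit(name: String): Option[Any] = explicit.get(name)

  // Like `extractParamMap()`: defaults overridden by explicit values.
  def extracted: Map[String, Any] = defaults ++ explicit

  def main(args: Array[String]): Unit = {
    println(getExplicit("regParam")) // Some(0.001)
    println(getExplicit("maxIter"))  // None
    println(extracted("maxIter"))    // 100
  }
}
```

This is why the regParam chosen from the grid shows up through get, while a parameter you never tuned is only visible through extractParamMap().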
Building on the solution of @macfeliga, here is a one-liner that works for pipelines:
import org.apache.spark.ml.PipelineModel
cvModel.bestModel.asInstanceOf[PipelineModel]
  .stages.foreach(stage => println(stage.extractParamMap))
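Printing extractParamMap for every stage lists every parameter, which can be long. If you only care about the parameters that were in the grid (numFeatures and regParam in the Pipeline Example), you can filter by parameter name. Below is a plain-Scala sketch of that filter. The stage names, values, and the tuned set are hypothetical stand-ins. In Spark, the same idea would be applied to each stage's ParamMap via its param names, which is an assumption worth checking against your version's ParamMap API.

```scala
object TunedParamsSketch {
  // Hypothetical stand-in for a fitted pipeline: each stage has a uid and its
  // full parameter map (what extractParamMap would print).
  val stages: Seq[(String, Map[String, Any])] = Seq(
    "tokenizer" -> Map("inputCol" -> "text", "outputCol" -> "words"),
    "hashingTF" -> Map("inputCol" -> "words", "numFeatures" -> 100),
    "logreg"    -> Map("maxIter" -> 10, "regParam" -> 0.1)
  )

  // Names of the parameters that were varied in the grid (assumption).
  val tuned: Set[String] = Set("numFeatures", "regParam")

  // Keep only the tuned parameters, dropping stages that have none.
  def tunedValues: Seq[(String, Map[String, Any])] =
    stages
      .map { case (uid, params) => uid -> params.filter { case (k, _) => tuned(k) } }
      .filter { case (_, kept) => kept.nonEmpty }

  def main(args: Array[String]): Unit =
    tunedValues.foreach { case (uid, kept) => println(s"$uid: $kept") }
}
```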