How to extract best parameters from a CrossValidatorModel

轮回少年 2020-12-13 02:31

I want to find the ParamGridBuilder parameters that produce the best model in CrossValidator in Spark 1.4.x, for instance in the Pipeline Example in the Spark documentation.

11 Answers
  • 2020-12-13 02:52

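    This is how you get the chosen parameters:

    // bestModel is typed Model[_]; cast it to the concrete class (LogisticRegressionModel assumed here)
    val best = cvModel.bestModel.asInstanceOf[LogisticRegressionModel]
    println(best.getMaxIter)
    println(best.getRegParam)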
    
  • 2020-12-13 02:53

    One method to get a proper ParamMap object is to use CrossValidatorModel.avgMetrics: Array[Double] to find the argmax ParamMap:

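    import org.apache.spark.ml.param.ParamMap
    import org.apache.spark.ml.tuning.CrossValidatorModel

    implicit class BestParamMapCrossValidatorModel(cvModel: CrossValidatorModel) {
      // Pairs each candidate ParamMap with its average metric and keeps the highest-scoring one.
      // This assumes the evaluator's metric is larger-is-better.
      def bestEstimatorParamMap: ParamMap = {
        cvModel.getEstimatorParamMaps
               .zip(cvModel.avgMetrics)
               .maxBy(_._2)
               ._1
      }
    }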
    

    Running this on the CrossValidatorModel trained in the Pipeline Example you cited gives:

    scala> println(cvModel.bestEstimatorParamMap)
    {
       hashingTF_2b0b8ccaeeec-numFeatures: 100,
       logreg_950a13184247-regParam: 0.1
    }
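
    The argmax above keeps only the winner. To see how every candidate scored, the same zip can be sorted instead. The following is a minimal sketch in plain Scala; MetricRanking and rank are hypothetical names, and the largerIsBetter flag is an assumption added here for metrics where a smaller value is better.

    ```scala
    object MetricRanking {
      // Pairs each candidate with its metric and returns the pairs best-first.
      // largerIsBetter is a hypothetical switch for metrics where smaller is better.
      def rank[A](candidates: Seq[A],
                  metrics: Seq[Double],
                  largerIsBetter: Boolean = true): Seq[(A, Double)] = {
        require(candidates.length == metrics.length, "expected one metric per candidate")
        val paired = candidates.zip(metrics)
        if (largerIsBetter) paired.sortBy { case (_, m) => -m }
        else paired.sortBy { case (_, m) => m }
      }
    }
    ```

    With a fitted model, this could be called as MetricRanking.rank(cvModel.getEstimatorParamMaps.toSeq, cvModel.avgMetrics.toSeq).foreach(println).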
    
  • 2020-12-13 02:55

    This Java code should work: cvModel.bestModel().parent().extractParamMap(). You can translate it to Scala. The parent() method returns the estimator that produced the best model, and you can read the best parameters from it.
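
    A possible Scala translation, as a minimal sketch: it assumes Spark ML is on the classpath and that cvModel is an already fitted CrossValidatorModel. BestModelParams and viaParent are hypothetical names used only for illustration.

    ```scala
    import org.apache.spark.ml.param.ParamMap
    import org.apache.spark.ml.tuning.CrossValidatorModel

    object BestModelParams {
      // Reads the parameters from the estimator that produced the best model.
      def viaParent(cvModel: CrossValidatorModel): ParamMap =
        cvModel.bestModel.parent.extractParamMap()
    }
    ```

    Calling println(BestModelParams.viaParent(cvModel)) then prints the estimator's parameter map.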

  • 2020-12-13 02:57

    To print everything in paramMap, you actually don't have to call parent:

    cvModel.bestModel.extractParamMap()
    

    To answer OP's question, to get a single best parameter, for example regParam:

    cvModel.bestModel.extractParamMap().apply(cvModel.bestModel.getParam("regParam"))
    
  • 2020-12-13 02:57

    I am working with Spark 1.6.x in Scala. Here is a full example of how to set up and fit a CrossValidator and then return the value of the parameter used to get the best model (assuming that training.toDF gives a DataFrame ready to be used):

    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}
    import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
    
    // Instantiate a LogisticRegression object
    val lr = new LogisticRegression()
    
    // Build a parameter grid with different values for the 'regParam' parameter of the logistic regression
    val paramGrid = new ParamGridBuilder().addGrid(lr.regParam, Array(0.0001, 0.001, 0.01, 0.1, 0.25, 0.5, 0.75, 1.0)).build()
    
    // Set up and fit the CrossValidator on the training set, using MulticlassClassificationEvaluator as the evaluator
    val crossVal = new CrossValidator().setEstimator(lr).setEvaluator(new MulticlassClassificationEvaluator).setEstimatorParamMaps(paramGrid)
    val cvModel = crossVal.fit(training.toDF)
    
    // Get the value of 'regParam' used by the best model
    val bestModel = cvModel.bestModel                       // The best model (typed as Model[_])
    val paramReference = bestModel.getParam("regParam")     // A reference to the parameter you want (only the reference, not the value)
    val paramValue = bestModel.getOrDefault(paramReference) // The value of this parameter (get would return an Option instead)
    print(paramValue)                                       // In my case: 0.001
    

    You can do the same for any parameter or any other type of model.

  • 2020-12-13 02:57

    Building on the solution of @macfeliga, a one-liner that works for pipelines:

    import org.apache.spark.ml.PipelineModel

    cvModel.bestModel.asInstanceOf[PipelineModel]
        .stages.foreach(stage => println(stage.extractParamMap))
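
    Printing every stage can be noisy when only the tuned values are needed. The sketch below pulls out numFeatures and regParam directly. It assumes the pipeline contains a HashingTF stage and a fitted LogisticRegressionModel stage, as in the Pipeline Example from the question. PipelineBestParams and numFeaturesAndRegParam are hypothetical names.

    ```scala
    import org.apache.spark.ml.PipelineModel
    import org.apache.spark.ml.classification.LogisticRegressionModel
    import org.apache.spark.ml.feature.HashingTF
    import org.apache.spark.ml.tuning.CrossValidatorModel

    object PipelineBestParams {
      // Returns Some((numFeatures, regParam)) when both stage types are present,
      // and None when either one is missing from the best pipeline.
      def numFeaturesAndRegParam(cvModel: CrossValidatorModel): Option[(Int, Double)] = {
        // Assumption: the estimator passed to the CrossValidator was a Pipeline.
        val stages = cvModel.bestModel.asInstanceOf[PipelineModel].stages
        for {
          tf <- stages.collectFirst { case s: HashingTF => s }
          lr <- stages.collectFirst { case s: LogisticRegressionModel => s }
        } yield (tf.getNumFeatures, lr.getRegParam)
      }
    }
    ```

    Matching on the stage type avoids hard-coding stage positions, at the cost of only finding the first stage of each type.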
    