How to extract best parameters from a CrossValidatorModel

轮回少年 2020-12-13 02:31

I want to find the parameters from ParamGridBuilder that produce the best model in CrossValidator in Spark 1.4.x.

In Pipeline Example in Spark documentation,

11 answers
  • 2020-12-13 02:58

    For me, the @orangeHIX solution is perfect:

    val cvModel = cv.fit(training)
    
    val cvMejorModelo = cvModel.bestModel.asInstanceOf[ALSModel]
    
    cvMejorModelo.parent.extractParamMap()
    
    res86: org.apache.spark.ml.param.ParamMap =
    {
        als_08eb64db650d-alpha: 0.05,
        als_08eb64db650d-checkpointInterval: 10,
        als_08eb64db650d-coldStartStrategy: drop,
        als_08eb64db650d-finalStorageLevel: MEMORY_AND_DISK,
        als_08eb64db650d-implicitPrefs: false,
        als_08eb64db650d-intermediateStorageLevel: MEMORY_AND_DISK,
        als_08eb64db650d-itemCol: product,
        als_08eb64db650d-maxIter: 10,
        als_08eb64db650d-nonnegative: false,
        als_08eb64db650d-numItemBlocks: 10,
        als_08eb64db650d-numUserBlocks: 10,
        als_08eb64db650d-predictionCol: prediction,
        als_08eb64db650d-rank: 1,
        als_08eb64db650d-ratingCol: rating,
        als_08eb64db650d-regParam: 0.1,
        als_08eb64db650d-seed: 1994790107,
        als_08eb64db650d-userCol: user
    }
    
  • 2020-12-13 03:06
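    import org.apache.spark.ml.PipelineModel
    import org.apache.spark.ml.feature.HashingTF
    import org.apache.spark.ml.classification.LogisticRegressionModel

    // The best model is a fitted pipeline; cast it to reach the individual stages
    val bestPipelineModel = cvModel.bestModel.asInstanceOf[PipelineModel]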
    val stages = bestPipelineModel.stages
    
    val hashingStage = stages(1).asInstanceOf[HashingTF]
    println("numFeatures = " + hashingStage.getNumFeatures)
    
    val lrStage = stages(2).asInstanceOf[LogisticRegressionModel]
    println("regParam = " + lrStage.getRegParam)
    

    source

  • 2020-12-13 03:13

    In Java, the equivalent call is:

    bestModel.parent().extractParamMap()
    
  • 2020-12-13 03:16

    This is the parameter grid built with ParamGridBuilder():

    paraGrid = ParamGridBuilder().addGrid(
        hashingTF.numFeatures, [10, 100, 1000]
    ).addGrid(
        lr.regParam, [0.1, 0.01, 0.001]
    ).build()
    

    There are 3 stages in the pipeline. It seems we can access the parameters as follows:

    for stage in cv_model.bestModel.stages:
        # Print each stage's identifier, then its list of Param descriptors
        print('stage: {}'.format(stage))
        print(stage.params)
        print('\n')
    
    stage: Tokenizer_46ffb9fac5968c6c152b
    [Param(parent='Tokenizer_46ffb9fac5968c6c152b', name='inputCol', doc='input column name'), Param(parent='Tokenizer_46ffb9fac5968c6c152b', name='outputCol', doc='output column name')]
    
    stage: HashingTF_40e1af3ba73764848d43
    [Param(parent='HashingTF_40e1af3ba73764848d43', name='inputCol', doc='input column name'), Param(parent='HashingTF_40e1af3ba73764848d43', name='numFeatures', doc='number of features'), Param(parent='HashingTF_40e1af3ba73764848d43', name='outputCol', doc='output column name')]
    
    stage: LogisticRegression_451b8c8dbef84ecab7a9
    []
    

    However, the last stage, LogisticRegression, reports no parameters.
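
    Note that `stage.params` returns Param descriptors (name and doc), not the values chosen by the grid search. A minimal sketch of reading values instead, assuming the stage exposes `extractParamMap()` returning a mapping from Param objects (each with a `name` attribute) to values, as PySpark's `Params` classes do. `FakeParam`, `FakeStage`, and `stage_param_values` are hypothetical names used so the snippet runs without Spark; on older releases a fitted model may still report nothing here.

```python
from collections import namedtuple

# Hypothetical stand-ins that mimic the small part of the PySpark API used here.
FakeParam = namedtuple("FakeParam", ["parent", "name"])


class FakeStage:
    """Mimics a pipeline stage that exposes extractParamMap()."""

    def __init__(self, uid, values):
        self.uid = uid
        self._values = values

    def extractParamMap(self):
        # PySpark returns a dict keyed by Param objects; imitate that shape.
        return {FakeParam(self.uid, k): v for k, v in self._values.items()}


def stage_param_values(stage):
    """Return {param name: value} for a single stage."""
    return {param.name: value for param, value in stage.extractParamMap().items()}


# numFeatures = 10 is the value this answer reports for the HashingTF stage.
hashing_tf = FakeStage("HashingTF_40e1af3ba73764848d43", {"numFeatures": 10})
print(stage_param_values(hashing_tf))
```

    With a real fitted pipeline, the call would look like `stage_param_values(cv_model.bestModel.stages[1])`.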

    We can also get the weights and intercept from the LogisticRegression stage like this:

    cv_model.bestModel.stages[1].getNumFeatures()
    10
    cv_model.bestModel.stages[2].intercept
    1.5791827733883774
    cv_model.bestModel.stages[2].weights
    DenseVector([-2.5361, -0.9541, 0.4124, 4.2108, 4.4707, 4.9451, -0.3045, 5.4348, -0.1977, -1.8361])
    

    Full exploration: http://kuanliang.github.io/2016-06-07-SparkML-pipeline/
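
    Another way to see which grid entry won is to pair each parameter map with its average cross-validation metric. A minimal sketch, assuming a PySpark version where `CrossValidatorModel.avgMetrics` is available (it may not be on Spark 1.4.x) and where the metrics come back in the same order as the grid. `best_grid_entry` is a hypothetical helper, the grid is written as plain dicts, and the metric values are invented for illustration.

```python
def best_grid_entry(param_maps, avg_metrics, larger_is_better=True):
    """Return (param_map, metric) for the best-scoring grid entry."""
    if len(param_maps) != len(avg_metrics):
        raise ValueError("expected one metric per parameter map")
    pick = max if larger_is_better else min
    # Compare on the metric only; the parameter maps need not be orderable.
    return pick(zip(param_maps, avg_metrics), key=lambda pair: pair[1])


# Illustrative stand-ins: plain dicts instead of PySpark param maps,
# and made-up metric values (not results from the original post).
grid = [
    {"numFeatures": 10, "regParam": 0.1},
    {"numFeatures": 100, "regParam": 0.01},
    {"numFeatures": 1000, "regParam": 0.001},
]
metrics = [0.81, 0.86, 0.79]

best_params, best_metric = best_grid_entry(grid, metrics)
print(best_params, best_metric)
```

    With a fitted model this might be called as `best_grid_entry(paraGrid, cv_model.avgMetrics, evaluator.isLargerBetter())`, where `evaluator` stands for whichever evaluator was passed to the CrossValidator.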

  • 2020-12-13 03:16

    This SO thread kinda answers the question.

    In a nutshell, you need to cast each object to its concrete class.

    For the case of CrossValidatorModel, the following is what I did:

    import org.apache.spark.ml.tuning.CrossValidatorModel
    import org.apache.spark.ml.PipelineModel
    import org.apache.spark.ml.regression.RandomForestRegressionModel
    
    // Load CV model from S3
    val inputModelPath = "s3://path/to/my/random-forest-regression-cv"
    val reloadedCvModel = CrossValidatorModel.load(inputModelPath)
    
    // To get the parameters of the best model
    (
        reloadedCvModel.bestModel
            .asInstanceOf[PipelineModel]
            .stages(1)
            .asInstanceOf[RandomForestRegressionModel]
            .extractParamMap()
    )
    

    In the example, my pipeline has two stages (a VectorIndexer and a RandomForestRegressor), so the stage index is 1 for my model.
