How to extract best parameters from a CrossValidatorModel

轮回少年 2020-12-13 02:31

I want to find which ParamGridBuilder parameter values produced the best model selected by CrossValidator in Spark 1.4.x.

In Pipeline Example in Spark documentation,

11 answers
  •  攒了一身酷
    2020-12-13 03:16

    This is the grid built with ParamGridBuilder():

    paraGrid = ParamGridBuilder().addGrid(
        hashingTF.numFeatures, [10, 100, 1000]
    ).addGrid(
        lr.regParam, [0.1, 0.01, 0.001]
    ).build()
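
To make the size of the search space concrete: `build()` returns one param map per combination of the values passed to `addGrid`. The sketch below mimics that expansion in plain Python with `itertools.product`. The string keys and the `expand_grid` helper are hypothetical stand-ins for the real `Param` objects and are not part of the Spark API.

```python
from itertools import product

# Hypothetical stand-in: plain strings replace the real Param objects.
grid_values = {
    "hashingTF.numFeatures": [10, 100, 1000],
    "lr.regParam": [0.1, 0.01, 0.001],
}


def expand_grid(grid_values):
    """Return one dict per combination of the listed values."""
    names = list(grid_values)
    value_lists = [grid_values[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


param_maps = expand_grid(grid_values)
print(len(param_maps))  # 3 values x 3 values = 9 combinations
```

CrossValidator fits and evaluates the pipeline once per fold for each of these nine combinations, then refits the best one as `bestModel`.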
    

    There are 3 stages in the pipeline. It seems we can access the parameters of each stage as follows:

    # List the params declared by each stage of the best pipeline model
    for stage in cv_model.bestModel.stages:
        print('stage: {}'.format(stage))
        print(stage.params)
        print('\n')
    
    stage: Tokenizer_46ffb9fac5968c6c152b
    [Param(parent='Tokenizer_46ffb9fac5968c6c152b', name='inputCol', doc='input column name'), Param(parent='Tokenizer_46ffb9fac5968c6c152b', name='outputCol', doc='output column name')]
    
    stage: HashingTF_40e1af3ba73764848d43
    [Param(parent='HashingTF_40e1af3ba73764848d43', name='inputCol', doc='input column name'), Param(parent='HashingTF_40e1af3ba73764848d43', name='numFeatures', doc='number of features'), Param(parent='HashingTF_40e1af3ba73764848d43', name='outputCol', doc='output column name')]
    
    stage: LogisticRegression_451b8c8dbef84ecab7a9
    []
    

    However, the last stage, LogisticRegression, lists no parameters.

    We can still read the selected numFeatures from the HashingTF stage, and the intercept and weights from the fitted LogisticRegression stage, as follows:

    >>> cv_model.bestModel.stages[1].getNumFeatures()
    10
    >>> cv_model.bestModel.stages[2].intercept
    1.5791827733883774
    >>> cv_model.bestModel.stages[2].weights
    DenseVector([-2.5361, -0.9541, 0.4124, 4.2108, 4.4707, 4.9451, -0.3045, 5.4348, -0.1977, -1.8361])
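
Another way to recover the winning grid entry is to pair each param map with its cross-validated score and take the best pair. This is only a sketch under an assumption: it relies on the fitted model exposing a list of average metrics in the same order as the grid (newer PySpark releases provide this as `cv_model.avgMetrics`, which may not be available in 1.4.x). The `best_param_map` helper is hypothetical and works on plain lists, so it does not need Spark to run.

```python
def best_param_map(param_maps, avg_metrics, larger_is_better=True):
    """Return (param_map, metric) for the grid entry with the best metric.

    param_maps and avg_metrics must be in the same order, one metric per map.
    """
    if len(param_maps) != len(avg_metrics):
        raise ValueError("param_maps and avg_metrics must have the same length")
    if len(param_maps) == 0:
        raise ValueError("param_maps must not be empty")
    pick = max if larger_is_better else min
    return pick(zip(param_maps, avg_metrics), key=lambda pair: pair[1])


# Assumed usage with the objects from this answer (not run here):
#   best_map, best_metric = best_param_map(paraGrid, cv_model.avgMetrics)
#   print({param.name: value for param, value in best_map.items()})
```

Set `larger_is_better=False` when the evaluator reports an error measure such as RMSE, where the smallest value wins.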
    

    Full exploration: http://kuanliang.github.io/2016-06-07-SparkML-pipeline/
