How to print best model params in Apache Spark Pipeline?

Submitted by 删除回忆录丶 on 2019-12-10 17:34:51

Question


I'm using the Apache Spark Pipeline API to tune parameters by validation. I'm building a TrainValidationSplitModel like this:

Pipeline pipeline = ...
ParamMap[] paramGrid = ...

TrainValidationSplit trainValidationSplit = new TrainValidationSplit()
    .setEstimator(pipeline)
    .setEvaluator(new MulticlassClassificationEvaluator())
    .setEstimatorParamMaps(paramGrid)
    .setTrainRatio(0.8);
TrainValidationSplitModel model = trainValidationSplit.fit(training);

My question is: how can I extract and print the parameters of the best trained model?


Answer 1:


I finally figured it out. Spark logs these metrics after training, but my Spark log level was set to ERROR, so I hadn't seen them:

2015-10-21 12:57:33,828 [INFO  org.apache.spark.ml.tuning.TrainValidationSplit]
Train validation split metrics: WrappedArray(0.7141940371838821, 0.7358721053749735)

2015-10-21 12:57:33,831 [INFO  org.apache.spark.ml.tuning.TrainValidationSplit]
Best set of parameters:
{
    hashingTF_79cf758f5ab1-numFeatures: 2000000,
    nb_67d55ce4e1fc-smoothing: 1.0
}

2015-10-21 12:57:33,831 [INFO  org.apache.spark.ml.tuning.TrainValidationSplit]
Best train validation split metric: 0.7358721053749735.
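To make the log easier to read: TrainValidationSplit scores every param map in the grid on the validation split (the WrappedArray line holds one metric per param map) and keeps the one with the best score. The plain-Java sketch below is not Spark's own code, only an illustration of that selection step, and it assumes a larger-is-better metric such as MulticlassClassificationEvaluator's default f1. With a fitted model, the same array should be available from model.validationMetrics(), in the same order as model.getEstimatorParamMaps().

```java
import java.util.Arrays;

public class BestParamIndex {
    // Returns the index of the largest metric. Assumes larger is better, as for
    // MulticlassClassificationEvaluator's default "f1" metric; for a
    // smaller-is-better metric you would pick the minimum instead.
    // On ties the earliest index wins.
    static int bestIndex(double[] metrics) {
        if (metrics.length == 0) {
            throw new IllegalArgumentException("metrics must not be empty");
        }
        int best = 0;
        for (int i = 1; i < metrics.length; i++) {
            if (metrics[i] > metrics[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Values copied from the "Train validation split metrics" log line;
        // with a fitted model they could come from model.validationMetrics().
        double[] metrics = {0.7141940371838821, 0.7358721053749735};
        int best = bestIndex(metrics);
        System.out.println("All metrics: " + Arrays.toString(metrics));
        System.out.println("Best param map index: " + best + ", metric: " + metrics[best]);
    }
}
```

With the two metrics from the log, the sketch picks index 1, which matches the "Best train validation split metric" line. In a real job, model.getEstimatorParamMaps()[best] would then be the param map to print.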

Now I've set the log level to INFO for the TrainValidationSplit class in my log4j.properties file:

log4j.logger.org.apache.spark.ml.tuning.TrainValidationSplit=INFO
log4j.additivity.org.apache.spark.ml.tuning.TrainValidationSplit=false
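One caveat, based on log4j 1.x semantics rather than on anything stated in the answer: with additivity=false the logger no longer writes to its parents' (e.g. the root logger's) appenders, so unless an appender is named on the log4j.logger line (for example `INFO, console`, where `console` is a hypothetical appender you would define yourself), the INFO messages may be dropped. This applies to the log4j 1.x properties format Spark used at the time; Spark 3.3 and later ship log4j 2, which uses a different configuration syntax. The hypothetical helper below (not part of Spark or log4j) checks a properties file for that case using java.util.Properties.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class AdditivityCheck {
    // True when log4j 1.x would drop this logger's events because of additivity:
    // additivity is "false" (parent appenders are not used) and the logger's own
    // entry names no appender after the level, e.g. "INFO" instead of "INFO, console".
    static boolean droppedByAdditivity(Properties p, String logger) {
        String additivity = p.getProperty("log4j.additivity." + logger, "true").trim();
        if (!"false".equalsIgnoreCase(additivity)) {
            return false; // events still propagate to parent appenders
        }
        // Value format: LEVEL, appender1, appender2, ...
        String[] parts = p.getProperty("log4j.logger." + logger, "").split(",");
        for (int i = 1; i < parts.length; i++) {
            if (!parts[i].trim().isEmpty()) {
                return false; // the logger has an appender of its own
            }
        }
        return true;
    }

    static Properties load(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            // StringReader does no real I/O; wrap the checked exception anyway.
            throw new UncheckedIOException(e);
        }
        return p;
    }

    public static void main(String[] args) {
        String logger = "org.apache.spark.ml.tuning.TrainValidationSplit";
        String asInAnswer = "log4j.logger." + logger + "=INFO\n"
                          + "log4j.additivity." + logger + "=false\n";
        // "console" is a hypothetical appender name that must be defined elsewhere.
        String withAppender = "log4j.logger." + logger + "=INFO, console\n"
                            + "log4j.additivity." + logger + "=false\n";
        System.out.println("as in answer, dropped: " + droppedByAdditivity(load(asInAnswer), logger));
        System.out.println("with appender, dropped: " + droppedByAdditivity(load(withAppender), logger));
    }
}
```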


Source: https://stackoverflow.com/questions/32565594/how-to-print-best-model-params-in-apache-spark-pipeline