How to get best params after tuning by pyspark.ml.tuning.TrainValidationSplit?

Submitted by 偶尔善良 on 2019-12-19 10:17:11

Question


I'm trying to tune the hyper-parameters of a Spark (PySpark) ALS model with TrainValidationSplit.

It works well, but I want to know which combination of hyper-parameters performed best. How can I get the best parameters after evaluation?

from pyspark.ml.recommendation import ALS
from pyspark.ml.tuning import TrainValidationSplit, ParamGridBuilder
from pyspark.ml.evaluation import RegressionEvaluator

# sqlCtx is assumed to be an existing SQLContext (e.g. the one provided by the PySpark shell)
df = sqlCtx.createDataFrame(
    [(0, 0, 4.0), (0, 1, 2.0), (1, 1, 3.0), (1, 2, 4.0), (2, 1, 1.0), (2, 2, 5.0)],
    ["user", "item", "rating"],
)

df_test = sqlCtx.createDataFrame(
    [(0, 0), (0, 1), (1, 1), (1, 2), (2, 1), (2, 2)],
    ["user", "item"],
)

als = ALS()

param_grid = ParamGridBuilder().addGrid(
    als.rank,
    [10, 15],
).addGrid(
    als.maxIter,
    [10, 15],
).build()

evaluator = RegressionEvaluator(
    metricName="rmse",
    labelCol="rating",
)
tvs = TrainValidationSplit(
    estimator=als,
    estimatorParamMaps=param_grid,
    evaluator=evaluator,
)


model = tvs.fit(df)

Question: How can I get the best rank and maxIter?


Answer 1:


You can access the best model through the bestModel property of the TrainValidationSplitModel:

best_model = model.bestModel

The rank can be read directly from the rank property of the ALSModel:

best_model.rank
10

Getting the maximum number of iterations requires a bit more trickery, because you have to go through the underlying Java object to reach the parent ALS estimator:

(best_model
    ._java_obj     # Get Java object
    .parent()      # Get parent (ALS estimator)
    .getMaxIter()) # Get maxIter
10
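If you would rather not reach into the Java object, another option is to pair each entry of the parameter grid with its validation metric and pick the best pair. The following is only a sketch: best_params is a hypothetical helper (it is not part of PySpark), and the usage shown in the comments assumes that your Spark version exposes validationMetrics on the TrainValidationSplitModel and that the metrics are listed in the same order as param_grid.

```python
def best_params(param_maps, metrics, larger_is_better=False):
    """Return (params, metric) for the best-scoring entry of a tuning grid.

    Hypothetical helper, not part of PySpark. param_maps is a list of dicts
    (such as the output of ParamGridBuilder.build()) and metrics is the list
    of scores, one per dict, in the same order.
    """
    if len(param_maps) != len(metrics):
        raise ValueError("param_maps and metrics must have the same length")
    if not metrics:
        raise ValueError("no metrics to compare")
    pick = max if larger_is_better else min
    index = pick(range(len(metrics)), key=lambda i: metrics[i])
    # PySpark Param objects carry a .name attribute; fall back to the key itself otherwise.
    params = {getattr(k, "name", k): v for k, v in param_maps[index].items()}
    return params, metrics[index]


# Assumed usage with the objects from the question (requires a running Spark session):
#   params, rmse = best_params(
#       param_grid,
#       model.validationMetrics,
#       larger_is_better=evaluator.isLargerBetter(),
#   )
#   print(params)  # a dict with the "rank" and "maxIter" of the best combination
```

For rmse, smaller is better, which is why larger_is_better defaults to False; passing evaluator.isLargerBetter() keeps the helper correct if you switch to a metric such as r2.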


Source: https://stackoverflow.com/questions/41908418/how-to-get-best-params-after-tuning-by-pyspark-ml-tuning-trainvalidationsplit
