How to get the maxDepth from a Spark RandomForestRegressionModel

Posted by 拥有回忆 on 2019-12-11 06:47:12

Question


In Spark (2.1.0) I've used a CrossValidator to train a RandomForestRegressor, using a ParamGridBuilder to search over maxDepth and numTrees:

paramGrid = ParamGridBuilder() \
    .addGrid(rf.maxDepth, [2, 4, 6, 8, 10]) \
    .addGrid(rf.numTrees, [10, 20, 40, 50]) \
    .build()

After training, I can get the best number of trees:

regressor = cvModel.bestModel.stages[len(cvModel.bestModel.stages) - 1]

print(regressor.getNumTrees)

but I can't work out how to get the best maxDepth. I've read the documentation and I don't see what I'm missing.

I'd note that I can iterate through all the trees and find the depth of each one, e.g.:

regressor.trees[0].depth

It seems like I'm missing something, though.
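For what it's worth, iterating over the trees only reveals each tree's actual depth, which can be smaller than the maxDepth the forest was trained with (a tree may stop splitting before hitting the cap). A plain-Python sketch of that distinction, using hypothetical stub objects in place of the fitted trees:

```python
# Hypothetical stubs standing in for regressor.trees; each fitted tree
# exposes a .depth attribute, which is at most the trained maxDepth.
class StubTree:
    def __init__(self, depth):
        self.depth = depth

trees = [StubTree(d) for d in (3, 4, 2)]  # say the forest was trained with maxDepth=4

# The deepest observed tree gives only a lower bound on maxDepth:
deepest = max(t.depth for t in trees)
print(deepest)  # 4 here, but it could be < maxDepth if no tree hit the cap
```

So inspecting `regressor.trees[i].depth` is a heuristic, not a way to recover the actual Param value.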


Answer 1:


Unfortunately, before Spark 2.3 the PySpark RandomForestRegressionModel, unlike its Scala counterpart, doesn't store the upstream Estimator Params, but you can retrieve the value directly from the underlying JVM object. With a simple monkey patch:

from pyspark.ml.regression import RandomForestRegressionModel

RandomForestRegressionModel.getMaxDepth = (
    lambda self: self._java_obj.getMaxDepth()
)

you can:

cvModel.bestModel.stages[-1].getMaxDepth()
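The patch works because every PySpark ML model wraps a JVM object (`_java_obj`) and can delegate calls to it. Here is the delegation idiom itself, illustrated in plain Python with hypothetical stand-in classes (no Spark required):

```python
# Hypothetical stand-ins: JavaModel mimics the JVM-side model, which does
# know its maxDepth; PyModel mimics the Python wrapper, which doesn't
# expose it until we patch a getter in.
class JavaModel:
    def __init__(self, max_depth):
        self._max_depth = max_depth

    def getMaxDepth(self):
        return self._max_depth

class PyModel:
    def __init__(self, java_obj):
        self._java_obj = java_obj  # same role as PySpark's _java_obj

# Monkey patch: add a method that delegates to the wrapped object,
# exactly as the PySpark snippet above does.
PyModel.getMaxDepth = lambda self: self._java_obj.getMaxDepth()

model = PyModel(JavaModel(max_depth=6))
print(model.getMaxDepth())  # 6
```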



Answer 2:


Even simpler, just call

    cvModel.bestModel.stages[-1]._java_obj.getMaxDepth()

As @user6910411 explained, you take the bestModel, access that model's underlying JVM object, and extract the parameter with getMaxDepth(). The same works for other parameters.



Source: https://stackoverflow.com/questions/41690093/how-to-get-the-maxdepth-from-a-spark-randomforestregressionmodel
