Question
In Spark (2.1.0) I've used a CrossValidator to train a RandomForestRegressor, using a ParamGridBuilder over maxDepth and numTrees:
paramGrid = ParamGridBuilder() \
    .addGrid(rf.maxDepth, [2, 4, 6, 8, 10]) \
    .addGrid(rf.numTrees, [10, 20, 40, 50]) \
    .build()
After training, I can get the best number of trees:
regressor = cvModel.bestModel.stages[len(cvModel.bestModel.stages) - 1]
print(regressor.getNumTrees)
but I can't work out how to get the best maxDepth. I've read the documentation and I don't see what I'm missing.
I'd note that I can iterate through all the trees and find the depth of each one, e.g.
regressor.trees[0].depth
This seems like I'm missing something though.
Answer 1:
Unfortunately, before Spark 2.3 the PySpark RandomForestRegressionModel, unlike its Scala counterpart, doesn't store the upstream Estimator Params, but you should be able to retrieve the value directly from the underlying JVM object. With a simple monkey patch:
from pyspark.ml.regression import RandomForestRegressionModel

RandomForestRegressionModel.getMaxDepth = (
    lambda self: self._java_obj.getMaxDepth()
)
you can:
cvModel.bestModel.stages[-1].getMaxDepth()
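The patch relies on a general Python pattern: a function assigned to a class becomes a method on every instance, including instances created before the assignment. A minimal standalone sketch of the same trick, using an illustrative stand-in class (JavaBackedModel and _hidden are made-up names, not Spark API):

class JavaBackedModel:
    """Stand-in for a wrapper that hides a value the way
    RandomForestRegressionModel hides it in its JVM object."""
    def __init__(self, max_depth):
        self._hidden = {"maxDepth": max_depth}

model = JavaBackedModel(6)

# Attach a new method to the class after the fact; existing
# instances pick it up immediately.
JavaBackedModel.getMaxDepth = lambda self: self._hidden["maxDepth"]

print(model.getMaxDepth())  # → 6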
Answer 2:
Even simpler, just call
cvModel.bestModel.stages[-1]._java_obj.getMaxDepth()
As @user6910411 explained, you take the bestModel, reach its underlying JVM object, and extract the parameter with getMaxDepth(). The same approach works for other parameters.
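An alternative that avoids the JVM object entirely is to look up the winning entry in the grid itself: CrossValidatorModel exposes getEstimatorParamMaps() and avgMetrics in parallel, so the index of the best metric gives the best parameter combination (whether "best" means min or max depends on the evaluator; check evaluator.isLargerBetter()). A minimal pure-Python sketch of that lookup, using stand-in dicts and metric values rather than real Spark output:

# Stand-ins for cvModel.getEstimatorParamMaps() and cvModel.avgMetrics;
# the values below are illustrative, not real cross-validation results.
param_maps = [
    {"maxDepth": 2, "numTrees": 10},
    {"maxDepth": 4, "numTrees": 20},
    {"maxDepth": 6, "numTrees": 40},
]
avg_metrics = [1.9, 1.2, 1.5]  # e.g. RMSE, where lower is better

# Index of the best metric selects the best parameter map.
best_index = min(range(len(avg_metrics)), key=avg_metrics.__getitem__)
best_params = param_maps[best_index]
print(best_params["maxDepth"])  # prints 4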
Source: https://stackoverflow.com/questions/41690093/how-to-get-the-maxdepth-from-a-spark-randomforestregressionmodel