Question
I am trying to fit a Spark ML CrossValidator on a DataFrame with the following schema:
root
|-- userID: string (nullable = true)
|-- features: vector (nullable = true)
|-- label: double (nullable = true)
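(For reference, a vector column of this shape is typically produced with VectorAssembler. A minimal sketch of that step, where rawDF and the input column names f1, f2 are hypothetical placeholders, not part of my actual job:)

import org.apache.spark.ml.feature.VectorAssembler

// Hypothetical assembly step: rawDF, "f1" and "f2" are placeholders.
val assembler = new VectorAssembler()
  .setInputCols(Array("f1", "f2"))
  .setOutputCol("features")
val featuresDF = assembler
  .transform(rawDF)
  .select("userID", "features", "label")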
I am getting a java.lang.UnsupportedOperationException: empty.maxBy
when I fit the CrossValidator.
I have read this bug report; it says that this exception happens when there are no features:
In the case of empty features we fail with a better error message stating: "DecisionTree requires number of features > 0, but was given an empty features vector" instead of the cryptic error message: java.lang.UnsupportedOperationException: empty.max
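For concreteness, the "empty features vector" that the report refers to is a zero-length vector. A minimal sketch of constructing one (this snippet is illustrative only, not from my job):

import org.apache.spark.ml.linalg.Vectors

// A zero-length dense vector: the "empty features" case the bug report describes.
val emptyVec = Vectors.dense(Array.empty[Double])
println(emptyVec.size)  // prints 0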
In my case, I have thousands of features, so I am sure that no feature vector is empty.
What could be another reason for this exception?
I am running the cluster on EMR. Here is the code, in case it helps (the DataFrame is named featuresDF, and before fitting the CrossValidator I verified that there are no empty feature vectors; see the sketch after the code for how that check might look):
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

// Random forest classifier reading the "features" and "label" columns.
val rf = new RandomForestClassifier()
  .setLabelCol("label")
  .setFeaturesCol("features")

val pipeline = new Pipeline().setStages(Array(rf))

// Grid of four combinations: {500, 1000} trees x {15, 25} max depth.
val paramGrid = new ParamGridBuilder()
  .addGrid(rf.numTrees, Array(500, 1000))
  .addGrid(rf.maxDepth, Array(15, 25))
  .build()

// Score each candidate model by area under the precision-recall curve.
val evaluator = new BinaryClassificationEvaluator()
  .setLabelCol("label")
  .setMetricName("areaUnderPR")

// 3-fold cross-validation over the parameter grid.
val cv = new CrossValidator()
  .setEstimator(pipeline)
  .setEvaluator(evaluator)
  .setEstimatorParamMaps(paramGrid)
  .setNumFolds(3)

val model = cv.fit(featuresDF)
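For completeness, the empty-vector check I mentioned above looked roughly like this (a minimal sketch, assuming a UDF over the ml Vector type; the exact code I used is an assumption here):

import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.functions.{col, udf}

// Count rows whose feature vector is missing or has zero length.
val isEmptyVec = udf((v: Vector) => v == null || v.size == 0)
val emptyCount = featuresDF.filter(isEmptyVec(col("features"))).count()
println(s"rows with empty feature vectors: $emptyCount")  // expected: 0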
Source: https://stackoverflow.com/questions/44024076/spark-2-1-0-ml-randomforest-java-lang-unsupportedoperationexception-empty-max