Cannot run RandomForestClassifier from spark ML on a simple example

两盒软妹~` 提交于 2019-12-03 17:04:46

Let's first fix the import to remove ambiguity

import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.feature.{StringIndexer, VectorIndexer}
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.linalg.Vectors

I'll use the same data you used :

val training = sqlContext.createDataFrame(Seq(
  (1.0, Vectors.dense(0.0, 1.1, 0.1)),
  (0.0, Vectors.dense(2.0, 1.0, -1.0)),
  (0.0, Vectors.dense(2.0, 1.3, 1.0)),
  (1.0, Vectors.dense(0.0, 1.2, -0.5))
)).toDF("label", "features")

and then create Pipeline Stages :

val stages = new scala.collection.mutable.ArrayBuffer[PipelineStage]()
  1. For classification, re-index classes :
val labelIndexer = new StringIndexer().setInputCol("label").setOutputCol("indexedLabel").fit(training)
  1. Identify categorical features using VectorIndexer
val featuresIndexer = new VectorIndexer().setInputCol("features").setOutputCol("indexedFeatures").setMaxCategories(10).fit(training)
stages += featuresIndexer

val tmp = featuresIndexer.transform(labelIndexer.transform(training))
  1. Learn Random Forest
val rf = new RandomForestClassifier().setFeaturesCol(featuresIndexer.getOutputCol).setLabelCol(labelIndexer.getOutputCol)

stages += rf
val pipeline = new Pipeline().setStages(stages.toArray)

// Fit the Pipeline
val pipelineModel = pipeline.fit(tmp)

val results = pipelineModel.transform(training)

results.show

//+-----+--------------+---------------+-------------+-----------+----------+
//|label|      features|indexedFeatures|rawPrediction|probability|prediction|
//+-----+--------------+---------------+-------------+-----------+----------+
//|  1.0| [0.0,1.1,0.1]|  [0.0,1.0,2.0]|   [1.0,19.0]|[0.05,0.95]|       1.0|
//|  0.0|[2.0,1.0,-1.0]|  [1.0,0.0,0.0]|   [17.0,3.0]|[0.85,0.15]|       0.0|
//|  0.0| [2.0,1.3,1.0]|  [1.0,3.0,3.0]|   [14.0,6.0]|  [0.7,0.3]|       0.0|
//|  1.0|[0.0,1.2,-0.5]|  [0.0,2.0,1.0]|   [1.0,19.0]|[0.05,0.95]|       1.0|
//+-----+--------------+---------------+-------------+-----------+----------+

References: Concerning the step 1. and 2. for those who want more details on Feature transformers, I advice you to read the official documentation here.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!