How to create a correct DataFrame for classification in Spark ML

鱼传尺愫 2020-12-04 10:05

I am trying to run random forest classification with the Spark ML API, but I am having trouble creating the right DataFrame to feed into the pipeline.

Here is sample data

3 answers
  •  天涯浪人
    2020-12-04 10:32

    According to the Spark MLlib documentation on decision trees, it seems to me that you should define the feature map you are using, and each data point should be a LabeledPoint.

    This tells the algorithm which column is the label to predict and which columns are the features.

    https://spark.apache.org/docs/latest/mllib-decision-tree.html
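
To illustrate the idea in the answer above, here is a minimal plain-Python sketch of what "labeled point" means: each raw row is split into one label value and an ordered feature vector. It does not call Spark. `LabeledRow`, `to_labeled_rows`, the column names, and the sample rows are all hypothetical and made up for illustration. In Spark's DataFrame-based API the same role is usually played by a label column plus a single vector-valued features column, commonly assembled with `VectorAssembler`.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class LabeledRow:
    """Hypothetical stand-in for a labeled point: one label plus a feature vector."""
    label: float
    features: Tuple[float, ...]


def to_labeled_rows(
    rows: Sequence[Dict[str, float]],
    label_col: str,
    feature_cols: Sequence[str],
) -> List[LabeledRow]:
    """Split each raw row into its label and an ordered tuple of feature values."""
    labeled = []
    for row in rows:
        labeled.append(
            LabeledRow(
                label=float(row[label_col]),
                # The order of feature_cols fixes the position of each feature.
                features=tuple(float(row[col]) for col in feature_cols),
            )
        )
    return labeled


# Made-up sample rows; the column names are assumptions, not the asker's data.
raw_rows = [
    {"clicked": 1, "age": 34, "visits": 5},
    {"clicked": 0, "age": 51, "visits": 1},
]

labeled_rows = to_labeled_rows(
    raw_rows, label_col="clicked", feature_cols=["age", "visits"]
)
for item in labeled_rows:
    print(item)
```

The point of the explicit `label_col` and `feature_cols` arguments is that the classifier never guesses: whichever column you name as the label is the prediction target, and everything in the feature list becomes model input, in that order.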
