I am trying to run random forest classification by using Spark ML api but I am having issues with creating right data frame input into pipeline.
Here is sample data
According to spark documentation on mllib - random trees, seems to me that you should define the features map that you are using and the points should be a labeledpoint.
This will tell the algorithm which column should be used as prediction and which ones are the features.
https://spark.apache.org/docs/latest/mllib-decision-tree.html