How to handle categorical features with spark-ml?
How do I handle categorical data with spark-ml and not spark-mllib ? Thought the documentation is not very clear, it seems that classifiers e.g. RandomForestClassifier , LogisticRegression , have a featuresCol argument, which specifies the name of the column of features in the DataFrame , and a labelCol argument, which specifies the name of the column of labeled classes in the DataFrame . Obviously I want to use more than one feature in my prediction, so I tried using the VectorAssembler to put all my features in a single vector under featuresCol . However, the VectorAssembler only accepts