I'm trying to perform a logistic regression (LogisticRegressionWithLBFGS) with Spark MLlib (with Scala) on a dataset which contains categorical variables. I discover Spark
A VectorIndexer is coming in Spark 1.4, which might help you with this kind of feature transformation: http://people.apache.org/~pwendell/spark-1.4.0-rc1-docs/api/scala/index.html#org.apache.spark.ml.feature.VectorIndexer
However, it looks like it will only be available in spark.ml, not in mllib:
https://issues.apache.org/jira/browse/SPARK-4081
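For illustration, here is a minimal sketch of how VectorIndexer could be used through the spark.ml API as of Spark 1.4. The toy data, the column names "features" and "indexedFeatures", the helper `fitIndexer`, and the `maxCategories` value of 3 are all assumptions made for this example, not something from the question. Note that later Spark versions (2.0+) moved `Vectors` to `org.apache.spark.ml.linalg`, so the imports would need adjusting there.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.ml.feature.{VectorIndexer, VectorIndexerModel}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.sql.{DataFrame, SQLContext}

object VectorIndexerSketch {

  // Hypothetical helper: fits a VectorIndexer on the "features" column.
  // Features with at most `maxCategories` distinct values are treated as
  // categorical; all others are left as continuous.
  def fitIndexer(df: DataFrame, maxCategories: Int): VectorIndexerModel =
    new VectorIndexer()
      .setInputCol("features")
      .setOutputCol("indexedFeatures")
      .setMaxCategories(maxCategories)
      .fit(df)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("VectorIndexerSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Toy data (assumed): feature 0 has 3 distinct values,
    // feature 1 has 5 distinct values.
    val rows = Seq(
      Vectors.dense(0.0, 1.5),
      Vectors.dense(1.0, 2.7),
      Vectors.dense(2.0, 3.1),
      Vectors.dense(1.0, 4.8),
      Vectors.dense(0.0, 5.2)
    ).map(Tuple1.apply)
    val df = sqlContext.createDataFrame(rows).toDF("features")

    val model = fitIndexer(df, maxCategories = 3)

    // Maps each categorical feature index to its (original value -> category index) map.
    println(s"Categorical features: ${model.categoryMaps.keys.mkString(", ")}")

    // Adds the "indexedFeatures" column alongside the original one.
    model.transform(df).show()

    sc.stop()
  }
}
```

The resulting "indexedFeatures" column can then be fed to a spark.ml estimator. It does not plug directly into `LogisticRegressionWithLBFGS`, which works on mllib `RDD[LabeledPoint]` data.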