How to vectorize DataFrame columns for ML algorithms?
Question: I have a DataFrame with some categorical string values (e.g. uuid|url|browser). I would like to convert them to doubles so that I can run an ML algorithm that accepts a matrix of doubles. As the conversion method I used StringIndexer (Spark 1.4), which maps my string values to double values, so I defined a function like this:

    import org.apache.spark.ml.feature.StringIndexer
    import org.apache.spark.sql.DataFrame

    // Adds a column named arg + "_index" holding the numeric index of each string in column `arg`.
    def str(arg: String, df: DataFrame): DataFrame = {
      val indexer = new StringIndexer().setInputCol(arg).setOutputCol(arg + "_index")
      val newDF = indexer.fit(df).transform(df)
      newDF
    }

Now the issue