Spark ML StringIndexer Different Labels Training/Testing

别等时光非礼了梦想. Posted on 2019-12-02 01:08:05

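If you want the same input values to get the same indices in training and testing, you need to keep (and, across jobs, save) the fitted StringIndexerModel rather than fitting a new StringIndexer on each dataset.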

Consider this sample code from the docs:

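import org.apache.spark.ml.feature.StringIndexer

val indexer = new StringIndexer()
  .setInputCol("category")
  .setOutputCol("categoryIndex")

// Fits on df and transforms it in one expression; the fitted model is not kept
val indexed = indexer.fit(df).transform(df)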

The indexer.fit(df) call returns a StringIndexerModel, and it is that model, holding the learned label-to-index mapping, that performs the transform. So instead, keep the fitted model in its own variable:

val indexerModel = indexer.fit(trainDF)
val indexed = indexerModel.transform(trainDF)

This lets you later call indexerModel.transform(testDF) and get the same labels for the same inputs, because the mapping learned from trainDF is reused instead of being refit on the test data.
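If training and scoring happen in separate jobs, the fitted model can also be written to disk and loaded back. The following is a minimal sketch rather than a definitive recipe: the toy data, the temporary directory, and the local SparkSession are assumptions made for illustration, and it is written as top-level statements meant to be pasted into spark-shell or run as a Scala script.

```scala
import java.nio.file.Files

import org.apache.spark.ml.feature.{StringIndexer, StringIndexerModel}
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration; in spark-shell this reuses the existing one
val spark = SparkSession.builder()
  .appName("consistent-string-indexing")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical toy data standing in for the real training and test sets
val trainDF = Seq("a", "b", "c", "a", "a", "c").toDF("category")
val testDF = Seq("c", "b", "a").toDF("category")

val indexer = new StringIndexer()
  .setInputCol("category")
  .setOutputCol("categoryIndex")

// Fit once, on the training data only, and keep the resulting model
val indexerModel = indexer.fit(trainDF)
val indexedTrain = indexerModel.transform(trainDF)

// Persist the fitted model (hypothetical location under a fresh temp directory)
val modelPath = Files.createTempDirectory("indexer").resolve("model").toString
indexerModel.write.overwrite().save(modelPath)

// Later, possibly in another job: load the model and apply the same mapping
val reloaded = StringIndexerModel.load(modelPath)
val indexedTest = reloaded.transform(testDF)

indexedTrain.show()
indexedTest.show()
```

Because the reloaded model carries the mapping learned from trainDF, a given category value in testDF receives the same categoryIndex it had during training.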
