How to map variable names to features after pipeline


I assume what you want here is access to the metadata of the features column. Let's start by transforming the existing DataFrame:

val transformedDF = pipelineModel.transform(df)
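(For context, here is one minimal, hypothetical way df and pipelineModel could have been built. The column names, the StringIndexer / OneHotEncoder / VectorAssembler stages, and the LogisticRegression estimator are all assumptions for illustration; any pipeline that assembles a features column carries the same kind of metadata, though the exact attribute groups and names depend on the transformers you use.)

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer, VectorAssembler}

// Hypothetical data: one label column and one categorical column.
val df = spark.createDataFrame(Seq(
  (0.0, "a"), (1.0, "b"), (0.0, "c"), (1.0, "e"), (0.0, "f"), (1.0, "a")
)).toDF("label", "category")

// Index and one-hot encode the categorical column, then assemble the result
// into the "features" vector consumed by the estimator.
val indexer   = new StringIndexer().setInputCol("category").setOutputCol("categoryIdx")
val encoder   = new OneHotEncoder().setInputCol("categoryIdx").setOutputCol("categoryVec")
val assembler = new VectorAssembler().setInputCols(Array("categoryVec")).setOutputCol("features")
val lr        = new LogisticRegression()

val pipelineModel = new Pipeline()
  .setStages(Array(indexer, encoder, assembler, lr))
  .fit(df)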

Next you can extract metadata object:

val meta: org.apache.spark.sql.types.Metadata = transformedDF
  .schema(transformedDF.schema.fieldIndex("features"))
  .metadata

Finally, let's extract the attributes:

meta.getMetadata("ml_attr").getMetadata("attrs")
//  org.apache.spark.sql.types.Metadata = {"binary":[
//    {"idx":0,"name":"e"},{"idx":1,"name":"f"},{"idx":2,"name":"a"},
//    {"idx":3,"name":"b"},{"idx":4,"name":"c"}]}

The idx / name pairs in this metadata can then be used to relate the model weights back to the original features.
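As a minimal sketch (assuming the final pipeline stage is a LogisticRegressionModel; adapt the cast for your estimator), you can read the same metadata through the AttributeGroup helper and zip the attribute names with the coefficients at the matching vector indices:

import org.apache.spark.ml.attribute.AttributeGroup
import org.apache.spark.ml.classification.LogisticRegressionModel

// Read the "features" column metadata as an AttributeGroup instead of raw JSON.
val attrGroup = AttributeGroup.fromStructField(transformedDF.schema("features"))

// Attribute names in vector-index order (fall back to a placeholder if unnamed).
val featureNames: Array[String] = attrGroup.attributes
  .map(_.map(attr => attr.name.getOrElse(s"feature_${attr.index.getOrElse(-1)}")))
  .getOrElse(Array.empty[String])

// Assumption: the last stage is a LogisticRegressionModel exposing `coefficients`.
val lrModel = pipelineModel.stages.last.asInstanceOf[LogisticRegressionModel]

// Pair each feature name with the weight at the same vector index.
featureNames.zip(lrModel.coefficients.toArray).foreach {
  case (name, weight) => println(s"$name -> $weight")
}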
