How to map features from the output of a VectorAssembler back to the column names in Spark ML?

别跟我提以往 2020-12-01 02:02

I'm trying to run a linear regression in PySpark and I want to create a table containing summary statistics such as coefficients, P-values and t-values for each column in m

3 Answers
  •  爱一瞬间的悲伤
    2020-12-01 03:03

    You can see the actual order of the columns in the metadata of the features column:

    df.schema["features"].metadata["ml_attr"]["attrs"]
    

    There are usually two attribute groups, "binary" and "numeric":

    import pandas as pd

    # Concatenate both attribute groups, then sort by position in the vector
    attrs = df.schema["features"].metadata["ml_attr"]["attrs"]
    pd.DataFrame(attrs["binary"] + attrs["numeric"]).sort_values("idx")
    

    This should give the exact order of all the columns.
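The lookup above fails with a KeyError if one of the groups is missing from the metadata, or skips columns if another group such as "nominal" is present. A minimal sketch that flattens whatever groups exist is shown below. The helper name `feature_names_from_metadata` and the sample metadata are made up for illustration. The sketch assumes every attribute carries both an "idx" and a "name" entry, as in the metadata shown above.

```python
def feature_names_from_metadata(metadata):
    """Return feature names ordered by their index in the assembled vector."""
    attrs = metadata.get("ml_attr", {}).get("attrs", {})
    # Flatten every attribute group that is present ("binary", "numeric", ...),
    # because any single group may be absent.
    flat = [attr for group in attrs.values() for attr in group]
    return [attr["name"] for attr in sorted(flat, key=lambda a: a["idx"])]


# Hypothetical metadata, shaped like df.schema["features"].metadata
example_metadata = {
    "ml_attr": {
        "attrs": {
            "numeric": [{"idx": 0, "name": "age"}, {"idx": 2, "name": "income"}],
            "binary": [{"idx": 1, "name": "is_member"}],
        },
        "num_attrs": 3,
    }
}

names = feature_names_from_metadata(example_metadata)
print(names)  # ['age', 'is_member', 'income']
```

On a real DataFrame this would be called as `feature_names_from_metadata(df.schema["features"].metadata)`. The resulting list can then be paired with a fitted model's coefficients, for example `list(zip(names, model.coefficients.toArray()))`, where `model` is assumed to be a fitted LinearRegressionModel. Note that when the intercept is fitted, `model.summary.pValues` and `model.summary.tValues` have one extra trailing element for the intercept.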
