How to map features from the output of a VectorAssembler back to the column names in Spark ML?

别跟我提以往 2020-12-01 02:02

I'm trying to run a linear regression in PySpark and I want to create a table containing summary statistics such as coefficients, P-values and t-values for each column in m

3 Answers

爱一瞬间的悲伤 (original poster)

2020-12-01 03:03
You can see the actual order of the columns in the metadata of the assembled `features` column:
```
df.schema["features"].metadata["ml_attr"]["attrs"]
```
There will usually be two attribute classes, `["binary"]` and `["numeric"]`. Concatenating them and sorting by `idx`:
```
attrs = df.schema["features"].metadata["ml_attr"]["attrs"]
pd.DataFrame(attrs["binary"] + attrs["numeric"]).sort_values("idx")
```
This should give the exact order of all the columns in the assembled vector.
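
If one of the two classes is missing from the metadata (or another one such as `"nominal"` is present), indexing `["binary"]` or `["numeric"]` directly will raise a `KeyError`. Here is a minimal sketch that merges whatever groups exist and pairs the ordered names with coefficients. The helper name `feature_names`, the column names and the coefficient values are made up for illustration, and the dictionary is a hand-written stand-in for `df.schema["features"].metadata`, so the snippet runs without a Spark session:

```python
from itertools import chain


def feature_names(metadata):
    """Return feature names ordered by their index in the assembled vector."""
    attrs = metadata["ml_attr"]["attrs"]
    # Merge every attribute group that is present ("numeric", "binary", ...)
    merged = chain.from_iterable(attrs.values())
    return [a["name"] for a in sorted(merged, key=lambda a: a["idx"])]


# Hand-written stand-in for df.schema["features"].metadata
# (hypothetical column names, same layout as the ml_attr metadata above)
example_metadata = {
    "ml_attr": {
        "attrs": {
            "numeric": [{"idx": 0, "name": "age"}, {"idx": 2, "name": "income"}],
            "binary": [{"idx": 1, "name": "is_member"}],
        },
        "num_attrs": 3,
    }
}

names = feature_names(example_metadata)

# Hypothetical values standing in for the fitted model's coefficients
coefficients = [0.5, -1.2, 3.0]
coef_by_name = dict(zip(names, coefficients))
print(coef_by_name)
```

With a real DataFrame you would pass `df.schema["features"].metadata` instead of the example dictionary. The same `names` list can be zipped with the P-values and t-values from the model summary, but check their length first: with a fitted intercept, Spark's linear regression summary puts the intercept's entry last, so those arrays can be one element longer than the coefficient vector.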