I'm trying to run a linear regression in PySpark and I want to create a table containing summary statistics such as coefficients, p-values and t-values for each column in m
You can see the actual order of the columns in the metadata of the features column:
df.schema["features"].metadata["ml_attr"]["attrs"]
There will usually be two classes of attributes, ["binary"] and ["numeric"]. Concatenating them and sorting by "idx" recovers the column order:
import pandas as pd
attrs = df.schema["features"].metadata["ml_attr"]["attrs"]
pd.DataFrame(attrs["binary"] + attrs["numeric"]).sort_values("idx")
This should give the exact order of all the columns, since "idx" is each column's position in the features vector.
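Once the names are in vector order, they can be lined up with the model's statistics. Here is a minimal sketch in plain Python, assuming the metadata has the shape shown above (groups of dicts with "idx" and "name"). The helper names `ordered_feature_names` and `summary_rows` and the sample values are hypothetical, made up for illustration. It also assumes the statistics come from `model.coefficients`, `model.summary.tValues` and `model.summary.pValues`, where the summary lists carry one extra trailing entry for the intercept when the model is fit with an intercept.

```python
def ordered_feature_names(attrs):
    """Flatten every attribute group and return names sorted by vector index."""
    flat = [attr for group in attrs.values() for attr in group]
    return [attr["name"] for attr in sorted(flat, key=lambda attr: attr["idx"])]


def summary_rows(names, coefficients, t_values, p_values):
    """Pair each feature name with its statistics, by position.

    t_values and p_values may be one element longer than names
    (assumed trailing intercept entry); that extra entry is ignored here.
    """
    return [
        {
            "feature": name,
            "coefficient": coefficients[i],
            "t_value": t_values[i],
            "p_value": p_values[i],
        }
        for i, name in enumerate(names)
    ]


# Hypothetical metadata, shaped like df.schema["features"].metadata["ml_attr"]["attrs"]
example_attrs = {
    "numeric": [{"idx": 0, "name": "age"}, {"idx": 2, "name": "income"}],
    "binary": [{"idx": 1, "name": "is_member"}],
}

names = ordered_feature_names(example_attrs)

# Made-up numbers standing in for model.coefficients, tValues and pValues
rows = summary_rows(
    names,
    [0.5, -1.2, 0.03],
    [2.1, -3.4, 1.0, 5.0],
    [0.04, 0.001, 0.32, 0.0],
)
```

Passing `rows` to `pd.DataFrame` gives the summary table. Looping over all groups rather than naming "binary" and "numeric" explicitly avoids a KeyError when one of the groups is missing from the metadata.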