How to split a Vector into columns using PySpark

夕颜 2020-11-22 16:23

Context: I have a DataFrame with two columns, word and vector, where the type of the "vector" column is VectorUDT.

An Example:

5 Answers

无人及你 (original poster)

2020-11-22 17:14

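from pyspark.sql import SparkSession
from pyspark.sql.types import DoubleType, StructType

def splitVector(df, vector_col='vector', new_features=('f1', 'f2')):
    # Copy the schema: StructType.add() mutates in place and could alter df.schema
    schema = StructType(list(df.schema.fields))
    cols = df.columns
    for col in new_features:  # must match the length of the vectors in vector_col
        schema = schema.add(col, DoubleType(), True)
    spark = SparkSession.builder.getOrCreate()
    # toArray() works for both DenseVector and SparseVector; tolist() yields Python floats
    return spark.createDataFrame(
        df.rdd.map(lambda row: [row[c] for c in cols] + row[vector_col].toArray().tolist()),
        schema)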

The function keeps all original columns (including the vector column) and appends one DoubleType column per vector element, named after the entries of new_features.
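To make the per-row logic of the lambda concrete, here is a minimal plain-Python sketch, not Spark code. It assumes hypothetical stand-ins: rows are dicts, a dense vector is a list, and a sparse vector is a (size, indices, values) tuple mirroring what Spark ML's SparseVector stores. It keeps the original values, densifies the vector the way Spark ML's toArray() does, and appends the elements, with a length check that the original comment only asks for informally. On Spark 3.0 and later, pyspark.ml.functions.vector_to_array may be a simpler route, since it produces an array column without a round trip through the RDD.

```python
from typing import Any, Dict, List, Sequence, Tuple, Union

# Hypothetical stand-ins: a dense vector is a list of floats; a sparse vector is
# a (size, indices, values) tuple, mirroring how Spark ML stores a SparseVector.
Dense = List[float]
Sparse = Tuple[int, Sequence[int], Sequence[float]]


def to_array(vec: Union[Dense, Sparse]) -> List[float]:
    """Densify a vector into a plain list of floats."""
    if isinstance(vec, tuple):
        size, indices, values = vec
        dense = [0.0] * size  # unset positions of a sparse vector are zero
        for i, v in zip(indices, values):
            dense[i] = float(v)
        return dense
    return [float(x) for x in vec]


def split_row(row: Dict[str, Any], cols: Sequence[str], vector_col: str,
              new_features: Sequence[str]) -> List[Any]:
    """Keep the original column values, then append one float per vector element."""
    elements = to_array(row[vector_col])
    if len(elements) != len(new_features):
        raise ValueError(f"vector has {len(elements)} elements, "
                         f"but {len(new_features)} column names were given")
    return [row[c] for c in cols] + elements


rows = [
    {"word": "apple", "vector": [1.0, 2.0]},
    {"word": "banana", "vector": (2, [1], [0.5])},  # sparse: only index 1 is set
]
for r in rows:
    print(split_row(r, ["word", "vector"], "vector", ["f1", "f2"]))
```

The explicit densify step is the reason to prefer toArray() over calling tolist() on the row value: a sparse vector only stores its non-zero entries, so they have to be expanded before they can be spread across fixed columns.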
