How to split Vector into columns - using PySpark

夕颜 2020-11-22 16:23

Context: I have a DataFrame with two columns, word and vector, where the column type of "vector" is VectorUDT.

An Example:
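For illustration, a hypothetical DataFrame of this shape might look like the following (a sketch only, assuming pyspark.ml.linalg vectors and an existing SparkSession named spark):

    from pyspark.ml.linalg import Vectors

    # hypothetical data: a string "word" column plus a 2-element "vector" column (VectorUDT)
    df = spark.createDataFrame(
        [("assert", Vectors.dense([1.0, 2.0])),
         ("require", Vectors.dense([0.5, 3.0]))],
        ["word", "vector"])

    df.printSchema()
    # root
    #  |-- word: string (nullable = true)
    #  |-- vector: vector (nullable = true)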

5 Answers
  •  无人及你
    2020-11-22 17:14

    from pyspark.sql.types import DoubleType

    def splitVector(df, new_features=['f1', 'f2']):
        # new_features must list one column name per element of the vector column
        schema = df.schema
        cols = df.columns

        for col in new_features:
            schema = schema.add(col, DoubleType(), True)

        # assumes the vector column is named "features" and that `spark` is an
        # existing SparkSession; adapt both to your environment
        return spark.createDataFrame(
            df.rdd.map(lambda row: [row[i] for i in cols] + row.features.toArray().tolist()),
            schema)

    The function turns the feature vector column into separate columns.
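
    For example, applying it to a DataFrame like the hypothetical one in the question (same assumptions: a SparkSession named spark and a 2-element vector column) might look like this; since the function reads the vector from row.features, rename the "vector" column first or adjust the attribute access:

        # rename "vector" to "features" so row.features inside splitVector finds it
        # (assumption: your vector column may be named differently)
        renamed = df.withColumnRenamed("vector", "features")

        split_df = splitVector(renamed, new_features=["f1", "f2"])
        split_df.show()
        # split_df now has the columns: word, features, f1, f2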
