Context: I have a DataFrame with 2 columns: word and vector, where the column type of "vector" is VectorUDT.
An Example:
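from pyspark.sql.types import DoubleType, StructType

def splitVector(df, new_features=['f1', 'f2']):
    # new_features must have as many names as the vector has elements
    cols = df.columns
    # Copy the fields so that add() does not mutate the schema held by df
    schema = StructType(list(df.schema.fields))
    for col in new_features:
        schema = schema.add(col, DoubleType(), True)
    # toArray().tolist() yields plain Python floats for dense and sparse vectors
    return spark.createDataFrame(
        df.rdd.map(lambda row: [row[i] for i in cols] + row['vector'].toArray().tolist()), schema)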
The function turns the feature vector column into separate columns: it keeps the original columns and appends one DoubleType column per vector element.
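The per-row step inside the map can be checked without a Spark session. The following is only a minimal sketch in plain Python, assuming each row is a dict and the vector is a plain list (stand-ins for a Spark Row and a VectorUDT value); `split_row` and the sample data are hypothetical and not part of the function above.

```python
def split_row(row, cols, vector_col, new_features):
    """Return the original column values followed by one float per vector element."""
    vector = list(row[vector_col])
    # The schema gets one new field per name, so the lengths must agree
    if len(vector) != len(new_features):
        raise ValueError("new_features must match the vector length")
    return [row[c] for c in cols] + [float(x) for x in vector]


# Hypothetical sample data standing in for the (word, vector) DataFrame
rows = [
    {"word": "cat", "vector": [0.1, 0.2]},
    {"word": "dog", "vector": [0.3, 0.4]},
]
cols = ["word", "vector"]
new_features = ["f1", "f2"]

# Column names of the result: the original columns plus the new ones
header = cols + new_features
table = [split_row(r, cols, "vector", new_features) for r in rows]

print(header)
for line in table:
    print(line)
```

Each output row has as many values as `header` has names, which is the condition the extended schema needs in order to match the mapped rows.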