Create feature vector programmatically in Spark ML / pyspark

Posted by 邮差的信 on 2019-11-28 06:54:40
zero323

You can use VectorAssembler:

from pyspark.ml.feature import VectorAssembler

ignore = ['id', 'label', 'binomial_label']
assembler = VectorAssembler(
    inputCols=[x for x in df.columns if x not in ignore],
    outputCol='features')

# returns a new DataFrame with an added 'features' vector column
assembled = assembler.transform(df)
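For reference, a minimal end-to-end sketch; the toy DataFrame, its extra column names (x1, x2), and the values are invented purely for illustration:

from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler

spark = SparkSession.builder.getOrCreate()

# Toy DataFrame made up for this example; only the three 'ignore' columns
# match the names used in the answer
df = spark.createDataFrame(
    [(1, 0.0, 'no', 1.5, 2.0),
     (2, 1.0, 'yes', 0.3, 4.1)],
    ['id', 'label', 'binomial_label', 'x1', 'x2'])

ignore = ['id', 'label', 'binomial_label']
assembler = VectorAssembler(
    inputCols=[x for x in df.columns if x not in ignore],
    outputCol='features')

# 'features' packs the remaining numeric columns (x1, x2) into a single vector per row
assembler.transform(df).select('features').show(truncate=False)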

It can be combined with k-means using an ML Pipeline:

from pyspark.ml import Pipeline

pipeline = Pipeline(stages=[assembler, kmeans_estimator])
model = pipeline.fit(df)
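The kmeans_estimator above is left as a placeholder in the answer. A minimal sketch of what it could be, assuming pyspark.ml.clustering.KMeans with an arbitrary k=2 (it would need to be created before building the Pipeline):

from pyspark.ml.clustering import KMeans

# Illustrative estimator only: reads the assembled 'features' column; k=2 is an arbitrary choice
kmeans_estimator = KMeans(featuresCol='features', k=2)

The fitted pipeline can then be applied with model.transform(df), which adds a prediction column holding the cluster assignment for each row.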