How to convert ArrayType to DenseVector in PySpark DataFrame?

I'm getting the following error trying to build an ML Pipeline:

pyspark.sql.utils.IllegalArgumentException: 'requirement failed: Column feature


        
1 Answer
  • 2020-12-14 21:40

    You can use a UDF:

    from pyspark.sql.functions import udf

    udf(lambda vs: Vectors.dense(vs), VectorUDT())
    

    In Spark < 2.0 import:

    from pyspark.mllib.linalg import Vectors, VectorUDT
    

    In Spark 2.0+ import:

    from pyspark.ml.linalg import Vectors, VectorUDT
    

    Please note that these two classes are not compatible, despite having identical implementations.
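
    For context, here is a minimal end-to-end sketch of the UDF approach on Spark 2.0+; the SparkSession spark, the column names, and the sample data are all illustrative assumptions:

    from pyspark.ml.linalg import Vectors, VectorUDT
    from pyspark.sql.functions import udf

    # Toy DataFrame with an array<double> column (illustrative names and values)
    df = spark.createDataFrame(
        [(1, [0.0, 1.0, 2.0]), (2, [3.0, 4.0, 5.0])],
        ("id", "features"))

    list_to_vector = udf(lambda vs: Vectors.dense(vs), VectorUDT())

    # Replace the array column with a DenseVector column of the same name
    df_vec = df.withColumn("features", list_to_vector("features"))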

    It is also possible to extract the individual features and assemble them with VectorAssembler, as sketched below. Assuming the input column is called features:

    from pyspark.ml.feature import VectorAssembler

    n = ...  # Size of the features arrays

    # df["features"].getItem(i) produces columns named features[0], features[1], ...
    assembler = VectorAssembler(
        inputCols=["features[{0}]".format(i) for i in range(n)],
        outputCol="features_vector")

    assembler.transform(df.select(
        "*", *(df["features"].getItem(i) for i in range(n))
    ))
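
    Continuing with the toy DataFrame from the UDF sketch above (so n = 3), and dropping the intermediate scalar columns once the vector is assembled:

    n = 3  # the illustrative arrays above have 3 elements

    assembler = VectorAssembler(
        inputCols=["features[{0}]".format(i) for i in range(n)],
        outputCol="features_vector")

    result = (assembler
              .transform(df.select(
                  "*", *(df["features"].getItem(i) for i in range(n))))
              # Drop the intermediate features[i] columns once assembled
              .drop(*["features[{0}]".format(i) for i in range(n)]))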
    