How to prepare data into a LibSVM format from DataFrame?

余生分开走 2020-12-13 07:17

I want to produce data in LibSVM format. I have already shaped my DataFrame into the desired layout, but I do not know how to convert it to LibSVM format. The format is as shown in the figure. I hope that

3 Answers
  •  既然无缘
    2020-12-13 08:03

    In LibSVM data the features are a sparse vector, so you can use pyspark.ml.linalg.SparseVector to solve the problem:

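    from pyspark.ml.linalg import SparseVector, VectorUDT
    from pyspark.sql.functions import udf

    # A vector of size 4 with non-zero entries at indices 1 and 3
    a = SparseVector(4, [1, 3], [3.0, 4.0])

    def sparsevecfuc(size, index, score):
        """
        args: size int, index array, score array
        """
        return SparseVector(size, index, score)

    # Wrap the function as a UDF that returns a vector column
    trans_sparse = udf(sparsevecfuc, VectorUDT())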
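
For reference, each line of a LibSVM text file has the form `<label> <index>:<value> ...`, with indices in ascending order and conventionally starting at 1. The sketch below is plain Python, not part of the answer above. The helper name `to_libsvm_line` and the sample rows are hypothetical, and it assumes the input indices are 0-based, as in SparseVector.

```python
def to_libsvm_line(label, indices, values):
    """Format one row as '<label> <index>:<value> ...' (hypothetical helper)."""
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    # LibSVM expects ascending indices, so sort the (index, value) pairs
    pairs = sorted(zip(indices, values))
    # Assumption: input indices are 0-based, so shift them to 1-based
    features = " ".join(f"{i + 1}:{v}" for i, v in pairs)
    # rstrip drops the trailing space when a row has no non-zero features
    return f"{label} {features}".rstrip()

# Illustrative rows: (label, indices, values)
rows = [(1.0, [1, 3], [3.0, 4.0]), (0.0, [0], [2.5])]
lines = [to_libsvm_line(*row) for row in rows]
```

If the DataFrame already has a numeric label column and a vector features column, Spark's built-in `libsvm` data source (`df.write.format("libsvm").save(path)`) should be able to write the file directly. Check the documentation for your Spark version before relying on it.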
    
