Spark matrix multiplication with Python

猫巷女王i 2021-01-03 03:37

I am trying to do matrix multiplication using Apache Spark and Python.

Here is my data

from pyspark.mllib.linalg.distributed import RowMatrix
         


        
1 answer
  • 2021-01-03 04:31

    You cannot. A RowMatrix has no meaningful row indices, so it cannot be used for multiplication. Even ignoring that, the only distributed matrix that supports multiplication with another distributed structure is BlockMatrix.

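    from pyspark.mllib.linalg.distributed import IndexedRow, IndexedRowMatrix
    
    def as_block_matrix(rdd, rowsPerBlock=1024, colsPerBlock=1024):
        # Attach an explicit row index to every row, then convert to a BlockMatrix.
        return IndexedRowMatrix(
            rdd.zipWithIndex().map(lambda xi: IndexedRow(xi[1], xi[0]))
        ).toBlockMatrix(rowsPerBlock, colsPerBlock)
    
    # rows_1 and rows_2 are assumed to be RDDs of row vectors.
    as_block_matrix(rows_1).multiply(as_block_matrix(rows_2))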
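
To see why the row indices and the block layout matter, here is a rough plain-Python sketch of blocked multiplication. It is only an illustration and not how Spark implements BlockMatrix: the helpers to_blocks, multiply_blocks and to_dense, the single block_size parameter, and the sample data are all made up for this example. enumerate plays the role that zipWithIndex plays above.

```python
# Plain-Python illustration only; this is NOT Spark's BlockMatrix implementation.
# All names and the sample data below are hypothetical.
from collections import defaultdict


def to_blocks(rows, block_size):
    """Index each row and group entries into square blocks.

    Returns {(block_row, block_col): {(local_i, local_j): value}}.
    """
    blocks = defaultdict(dict)
    for i, row in enumerate(rows):  # the row index a RowMatrix does not carry
        for j, value in enumerate(row):
            key = (i // block_size, j // block_size)
            blocks[key][(i % block_size, j % block_size)] = value
    return blocks


def multiply_blocks(a, b):
    """Blocked product: C[I, J] is the sum over K of A[I, K] times B[K, J]."""
    result = defaultdict(lambda: defaultdict(float))
    for (bi, bk), a_block in a.items():
        for (bk2, bj), b_block in b.items():
            if bk != bk2:
                continue  # only blocks sharing the inner block index combine
            for (i, k), x in a_block.items():
                for (k2, j), y in b_block.items():
                    if k == k2:
                        result[(bi, bj)][(i, j)] += x * y
    return result


def to_dense(blocks, n_rows, n_cols, block_size):
    """Reassemble the blocks into a list of lists."""
    dense = [[0.0] * n_cols for _ in range(n_rows)]
    for (bi, bj), block in blocks.items():
        for (i, j), value in block.items():
            dense[bi * block_size + i][bj * block_size + j] = value
    return dense


rows_1 = [[1, 2, 3], [4, 5, 6]]        # 2 x 3
rows_2 = [[7, 8], [9, 10], [11, 12]]   # 3 x 2
product = to_dense(
    multiply_blocks(to_blocks(rows_1, 2), to_blocks(rows_2, 2)), 2, 2, 2
)
print(product)
```

Without a stable row index there is no way to decide which block a row belongs to, which is why the answer's helper goes through IndexedRowMatrix before calling toBlockMatrix.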
    