How to compute the inverse of a RowMatrix in Apache Spark?

天涯浪人 2020-12-06 18:41

I have a distributed matrix X in RowMatrix form. I am using Spark 1.3.0 and need to compute the inverse of X.

3 Answers
  • 2020-12-06 19:16

    The matrix U returned by X.computeSVD has dimensions m x k, where m is the number of rows of the original (distributed) RowMatrix X. One would expect m to be large (possibly much larger than k), so collecting U in the driver is not advisable if the code has to scale to really large values of m.

    I would say both of the solutions below suffer from this flaw. The answer given by @Alexander Kharlamov calls val U = svd.U.toBlockMatrix().toLocalMatrix(), which collects the matrix in the driver. The same happens with the answer given by @Climbs_lika_Spyder (btw your nick rocks!!), which calls svd.U.rows.collect.flatMap(x => x.toArray). I would rather rely on a distributed matrix multiplication, such as the Scala code posted here.
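
    A minimal sketch of that idea, assuming X is square and nonsingular. RowMatrix.multiply only accepts a local matrix on the right-hand side, so this sketch returns the transpose of the inverse, U * inv(S) * transpose(V), as a RowMatrix; only the small k x k factor lives on the driver. The name computeInverseTransposed is made up for illustration and is not part of Spark.

    ```scala
    import org.apache.spark.mllib.linalg.{DenseMatrix, DenseVector}
    import org.apache.spark.mllib.linalg.distributed.RowMatrix

    // Hypothetical helper: returns transpose(inv(X)) without collecting U.
    def computeInverseTransposed(X: RowMatrix): RowMatrix = {
      val n = X.numCols().toInt
      val svd = X.computeSVD(n, computeU = true)
      require(svd.s.size == n, "X is singular or rank-deficient")

      // inv(S) as a local diagonal matrix
      val invS = DenseMatrix.diag(new DenseVector(svd.s.toArray.map(1.0 / _)))

      // transpose(V * inv(S)) = inv(S) * transpose(V), a small local matrix
      val right = svd.V.multiply(invS).transpose

      // U * inv(S) * transpose(V) = transpose(V * inv(S) * transpose(U)) = transpose(inv(X))
      svd.U.multiply(right)
    }
    ```

    Row i of the result holds column i of inv(X), so any later step has to account for the transposition.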

  • 2020-12-06 19:22
    import org.apache.spark.mllib.linalg.{DenseMatrix, DenseVector}
    import org.apache.spark.mllib.linalg.distributed.RowMatrix

    def computeInverse(X: RowMatrix): DenseMatrix = {
      val nCoef = X.numCols.toInt
      val svd = X.computeSVD(nCoef, computeU = true)
      // computeSVD drops near-zero singular values, so fewer than nCoef means X is rank-deficient.
      if (svd.s.size < nCoef) {
        sys.error("RowMatrix.computeInverse called on singular matrix.")
      }

      // Create the inverse diagonal matrix from S
      val invS = DenseMatrix.diag(new DenseVector(svd.s.toArray.map(x => math.pow(x, -1))))

      // U is collected to the driver as a local matrix. DenseMatrix is column-major, so filling it
      // with U's row-major values and swapped dimensions yields transpose(U) directly.
      val Ut = new DenseMatrix(svd.U.numCols().toInt, svd.U.numRows().toInt, svd.U.rows.collect.flatMap(x => x.toArray))

      // V is already local, so a local multiplication is fine here.
      val V = svd.V
      // inv(X) = V * inv(S) * transpose(U)
      V.multiply(invS).multiply(Ut)
    }
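
    For reference, the formula in the last comment follows from the SVD. A short derivation, assuming X is square (n x n) and nonsingular with k = n, so that both U and V are orthogonal:

    ```latex
    \begin{align*}
    X &= U \Sigma V^{T}, \qquad U^{T} U = U U^{T} = I, \qquad V^{T} V = V V^{T} = I \\
    X^{-1} &= \left( U \Sigma V^{T} \right)^{-1} = V \Sigma^{-1} U^{T} \\
    X \left( V \Sigma^{-1} U^{T} \right) &= U \Sigma \left( V^{T} V \right) \Sigma^{-1} U^{T} = U U^{T} = I
    \end{align*}
    ```

    If X has more rows than columns but full column rank, the same product gives the Moore-Penrose pseudoinverse rather than a true inverse.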
    
  • 2020-12-06 19:32

    I had problems using this function with the following option set:

    conf.set("spark.sql.shuffle.partitions", "12")

    The rows of the RowMatrix got shuffled.

    Here is an update that worked for me:

    import org.apache.spark.mllib.linalg.{DenseMatrix, DenseVector}
    import org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix

    def computeInverse(X: IndexedRowMatrix): DenseMatrix = {
      val nCoef = X.numCols.toInt
      val svd = X.computeSVD(nCoef, computeU = true)
      if (svd.s.size < nCoef) {
        sys.error("IndexedRowMatrix.computeInverse called on singular matrix.")
      }

      // Create the inverse diagonal matrix from S
      val invS = DenseMatrix.diag(new DenseVector(svd.s.toArray.map(x => math.pow(x, -1))))

      // Collect U to the driver; the row indices fix the row order regardless of partitioning.
      // Multiplying by the identity converts the local Matrix into a DenseMatrix before transposing.
      val Ut = svd.U.toBlockMatrix().toLocalMatrix().multiply(DenseMatrix.eye(svd.U.numCols().toInt)).transpose

      // inv(X) = V * inv(S) * transpose(U)
      val V = svd.V
      V.multiply(invS).multiply(Ut)
    }
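
    To illustrate why the indexed variant is not sensitive to row order, here is a small sketch that builds an IndexedRowMatrix from rows supplied out of order and collects it through toBlockMatrix().toLocalMatrix(). The helper name collectByIndex and the 2 x 2 values are made up for the example; sc is assumed to be an existing SparkContext.

    ```scala
    import org.apache.spark.SparkContext
    import org.apache.spark.mllib.linalg.{Matrix, Vectors}
    import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix}

    // Hypothetical helper: the rows are deliberately listed in the wrong order.
    def collectByIndex(sc: SparkContext): Matrix = {
      val rows = sc.parallelize(Seq(
        IndexedRow(1L, Vectors.dense(3.0, 4.0)),
        IndexedRow(0L, Vectors.dense(1.0, 2.0))
      ))
      // Each row is placed according to its index, not its position in the RDD.
      new IndexedRowMatrix(rows).toBlockMatrix().toLocalMatrix()
    }
    ```

    The local matrix has (1.0, 2.0) as its first row and (3.0, 4.0) as its second, whatever order the RDD delivers them in.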
    