Convert Matrix to RowMatrix in Apache Spark using Scala

I suggest that you convert your Matrix to an RDD[Vector], which you can then easily wrap in a RowMatrix.

So, let's consider the following example:

import org.apache.spark.rdd._
import org.apache.spark.mllib.linalg._


// A Seq of dense vectors: the row-by-row form that an RDD[Vector] holds.
val denseData = Seq(
  Vectors.dense(0.0, 1.0, 2.0),
  Vectors.dense(3.0, 4.0, 5.0),
  Vectors.dense(6.0, 7.0, 8.0),
  Vectors.dense(9.0, 0.0, 1.0)
)

// The local Matrix we want to distribute (values are given in column-major order).
val dm: Matrix = Matrices.dense(3, 2, Array(1.0, 3.0, 5.0, 2.0, 4.0, 6.0))
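
Note that Matrices.dense reads the values column by column, so dm is a 3 x 2 matrix. A quick check (just a sketch; the exact printed format may differ between Spark versions) makes the layout explicit:

println(dm)
// 1.0  2.0
// 3.0  4.0
// 5.0  6.0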

We will need to define a method to convert that Matrix into an RDD[Vector]:

def matrixToRDD(m: Matrix): RDD[Vector] = {
  // m.toArray is column-major, so grouping by numRows yields the columns.
  val columns = m.toArray.grouped(m.numRows)
  // Transpose to get the rows. Skip this if you want a column-major RDD.
  val rows = columns.toSeq.transpose
  val vectors = rows.map(row => new DenseVector(row.toArray))
  // sc is the SparkContext available in your application or shell.
  sc.parallelize(vectors)
}

Now we can apply that conversion to the main Matrix:

import org.apache.spark.mllib.linalg.distributed.RowMatrix
val rows = matrixToRDD(dm)
val mat = new RowMatrix(rows)
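
If you want to confirm the result, a quick check could be the following (a sketch; it collects the distributed rows back to the driver, so only do this for small matrices):

mat.rows.collect().foreach(println)
// [1.0,2.0]
// [3.0,4.0]
// [5.0,6.0]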
user8431168 adds a small correction to the matrixToRDD helper: use Vectors.dense instead of new DenseVector, so the mapped sequence is typed as Seq[Vector] rather than Seq[DenseVector]:

val vectors = rows.map(row => Vectors.dense(row.toArray))
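
Putting the correction together, here is a self-contained sketch of the helper plus a quick sanity check. The SparkContext is passed in explicitly (an assumption about how you access sc), dm is the matrix defined above, and the names matrixToRowVectors and rowMat are only used here to avoid clashing with the definitions above:

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.linalg.{Matrix, Vector, Vectors}
import org.apache.spark.mllib.linalg.distributed.RowMatrix

def matrixToRowVectors(sc: SparkContext, m: Matrix): RDD[Vector] = {
  // m.toArray is column-major, so grouping by numRows yields the columns.
  val columns = m.toArray.grouped(m.numRows)
  // Transposing the column seq gives the rows.
  val rows = columns.toSeq.transpose
  // Vectors.dense returns a Vector, so this is an RDD[Vector], not RDD[DenseVector].
  sc.parallelize(rows.map(row => Vectors.dense(row.toArray)))
}

val rowMat = new RowMatrix(matrixToRowVectors(sc, dm))
println(rowMat.numRows()) // expected: 3
println(rowMat.numCols()) // expected: 2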