How to convert an MLlib matrix to a Spark DataFrame?

Submitted by 岁酱吖の on 2019-12-02 07:29:34

If you simply want to print the matrix, its toString method is the easiest and fastest option. The overload toString(maxLines, maxLineWidth) lets you set the maximum number of lines to print and the maximum line width. To change the formatting, split the output on newlines, split each line on whitespace, and rejoin the values with commas. For example:

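import org.apache.spark.mllib.linalg.Matrices

// 2 rows x 3 columns; the values array is read in column-major order
val matrix = Matrices.dense(2, 3, Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0))
matrix.toString
  .split("\n")                                                     // one string per matrix row
  .map(_.trim.split(" ").filter(_ != "").mkString("[", ",", "]"))  // drop padding, join values with commas
  .mkString("\n")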

which will give the following:

[1.0,3.0,5.0]
[2.0,4.0,6.0]
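The rows come out as [1.0,3.0,5.0] and [2.0,4.0,6.0], not [1.0,2.0,3.0] and [4.0,5.0,6.0], because Matrices.dense reads its values array in column-major order: element (i, j) sits at index i + j * numRows. A minimal plain-Scala sketch of that index arithmetic follows; `rowsFromColumnMajor` is a hypothetical helper for illustration only, not an MLlib API.

```scala
// Hypothetical helper (not part of MLlib): rebuilds the rows of a
// numRows x numCols matrix whose values are stored column-major,
// the layout that Matrices.dense(numRows, numCols, values) expects.
def rowsFromColumnMajor(numRows: Int, numCols: Int, values: Array[Double]): Seq[Array[Double]] = {
  require(values.length == numRows * numCols, "values.length must equal numRows * numCols")
  // Element (i, j) is stored at index i + j * numRows.
  (0 until numRows).map(i => Array.tabulate(numCols)(j => values(i + j * numRows)))
}

val rows = rowsFromColumnMajor(2, 3, Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0))
rows.foreach(r => println(r.mkString("[", ",", "]")))
// prints [1.0,3.0,5.0] and then [2.0,4.0,6.0]
```

So if you build the values array yourself, list it column by column, or transpose your mental picture of the matrix accordingly.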

However, if you want to convert the matrix to a DataFrame, the easiest way is to first create an RDD of its rows and then call toDF().

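import spark.implicits._  // enables toDF and the $"..." column syntax

// rowIter (Spark 2.0+) returns one Vector per matrix row
val matrixRows = matrix.rowIter.toSeq.map(_.toArray)
val df = spark.sparkContext.parallelize(matrixRows).toDF("Row")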
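The snippet above appears to assume a spark-shell session, where `spark` is predefined. A minimal self-contained sketch of the same conversion, assuming Spark 2.0 or later (for rowIter) with spark-sql and spark-mllib on the classpath and a local master; the app name is arbitrary:

```scala
import org.apache.spark.mllib.linalg.{Matrices, Matrix}
import org.apache.spark.sql.{DataFrame, SparkSession}

// Assumption: a local session is enough for this illustration;
// getOrCreate() reuses an existing session if one is already running.
val spark: SparkSession = SparkSession.builder()
  .master("local[*]")
  .appName("matrix-to-dataframe")
  .getOrCreate()
import spark.implicits._ // provides toDF and the Encoder for Array[Double]

val matrix: Matrix = Matrices.dense(2, 3, Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0))

// One Array[Double] per matrix row; toList consumes the iterator eagerly.
val matrixRows: Seq[Array[Double]] = matrix.rowIter.map(_.toArray).toList

// A single column "Row" of type array<double>, one record per matrix row.
val df: DataFrame = spark.sparkContext.parallelize(matrixRows).toDF("Row")
df.printSchema()
df.show(false)
```

Pulling the rows out with rowIter on the driver is reasonable here because an MLlib local matrix already lives entirely in driver memory.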

Then, to put each value in a separate column, you can do the following:

val numOfCols = matrixRows.head.length
// Add one column per array element, then drop the original array column
val df2 = (0 until numOfCols).foldLeft(df)((acc, i) =>
    acc.withColumn("Col" + i, $"Row".getItem(i)))
  .drop("Row")
df2.show(false)

Result using the example data:

+----+----+----+
|Col0|Col1|Col2|
+----+----+----+
|1.0 |3.0 |5.0 |
|2.0 |4.0 |6.0 |
+----+----+----+
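As an alternative to the foldLeft, the same columns can be produced by a single select. Spark's Dataset.withColumn documentation notes that each call adds a projection and recommends select when adding many columns at once, so this form may scale better for wide matrices. A minimal self-contained sketch, assuming Spark 2.0 or later with spark-sql and spark-mllib on the classpath and a local session; reading the column count from matrix.numCols instead of matrixRows.head.length is a choice made for this sketch:

```scala
import org.apache.spark.mllib.linalg.Matrices
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumption: a local session; getOrCreate() reuses one if it already exists.
val spark = SparkSession.builder().master("local[*]").appName("matrix-columns").getOrCreate()
import spark.implicits._ // needed for toDF on the RDD

val matrix = Matrices.dense(2, 3, Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0))
val matrixRows = matrix.rowIter.map(_.toArray).toList
val df = spark.sparkContext.parallelize(matrixRows).toDF("Row")

// One projection that expands the array column into numCols scalar columns
// (Col0, Col1, ...), instead of numCols chained withColumn calls.
val numOfCols = matrix.numCols
val df3 = df.select((0 until numOfCols).map(i => col("Row").getItem(i).as(s"Col$i")): _*)
df3.show(false)
```

Because select only lists the new columns, there is no separate drop("Row") step.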