How to display a KeyValueGroupedDataset in Spark?

Submitted by 痞子三分冷 on 2020-11-30 06:46:30

Question


I am trying to learn Datasets in Spark. One thing I can't figure out is how to display a KeyValueGroupedDataset, since show doesn't work on it. Also, what is the equivalent of map for a KeyValueGroupedDataset? I would appreciate it if someone could give some examples.


Answer 1:


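OK, I got the idea from the examples given here and here. A KeyValueGroupedDataset is not itself a Dataset, so it has no show method; you first have to turn it back into a Dataset with an operation such as mapGroups, which also plays the role of map for grouped data. Below is a simple example I wrote.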

val x = Seq(("a", 36), ("b", 33), ("c", 40), ("a", 38), ("c", 39)).toDS
x: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int]

val g = x.groupByKey(_._1)
g: org.apache.spark.sql.KeyValueGroupedDataset[String,(String, Int)] = ...

val z = g.mapGroups{case(k, iter) => (k, iter.map(x => x._2).toArray)}
z: org.apache.spark.sql.Dataset[(String, Array[Int])] = [_1: string, _2: array<int>]

z.show
+---+--------+
| _1|      _2|
+---+--------+
|  c|[40, 39]|
|  b|    [33]|
|  a|[36, 38]|
+---+--------+
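
Besides mapGroups, a few other KeyValueGroupedDataset methods also return a plain Dataset that can be shown. The following is a minimal sketch in the same spirit as the example above, not a definitive recipe. It assumes a local SparkSession (in spark-shell, getOrCreate simply returns the existing session); the app name and the names counts, sums and above35 are made up for illustration. Row order in the displayed output is not guaranteed.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: local mode; in spark-shell this returns the existing session.
val spark = SparkSession.builder().master("local[*]").appName("grouped-sketch").getOrCreate()
import spark.implicits._

val x = Seq(("a", 36), ("b", 33), ("c", 40), ("a", 38), ("c", 39)).toDS()
val g = x.groupByKey(_._1)

// keys: a Dataset[String] holding each distinct key once
g.keys.show()

// count(): a Dataset[(String, Long)] with the number of rows per key
val counts = g.count()
counts.show()

// mapValues keeps the grouping but transforms each value (here: drop the key
// from the tuple); reduceGroups then folds the values of each key into one.
val sums = g.mapValues(_._2).reduceGroups((a: Int, b: Int) => a + b)
sums.show()

// flatMapGroups is the flatMap counterpart of mapGroups: each group may emit
// zero or more output rows. Here only values above 35 are kept.
val above35 = g.flatMapGroups { case (k, iter) => iter.map(_._2).filter(_ > 35).map(v => (k, v)) }
above35.show()
```

mapGroups and flatMapGroups hand you an iterator over all rows of a group, so they are the most flexible option; count and reduceGroups express plain aggregations more directly.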


Source: https://stackoverflow.com/questions/43918836/how-to-display-a-keyvaluegroupeddataset-in-spark
