Transform Java-Pair-Rdd to Rdd

混江龙づ霸主 submitted on 2019-12-11 00:32:03

Question


I need to export my JavaPairRDD to a CSV file,

so I am thinking of transforming it into a plain RDD first.

What I want is to transform my RDD from:

Key   Value
Jack  [a,b,c]

to:

Key   Value
Jack  a
Jack  b
Jack  c

I see that this is possible in that issue and in this issue (PySpark: Convert a pair RDD back to a regular RDD), so I am asking how to do the same in Java.

Update to the question

The type of my JavaPairRDD is:

JavaPairRDD<Tuple2<String,String>, Iterable<Tuple1<String>>>

and this is the form of a row it contains:

((dr5rvey,dr5ruku),[(2,01/09/2013 00:09,01/09/2013 00:27,N,1,-73.9287262,40.75831223,-73.98726654,40.76442719,2,3.96,16,0.5,0.5,4.25,0,,21.25,1,)])

The key here is (dr5rvey,dr5ruku) and the value is [(2,01/09/2013 00:09,01/09/2013 00:27,N,1,-73.9287262,40.75831223,-73.98726654,40.76442719,2,3.96,16,0.5,0.5,4.25,0,,21.25,1,)]

My original JavaRDD was of type:

JavaRDD<String>

Answer 1:


Assuming the keys should be kept, you can use the flatMapValues function:

Pass each value in the key-value pair RDD through a flatMap function without changing the keys; ...

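JavaPairRDD<Tuple2<String,String>, Iterable<Tuple1<String>>> input = ...;
// Emit one (key, element) pair per element of each Iterable; the keys are unchanged.
// Depending on the Spark version, flatMapValues may expect an Iterator rather than
// an Iterable; in that case use iter -> iter.iterator() instead.
JavaPairRDD<Tuple2<String, String>, Tuple1<String>> output1 = input.flatMapValues(iter -> iter);
// Unwrap each Tuple1 so the value is the plain String it holds.
JavaPairRDD<Tuple2<String, String>, String> output2 = output1.mapValues(t1 -> t1._1());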
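
To get from the flattened pairs to the CSV that the question asks for, each pair still has to be rendered as one line of text. Below is a minimal sketch of that step in plain Java. It assumes the two key parts become the first two columns and that the value string is already comma-separated, as in the sample row. The class CsvLines and its methods are hypothetical names used for illustration; they are not part of Spark.

```java
final class CsvLines {
    private CsvLines() {}

    // Quote a field only when it contains a comma, a double quote, or a newline;
    // embedded double quotes are doubled, following the usual CSV convention.
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    // Build one CSV line from the two key parts and the value.
    // Assumption: the value is already a comma-separated record, so it is
    // appended as-is and contributes several columns.
    static String toCsvLine(String keyLeft, String keyRight, String value) {
        return escape(keyLeft) + "," + escape(keyRight) + "," + value;
    }
}
```

A helper like this could be applied to each record, for example with output2.map(t -> CsvLines.toCsvLine(t._1()._1(), t._1()._2(), t._2())), and the resulting JavaRDD<String> could then be written out with saveAsTextFile.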



Answer 2:


If I understand correctly, you need the flatMap function, which lets you create multiple rows from a single key. Here is an example in Scala (just the idea; you will need to adapt it to your use case):

rdd.flatMap { case (key, value) =>
        // Split the comma-separated value and emit one (key, item) pair per item.
        value.split(",").map(item => (key, item))
    }

It's a highly simplified example, but you should get the gist.

For this RDD:

key      val
mykey   "a,b,c"

the returned RDD will be:

key      val
mykey   "a"
mykey   "b"
mykey   "c"



Answer 3:


If I understand correctly, the type of your RDD is RDD[(String, Array[String])], so you can just apply flatMap to it.

val rdd: RDD[(String, Array[String])] = ???
val newRDD = rdd.flatMap{case (key, array) => array.map(value => (key, value))}

newRDD will be of type RDD[(String, String)]
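
Since the question asks for Java, the flattening itself can also be sketched without Spark to make the intended semantics concrete. The example below uses plain Java collections as a stand-in for the RDD: a Map from key to list of values is turned into one (key, value) pair per element. The class FlattenPairs and its method flatten are hypothetical names, and this is an illustration of the transformation only, not of the Spark API.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class FlattenPairs {
    private FlattenPairs() {}

    // Turn key -> [v1, v2, ...] into the pairs (key, v1), (key, v2), ...
    // The order of pairs follows the iteration order of the input map and lists.
    static <K, V> List<Map.Entry<K, V>> flatten(Map<K, List<V>> grouped) {
        List<Map.Entry<K, V>> pairs = new ArrayList<>();
        for (Map.Entry<K, List<V>> group : grouped.entrySet()) {
            for (V value : group.getValue()) {
                // The key is repeated once for every value in its group.
                pairs.add(new SimpleImmutableEntry<>(group.getKey(), value));
            }
        }
        return pairs;
    }
}
```

Applied to the question's example, a map holding Jack with the list [a, b, c] yields three pairs, each with the key Jack.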



Source: https://stackoverflow.com/questions/51283041/transform-java-pair-rdd-to-rdd
