Question
I have a JavaPairRDD, say data, of type
<Integer,List<Integer>>
When I call data.saveAsTextFile("output"), the output contains the data in the following format:
(1,[1,2,3,4])
etc...
I want something like this in the output file:
1 1,2,3,4
i.e. 1\t1,2,3,4
Any help would be appreciated.
Answer 1:
You need to understand what's happening here. You have an RDD[(T, U)], where T and U are some object types; read it as an RDD of tuples of T and U. When you call saveAsTextFile() on this RDD, it converts each element of the RDD to a string and writes those strings out, which is how the text file is generated.
Now, how is an object of some type T converted to a string? By calling toString() on it. This is why you see [] representing the List and () representing the tuple as a whole.
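The bracketed part can be reproduced without Spark, since it comes from the list's own toString(). A minimal sketch in plain Java; the class name ToStringDemo and the method describe are made up for illustration and are not part of any Spark API:

```java
import java.util.Arrays;
import java.util.List;

public class ToStringDemo {
    // Hypothetical helper: returns the default string form of the list,
    // i.e. whatever toString() produces.
    static String describe(List<Integer> values) {
        return values.toString();
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3, 4);
        // The printed text is wrapped in square brackets, which is where
        // the [] in the saved file comes from.
        System.out.println(describe(values));
    }
}
```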
The solution is to map each element of your RDD to a string in the format you want before saving. I'm not that familiar with the Java syntax, but in Scala I would do something like:
rdd.map(e=>s"${e._1}\t${e._2.mkString(",")}")
Here mkString concatenates the elements of a collection into one string, using the given delimiter.
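Since the question uses the Java API, a rough Java counterpart might look like the following. This is only a sketch: the class PairFormatter and the helper formatPair are hypothetical names, Collectors.joining stands in for mkString, and the Spark call shown in the comment assumes data is the JavaPairRDD<Integer, List<Integer>> from the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class PairFormatter {
    // Hypothetical helper: builds "key<TAB>v1,v2,..." for one pair.
    static String formatPair(Integer key, List<Integer> values) {
        // Collectors.joining(",") plays the role of Scala's mkString(",").
        String joined = values.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(","));
        return key + "\t" + joined;
    }

    public static void main(String[] args) {
        // Formats a single pair locally, without Spark.
        System.out.println(formatPair(1, Arrays.asList(1, 2, 3, 4)));
        // With Spark on the classpath, the helper could be applied per element
        // (assumption: data is the JavaPairRDD from the question):
        // data.map(t -> PairFormatter.formatPair(t._1(), t._2())).saveAsTextFile("output");
    }
}
```

Keeping the formatting in a small static method means it can be checked on its own before it is plugged into the map call.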
Let me know if this helped. Cheers.
Source: https://stackoverflow.com/questions/45398795/saving-the-rdd-pair-in-particular-format-in-the-output-file