Save and load Spark RDD from local binary file - minimal working example

Submitted by 不打扰是莪最后的温柔 on 2019-12-22 17:51:44

Question


I am working on a Spark app in which an RDD is first computed, then needs to be stored to disk, and later loaded back into Spark. To this end, I am looking for a minimal working example of saving an RDD to a local file and then loading it.

The file format is not suitable for text conversion, so saveAsTextFile won't fly.

The RDD can be either a plain RDD or a Pair RDD; it is not crucial. The file can live on HDFS or on the local file system.

The example can be either in Java or Scala.

Thanks!


Answer 1:


As long as values in the RDD are serializable you can try to use RDD.saveAsObjectFile / SparkContext.objectFile:

case class Foobar(foo: Int, bar: Map[String, Int])

// Build a small RDD of serializable values
val rdd = sc.parallelize(Seq(
    Foobar(1, Map("foo" -> 0)),
    Foobar(-1, Map("bar" -> 3))
))

// Write the RDD to disk as a SequenceFile of Java-serialized objects
rdd.saveAsObjectFile("foobar")

// Load it back; the type parameter tells Spark what to deserialize into
val loaded = sc.objectFile[Foobar]("foobar")
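The serializability requirement matters: saveAsObjectFile uses standard Java serialization under the hood, so every class stored this way must be Java-serializable (Scala case classes are by default; a hand-written Java class must implement java.io.Serializable). A minimal sketch of that round trip outside Spark, using a hypothetical Java `Foobar` class mirroring the Scala one above:

```java
import java.io.*;
import java.util.*;

public class SerializeDemo {
    // Mirrors the Scala case class; must implement Serializable,
    // just as values stored via saveAsObjectFile must be.
    static class Foobar implements Serializable {
        final int foo;
        final Map<String, Integer> bar;
        Foobar(int foo, Map<String, Integer> bar) {
            this.foo = foo;
            this.bar = bar;
        }
    }

    public static void main(String[] args) throws Exception {
        Foobar original = new Foobar(1, Collections.singletonMap("foo", 0));

        // Serialize to a local binary file
        File file = File.createTempFile("foobar", ".bin");
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(original);
        }

        // Deserialize it back, as objectFile does for each stored record
        try (ObjectInputStream in =
                 new ObjectInputStream(new FileInputStream(file))) {
            Foobar loaded = (Foobar) in.readObject();
            System.out.println(loaded.foo + " " + loaded.bar);
        }
        file.delete();
    }
}
```

If `Foobar` did not implement Serializable, the `writeObject` call would throw a NotSerializableException, which is the same failure Spark surfaces (wrapped in a task failure) when a non-serializable value reaches saveAsObjectFile.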


Source: https://stackoverflow.com/questions/32612071/save-and-load-spark-rdd-from-local-binary-file-minimal-working-example
