Question:
I am working on a Spark app in which an RDD is first computed, then stored to disk, and then loaded back into Spark. To this end, I am looking for a minimal working example of saving an RDD to a local file and then loading it.
The data is not suitable for text conversion, so saveAsTextFile won't fly.
The RDD can be either a plain RDD or a pair RDD; it is not crucial. The file can live on HDFS or on the local filesystem.
The example can be either in Java or Scala.
Thanks!
Answer 1:
As long as the values in the RDD are serializable, you can use RDD.saveAsObjectFile / SparkContext.objectFile:
case class Foobar(foo: Int, bar: Map[String, Int])

val rdd = sc.parallelize(Seq(
  Foobar(1, Map("foo" -> 0)),
  Foobar(-1, Map("bar" -> 3))
))

// Write the RDD as a sequence of serialized objects ...
rdd.saveAsObjectFile("foobar")

// ... and read it back, specifying the element type explicitly.
val loaded = sc.objectFile[Foobar]("foobar")
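saveAsObjectFile relies on standard Java serialization under the hood, so each element type must be java.io.Serializable (case classes satisfy this automatically). As a Spark-free sketch of that requirement, the round trip each element goes through can be mimicked with plain ObjectOutputStream / ObjectInputStream; the object and method names below are illustrative, not part of Spark's API:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Case classes extend Serializable by default, which is what allows
// saveAsObjectFile / objectFile to round-trip them.
case class Foobar(foo: Int, bar: Map[String, Int])

object SerializationCheck {
  // Serialize a value to bytes and read it back, mimicking what
  // happens to each RDD element during saveAsObjectFile / objectFile.
  def roundTrip[T](value: T): T = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    in.readObject().asInstanceOf[T]
  }
}
```

If roundTrip throws java.io.NotSerializableException for your element type, saveAsObjectFile will fail on it as well, so this is a quick local check before running the full job.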
Source: https://stackoverflow.com/questions/32612071/save-and-load-spark-rdd-from-local-binary-file-minimal-working-example