How can I convert RDD to DataFrame in Spark Streaming, not just Spark?
I saw this example, but it requires
Create the sqlContext outside foreachRDD. Once you convert the RDD to a DataFrame using sqlContext, you can write it to S3.
For example:
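import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val conf = new SparkConf().setMaster("local").setAppName("My App")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._ // brings toDF() into scope for RDDs of case classes, tuples, or primitives
myDstream.foreachRDD { rdd =>
  val df = rdd.toDF()
  // DataFrameWriter has no saveAsTextFile; use save() (or the json(path) shortcut).
  // "append" keeps later batches from failing because the path already exists.
  df.write.format("json").mode("append").save("s3://iiiii/ttttt.json")
}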
Update:
You can also create the sqlContext inside foreachRDD, because the body of foreachRDD executes on the driver.
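A minimal sketch of that variant, assuming Spark 1.5 or later (where SQLContext.getOrCreate is available) and a hypothetical socket source on localhost:9999 standing in for your own DStream. The column name "line" is also just an illustration:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamToDataFrame {
  def main(args: Array[String]): Unit = {
    // local[2]: one thread for the receiver, one for processing
    val conf = new SparkConf().setMaster("local[2]").setAppName("My App")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Hypothetical source; replace with your own DStream
    val myDstream = ssc.socketTextStream("localhost", 9999)

    myDstream.foreachRDD { rdd =>
      // Runs on the driver: reuse the SQLContext tied to this SparkContext,
      // or create it on the first batch
      val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
      import sqlContext.implicits._

      // Skip empty batches so no empty output is written
      if (!rdd.isEmpty()) {
        val df = rdd.toDF("line")
        df.write.mode("append").json("s3://iiiii/ttttt.json")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Using getOrCreate instead of new SQLContext(...) avoids constructing a fresh context on every batch. On Spark 2.x the same pattern would normally go through SparkSession.builder.getOrCreate() instead.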
See also the following answer, which uses a Scala magic cell inside a Python notebook: How to convert Spark Streaming data into Spark DataFrame