Question
I am new to Spark. Could you please explain how to read JSON data from a Kafka topic using Scala in Apache Spark?
Thanks.
Answer 1:
The simplest method would be to make use of the DataFrame abstraction shipped with Spark.
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.kafka.KafkaUtils
import kafka.serializer.StringDecoder

val sqlContext = new SQLContext(sc)
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("myTopicName"))

stream.foreachRDD { rdd =>
  // Each record is a (key, value) tuple; the value holds the JSON string
  val dataFrame = sqlContext.read.json(rdd.map(_._2))
  // Run your operations on this DataFrame. You won't even require a model class.
}
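The `rdd.map(_._2)` step above keeps only the message value from each (key, value) pair that the direct stream delivers, discarding the keys before the JSON is read. A dependency-free sketch of that projection, using hypothetical sample records in plain Scala:

```scala
// Kafka direct-stream records arrive as (key, value) tuples;
// only the value carries the JSON payload (sample data assumed).
val records = Seq(
  ("k1", """{"user": "alice", "clicks": 3}"""),
  ("k2", """{"user": "bob", "clicks": 5}""")
)

// The equivalent of rdd.map(_._2): drop the keys, keep the JSON strings
val jsonStrings = records.map(_._2)
println(jsonStrings)
```

The resulting collection of raw JSON strings is what `sqlContext.read.json` consumes in the snippet above.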
Answer 2:
I use Play Framework's JSON library, which you can add to your project as a standalone module. Usage is as follows:
import play.api.libs.json._
import org.apache.spark.streaming.kafka.KafkaUtils
import kafka.serializer.StringDecoder

case class MyClass(field1: String, field2: Int)

// Derives a JSON reader/writer for MyClass
implicit val myClassFormat = Json.format[MyClass]

val kafkaParams = Map[String, String](...here are your params...)

KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("myTopicName"))
  .map(m => Json.parse(m._2).as[MyClass]) // each message value becomes a MyClass
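The `Json.parse(m._2).as[MyClass]` call turns each message value into a typed case-class instance. To show the idea without the Spark and Play dependencies, here is a minimal, hand-rolled stand-in: a regex-based extractor for the same two fields (hypothetical sample data; real code should keep using play-json or another JSON library):

```scala
case class MyClass(field1: String, field2: Int)

// Naive field extraction, standing in for Json.parse(...).as[MyClass].
// This only handles this exact shape; it is an illustration, not a parser.
val field1Re = "\"field1\"\\s*:\\s*\"([^\"]*)\"".r
val field2Re = "\"field2\"\\s*:\\s*(\\d+)".r

def parse(json: String): MyClass = {
  val f1 = field1Re.findFirstMatchIn(json).map(_.group(1)).get
  val f2 = field2Re.findFirstMatchIn(json).map(_.group(1)).get
  MyClass(f1, f2.toInt)
}

// Simulated (key, value) records, as the direct stream would deliver them
val records = Seq(
  ("key1", """{"field1": "hello", "field2": 1}"""),
  ("key2", """{"field1": "world", "field2": 2}""")
)

// The equivalent of .map(m => Json.parse(m._2).as[MyClass])
val parsed = records.map { case (_, value) => parse(value) }
println(parsed)
```

The advantage of the play-json version is that `Json.format[MyClass]` derives the mapping at compile time, so adding a field to the case class updates the parsing automatically.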
Source: https://stackoverflow.com/questions/35424724/how-to-read-json-data-using-scala-from-kafka-topic-in-apache-spark