How to convert RDD[GenericRecord] to a DataFrame in Scala?

轮回少年 2020-12-11 12:19

I get tweets from a Kafka topic with Avro (serializer and deserializer). Then I create a Spark consumer which extracts the tweets into a DStream of RDD[GenericRecord]. Now I want to c

4 answers
  •  旧巷少年郎
    2020-12-11 12:51

    You can use createDataFrame(rowRDD: RDD[Row], schema: StructType), which is available in the SQLContext object. Example for converting an RDD of an old DataFrame:

    import sqlContext.implicits._
    val rdd = oldDF.rdd
    val newDF = oldDF.sqlContext.createDataFrame(rdd, oldDF.schema)
    

    Note that there is no need to explicitly set any schema column. We reuse the old DF's schema, which is of the StructType class and can be easily extended. However, this approach is sometimes not possible, and in some cases it can be less efficient than the first one.
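For the RDD[GenericRecord] case from the question, the same createDataFrame(rowRDD, schema) call can be used once each record has been mapped to a Row. The following is only a minimal sketch under stated assumptions: the tweet fields "id" and "text", the object name GenericRecordToDataFrame, and the helpers recordToRow and toDataFrame are all hypothetical and should be adapted to the actual Avro schema.

```scala
import org.apache.avro.generic.GenericRecord
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

object GenericRecordToDataFrame {

  // Assumed tweet schema: the field names "id" and "text" are illustrative only.
  val tweetSchema: StructType = StructType(Seq(
    StructField("id", LongType, nullable = false),
    StructField("text", StringType, nullable = true)
  ))

  // Map one Avro record to a Row whose values line up with tweetSchema.
  def recordToRow(record: GenericRecord): Row = {
    val id = record.get("id").asInstanceOf[Long]
    // Avro commonly returns strings as org.apache.avro.util.Utf8, so convert via toString.
    val text = Option(record.get("text")).map(_.toString).orNull
    Row(id, text)
  }

  // Convert the records to rows, then apply the explicit schema.
  def toDataFrame(spark: SparkSession, records: RDD[GenericRecord]): DataFrame =
    spark.createDataFrame(records.map(recordToRow), tweetSchema)
}
```

With a DStream, a function like toDataFrame would typically be called on each batch inside foreachRDD. Spelling out the StructType by hand keeps the column types explicit, at the cost of having to keep it in sync with the Avro schema.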
