Spark Scala: retrieve the schema and store it

守給你的承諾、 submitted on 2019-12-10 09:27:18

Question


Is it possible to retrieve the schema of an existing DataFrame and store it in a variable? I want to create a new DataFrame from another RDD using the same schema. For example, this is what I am hoping to write:

val schema = oldDF.getSchema()
val newDF = sqlContext.createDataFrame(rowRDD, schema)

Assuming I already have rowRDD as an RDD[org.apache.spark.sql.Row], is this possible?


Answer 1:


Just use the schema attribute of the DataFrame. It returns a StructType that you can pass directly to createDataFrame:

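import org.apache.spark.sql.Row

// Existing DataFrame whose schema we want to reuse
val oldDF = sqlContext.createDataFrame(sc.parallelize(Seq(("a", 1))))
val rowRDD = sc.parallelize(Seq(Row("b", 2)))

// Build the new DataFrame from rowRDD with the old schema
sqlContext.createDataFrame(rowRDD, oldDF.schema)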
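
For newer Spark versions, where SparkSession replaces sqlContext, the same idea might look like the sketch below. It is only an illustration: the object name SchemaReuse, the column names "name" and "count", and the local master setting are assumptions, not part of the original answer. It also shows one possible way to keep the schema beyond a single variable, by round-tripping it through its JSON representation.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{DataType, StructType}

// Hypothetical standalone example; names and settings are illustrative only.
object SchemaReuse {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")          // assumption: run locally for the demo
      .appName("schema-reuse")
      .getOrCreate()
    import spark.implicits._

    // Existing DataFrame whose schema we want to reuse
    val oldDF = Seq(("a", 1)).toDF("name", "count")

    // Store the schema in a variable; it is a plain StructType value
    val schema: StructType = oldDF.schema

    // New rows must match the schema's column order and types
    val rowRDD = spark.sparkContext.parallelize(Seq(Row("b", 2)))
    val newDF = spark.createDataFrame(rowRDD, schema)
    assert(newDF.schema == schema)

    // Optional: serialize the schema to JSON and restore it later
    val json: String = schema.json
    val restored = DataType.fromJson(json).asInstanceOf[StructType]
    assert(restored == schema)

    spark.stop()
  }
}
```

Because the schema is an ordinary immutable value, it can be passed around, compared with ==, or saved as a JSON string and parsed back when the original DataFrame is no longer available.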


Source: https://stackoverflow.com/questions/37400697/spark-scala-retrieve-the-schema-and-store-it
