Question
Is it possible to retrieve the schema of a DataFrame and store it in a variable? I want to create a new DataFrame from another RDD using the same schema. For example, this is what I am hoping to have:
val schema = oldDF.getSchema()
val newDF = sqlContext.createDataFrame(rowRDD, schema)
Assuming I already have rowRDD in the format of RDD[org.apache.spark.sql.Row], is this possible?
Answer 1:
Just use the schema attribute:
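import org.apache.spark.sql.Row

// Schema is inferred from the tuple type (String, Int)
val oldDF = sqlContext.createDataFrame(sc.parallelize(Seq(("a", 1))))
val rowRDD = sc.parallelize(Seq(Row("b", 2)))
sqlContext.createDataFrame(rowRDD, oldDF.schema)  // reuse oldDF's StructType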
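For readers on Spark 2.x or later, the same idea can be sketched as a standalone program. This is only an illustration under assumptions not in the original answer: it uses SparkSession instead of sqlContext, runs in local mode, and the object name SchemaReuse and the column names name and count are made up for the example.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.StructType

// Hypothetical object name, used only for this sketch
object SchemaReuse {
  def main(args: Array[String]): Unit = {
    // Assumption: a local SparkSession (Spark 2.x+) rather than sqlContext
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("schema-reuse")
      .getOrCreate()
    import spark.implicits._

    // Existing DataFrame; the column names are illustrative
    val oldDF = Seq(("a", 1)).toDF("name", "count")

    // The schema is an ordinary StructType value and can be kept in a variable
    val schema: StructType = oldDF.schema
    schema.printTreeString()

    // Build a new DataFrame from an RDD[Row] using the stored schema
    val rowRDD = spark.sparkContext.parallelize(Seq(Row("b", 2)))
    val newDF = spark.createDataFrame(rowRDD, schema)

    // Both DataFrames now share the same schema
    assert(newDF.schema == oldDF.schema)
    newDF.show()

    spark.stop()
  }
}
```

Note that createDataFrame does not check the rows against the schema up front, so a Row whose values do not match the column types will typically fail only when the data is actually evaluated.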
Source: https://stackoverflow.com/questions/37400697/spark-scala-retrieve-the-schema-and-store-it