Spark: How can a DataFrame be a Dataset[Row] if DataFrames have a schema?

面向向阳花 2021-01-03 13:16

This article claims that a DataFrame in Spark is equivalent to a Dataset[Row], but this blog post shows that a DataFrame has a schem

2 answers
  •  夕颜 (original poster)
    2021-01-03 13:53

    Note, in addition to the answer of T Gaweda, that every Row has a schema associated with it (Row.schema). However, this schema is not set until the Row becomes part of a DataFrame (or Dataset[Row]):

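    scala> import org.apache.spark.sql.Row
    scala> import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

    scala> Row(1).schema  // a hand-built Row has no schema yet
    res12: org.apache.spark.sql.types.StructType = null

    scala> val schema = StructType(Seq(StructField("a", IntegerType, nullable = true)))  // not in the original transcript; inferred from res16

    scala> val rdd = sc.parallelize(List(Row(1)))
    rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = ParallelCollectionRDD[5] at parallelize at <console>:28

    scala> spark.createDataFrame(rdd, schema).first
    res15: org.apache.spark.sql.Row = [1]

    scala> spark.createDataFrame(rdd, schema).first.schema  // the same data, read back from a DataFrame
    res16: org.apache.spark.sql.types.StructType = StructType(StructField(a,IntegerType,true))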
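
To connect this back to the question, here is a minimal sketch meant to be pasted into spark-shell or a Scala worksheet with Spark on the classpath. It assumes a local session, and the names `df`, `ds`, `bare` and `fetched` are illustrative only. It shows that a DataFrame can be assigned to a Dataset[Row] without a cast, that the schema belongs to the Dataset rather than to the Row type, and that a Row read back from the Dataset carries that schema while a hand-built Row does not.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}
import scala.util.Try

// Local session for illustration only; inside spark-shell, getOrCreate() reuses the existing session.
val spark = SparkSession.builder().master("local[1]").appName("row-schema-sketch").getOrCreate()

// Same single-column schema as in the transcript above.
val schema = StructType(Seq(StructField("a", IntegerType, nullable = true)))
val rdd = spark.sparkContext.parallelize(Seq(Row(1)))

val df: DataFrame = spark.createDataFrame(rdd, schema)
// DataFrame is a type alias for Dataset[Row], so this assignment needs no conversion.
val ds: Dataset[Row] = df

val bare = Row(1)        // built by hand, never part of a DataFrame
val fetched = ds.first() // read back from the Dataset

println(bare.schema)                         // null
println(ds.schema == schema)                 // true
println(fetched.schema == schema)            // true
println(fetched.getAs[Int]("a"))             // 1
// Lookup by field name needs a schema, so it fails on the hand-built Row.
println(Try(bare.getAs[Int]("a")).isFailure) // true
```

In other words, the static type Dataset[Row] says nothing about columns; the column names and types are runtime metadata held by the Dataset and attached to the rows it returns.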
    
