Spark SQL & DataFrame
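A Dataset is built on top of RDDs, and a DataFrame is in turn a specialization of Dataset (in the Scala API, DataFrame is a type alias for Dataset[Row]).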
- A Dataset is a distributed collection of data. Dataset is a new interface added in Spark 1.6 that provides the benefits of RDDs (strong typing, ability to use powerful lambda functions) with the benefits of Spark SQL’s optimized execution engine.
- A DataFrame is a Dataset organized into named columns.
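The relationship between the two can be sketched in Scala as follows. This is a minimal illustration, not taken from the original post: the `Person` case class, the `DatasetVsDataFrame` object, the `adultNames` helper, and the sample data are all hypothetical, and the local `SparkSession` is an assumption made only so the snippet can run on a single machine.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical record type, introduced only for this illustration.
final case class Person(name: String, age: Int)

object DatasetVsDataFrame {

  // Builds the same data as a typed Dataset and as a DataFrame, and
  // returns the names of the adults found through each API.
  def adultNames(spark: SparkSession): (Seq[String], Seq[String]) = {
    import spark.implicits._ // encoders, toDS/toDF, and the $"col" syntax

    val people: Dataset[Person] =
      Seq(Person("Ann", 31), Person("Bob", 17)).toDS()

    // Typed API: the lambdas receive Person objects, so field access
    // is checked by the compiler.
    val typed: Seq[String] = people
      .filter((p: Person) => p.age >= 18)
      .map((p: Person) => p.name)
      .collect()
      .toSeq

    // Untyped API: a DataFrame is a Dataset[Row] whose columns are
    // addressed by name and resolved at runtime.
    val df: DataFrame = people.toDF()
    val untyped: Seq[String] = df
      .filter($"age" >= 18)
      .select("name")
      .as[String]
      .collect()
      .toSeq

    (typed, untyped)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: run locally; a real job would normally get its master
    // from spark-submit instead of hard-coding it.
    val spark = SparkSession.builder()
      .appName("dataset-vs-dataframe")
      .master("local[*]")
      .getOrCreate()
    try {
      val (typed, untyped) = adultNames(spark)
      println(s"typed: $typed, untyped: $untyped")
    } finally {
      spark.stop()
    }
  }
}
```

Both queries go through the same Spark SQL execution engine; the difference is that the Dataset version catches a misspelled field at compile time, while the DataFrame version would only fail when the query is analyzed at runtime.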
Spark intermediate tables
References
Source: oschina
Link: https://my.oschina.net/u/4341499/blog/4460932