Spark Beginner Study Notes

℡╲_俬逩灬. submitted on 2020-08-11 10:03:29

Spark SQL & DataFrame

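A Dataset is a higher-level abstraction built on top of the RDD, and a DataFrame is a further layer on top of Dataset. In the Scala API, DataFrame is simply an alias for Dataset[Row].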

  • A Dataset is a distributed collection of data. Dataset is a new interface added in Spark 1.6 that provides the benefits of RDDs (strong typing, ability to use powerful lambda functions) with the benefits of Spark SQL’s optimized execution engine.
  • A DataFrame is a Dataset organized into named columns.
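The sketch below contrasts the two APIs on the same data. It assumes Spark 2.0 or later (for SparkSession) with spark-sql on the classpath. The names Person, DatasetVsDataFrame, run and the "people" view are hypothetical and exist only for this illustration. It shows three things: a typed Dataset filtered with a lambda, the same data handled as a DataFrame with named columns, and a SQL query over a temporary view.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical record type for illustration; defined at top level so Spark can derive an Encoder.
case class Person(name: String, age: Long)

// Hypothetical demo object, not part of Spark.
object DatasetVsDataFrame {
  // Returns (typed adults, adult names via the DataFrame API, adult count via SQL).
  def run(spark: SparkSession): (List[Person], List[String], Long) = {
    import spark.implicits._ // encoders plus toDS()/toDF() and the $"col" syntax

    // Typed Dataset: each element is a Person, so field access is checked at compile time.
    val people: Dataset[Person] = Seq(Person("Alice", 30), Person("Bob", 17)).toDS()
    val adults: Dataset[Person] = people.filter((p: Person) => p.age >= 18)

    // DataFrame is Dataset[Row]: same data, but columns are looked up by name at runtime.
    val df: DataFrame = people.toDF()
    val adultNames: Array[String] =
      df.filter($"age" >= 18).select("name").as[String].collect()

    // A DataFrame can be viewed as a typed Dataset again when its schema matches the case class.
    val typedAgain: Dataset[Person] = df.as[Person]

    // Spark SQL queries go through the same optimized engine via a temporary view.
    typedAgain.createOrReplaceTempView("people")
    val sqlCount: Long =
      spark.sql("SELECT COUNT(*) FROM people WHERE age >= 18").first().getLong(0)

    (adults.collect().toList, adultNames.toList, sqlCount)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DatasetVsDataFrame")
      .master("local[*]") // local mode, for experimentation only
      .getOrCreate()
    try println(run(spark))
    finally spark.stop()
  }
}
```

The practical difference shows up with mistakes. A typo such as p.agee in the Dataset lambda fails at compile time. A typo such as $"agee" in the DataFrame version only fails when Spark analyzes the query at runtime.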

Spark Intermediate Tables
