Is a DAG created when we perform operations on DataFrames?

血红的双手。 submitted on 2019-12-02 13:33:19

Every operation on a Dataset, continuous processing mode notwithstanding, is translated into a sequence of operations on internal RDDs. Therefore the concept of a DAG is by all means applicable.

By extension, execution is primarily lazy, though as always exceptions exist, and they are more common in the Dataset API than in the pure RDD API.
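To make the laziness concrete, here is a minimal pure-Python sketch of the idea. It is not Spark's implementation: the `LazyDataset` class and its methods are hypothetical names invented for this illustration. Transformations only record a node in the lineage, and nothing runs until an action (`collect`) is called. The lineage here is a simple chain; operations such as joins would give nodes with several parents, which is what makes the general structure a DAG.

```python
from typing import Callable, Iterable, List, Optional


class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset (hypothetical, not a Spark class)."""

    def __init__(self, source: Optional[Iterable] = None,
                 parent: Optional["LazyDataset"] = None,
                 op: str = "source",
                 fn: Optional[Callable] = None):
        self.source = source
        self.parent = parent
        self.op = op
        self.fn = fn

    def map(self, fn: Callable) -> "LazyDataset":
        # Transformation: only adds a node to the lineage, runs nothing.
        return LazyDataset(parent=self, op="map", fn=fn)

    def filter(self, fn: Callable) -> "LazyDataset":
        # Transformation: same as map, nothing is evaluated here.
        return LazyDataset(parent=self, op="filter", fn=fn)

    def lineage(self) -> List[str]:
        # Walk back to the source and report the recorded operations in order.
        node, ops = self, []
        while node is not None:
            ops.append(node.op)
            node = node.parent
        return list(reversed(ops))

    def collect(self) -> list:
        # Action: only now is the recorded chain actually evaluated.
        if self.parent is None:
            return list(self.source)
        rows = self.parent.collect()
        if self.op == "map":
            return [self.fn(r) for r in rows]
        return [r for r in rows if self.fn(r)]


calls = []


def double(x):
    calls.append(x)  # record every real invocation
    return x * 2


ds = LazyDataset(source=[1, 2, 3, 4]).map(double).filter(lambda x: x > 4)
print(ds.lineage())  # ['source', 'map', 'filter']
print(len(calls))    # 0, because no action has run yet
print(ds.collect())  # [6, 8]
```

The `calls` list stays empty until `collect()` is invoked, which is the behaviour the answer describes as lazy execution.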

Finally, Catalyst is responsible for transforming Dataset API calls into logical, optimized logical, and physical execution plans, and finally for generating the code that will be executed by the tasks.
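The flavour of that pipeline can be sketched in plain Python. This is a toy model and not Catalyst itself: the `Scan` and `Filter` plan nodes and the `combine_filters` rule are hypothetical names, loosely modelled on the kind of rewrite an optimizer applies (merging two adjacent filters into one). The sketch also interprets the final plan directly, whereas Catalyst generates code for it.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class Scan:
    """Leaf node: reads rows from an in-memory list."""
    rows: List[int]


@dataclass
class Filter:
    """Keeps only the rows of `child` for which `predicate` is true."""
    predicate: Callable[[int], bool]
    child: "Plan"


Plan = Union[Scan, Filter]


def combine_filters(plan: Plan) -> Plan:
    """Toy optimizer rule: Filter(p, Filter(q, child)) -> Filter(q and p, child)."""
    if isinstance(plan, Filter):
        child = combine_filters(plan.child)
        if isinstance(child, Filter):
            outer, inner = plan.predicate, child.predicate
            return Filter(lambda r: inner(r) and outer(r), child.child)
        return Filter(plan.predicate, child)
    return plan


def depth(plan: Plan) -> int:
    """Number of operators in the plan."""
    return 1 if isinstance(plan, Scan) else 1 + depth(plan.child)


def execute(plan: Plan) -> List[int]:
    """Stand-in for physical execution: interpret the plan tree bottom-up."""
    if isinstance(plan, Scan):
        return list(plan.rows)
    return [r for r in execute(plan.child) if plan.predicate(r)]


# "Logical plan" as written by the user: two stacked filters over a scan.
logical = Filter(lambda r: r % 2 == 0,
                 Filter(lambda r: r > 3, Scan(list(range(10)))))
optimized = combine_filters(logical)

print(depth(logical), depth(optimized))  # 3 2
print(execute(optimized))                # [4, 6, 8]
```

The optimized plan has one operator fewer but returns the same rows, which is the essential contract of any such rewrite rule.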

The RDD is the building block of Spark. No matter which abstraction we use, DataFrame or Dataset, the final computation is done internally on RDDs.

That is, a DAG is also created when you perform operations on DataFrames.

This link is helpful: https://medium.com/@thejasbabu/spark-dataframes-10c349de04c

For more information on the Catalyst optimizer, see: https://subscription.packtpub.com/book/big_data_and_business_intelligence/9781783987061/4/ch04lvl1sec31/understanding-the-catalyst-optimizer
