Spark 2.x - How to generate simple Explain/Execution Plan

Submitted by 自闭症网瘾萝莉.ら on 2020-04-11 12:38:44

Question


I am hoping to generate an explain/execution plan in Spark 2.2 with some actions on a dataframe. The goal here is to ensure that partition pruning is occurring as expected before I kick off the job and consume cluster resources. I tried a Spark documentation search and a SO search here but couldn't find a syntax that worked for my situation.

Here is a simple example that works as expected:

scala> List(1, 2, 3, 4).toDF.explain
== Physical Plan ==
LocalTableScan [value#42]

Here's an example that does not work as expected, but that I am hoping to get working:

scala> List(1, 2, 3, 4).toDF.count.explain
<console>:24: error: value explain is not a member of Long
List(1, 2, 3, 4).toDF.count.explain
                               ^

And here's a more detailed example illustrating the end goal: confirming, via the explain plan, that partition pruning happens for a query like this.

val newDf = spark.read.parquet(df).filter(s"start >= ${startDt}").filter(s"start <= ${endDt}")
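One way to check pruning before launching the job is to call `explain` on the filtered DataFrame and inspect the scan node. This is a sketch, assuming a hypothetical Parquet dataset at `/data/events` that is partitioned by the `start` column:

```scala
// Hypothetical partitioned dataset path; adjust to your layout.
val newDf = spark.read.parquet("/data/events")
  .filter(s"start >= ${startDt}")
  .filter(s"start <= ${endDt}")

// Prints the parsed, analyzed, optimized, and physical plans.
// In the physical plan's FileScan node, partition pruning shows up
// as a non-empty PartitionFilters list rather than a plain Filter.
newDf.explain(true)
```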

Thanks in advance for any thoughts/feedback.


Answer 1:


The count method is an action: it is eagerly evaluated and, as you see, returns a Long, so there is no execution plan available on the result.

You have to use a lazy transformation, either:

import org.apache.spark.sql.functions.count

df.select(count($"*"))

or

df.groupBy().agg(count($"*"))
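
Either form stays lazy and returns a DataFrame rather than a Long, so `explain` is available on it. For example (continuing with the `df` above):

```scala
// Lazy aggregation: the result is a DataFrame, not a Long,
// so the plan can be inspected before any job runs.
df.groupBy().agg(count($"*")).explain
```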


Source: https://stackoverflow.com/questions/50575490/spark-2-x-how-to-generate-simple-explain-execution-plan
