Spark + Scala transformations, immutability & memory consumption overheads

星月不相逢 2020-12-09 06:40

I have gone through some videos on YouTube regarding Spark architecture.

Even though lazy evaluation, resilience in case of failures, and good functional programming constructs are clear advantages of RDDs, one thing is not clear to me: since RDDs are immutable, does each transformation create a new copy of the data? In other words, if a job has 10 transformations, does it need roughly 10 times the memory?

2 Answers
  •  误落风尘
    2020-12-09 07:25

    The memory requirements of Spark are not 10 times higher just because you have 10 transformations in your Spark job. When you specify the transformation steps of a job, Spark builds a DAG (directed acyclic graph) that allows it to execute all the steps in the job. It then breaks the job down into stages. A stage is a sequence of transformations that Spark can execute on the dataset without shuffling.
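    A minimal sketch of this (the object name, sample data, and local[*] master are my own assumptions, not from the answer): three narrow transformations are pipelined into one stage, and the shuffle introduced by reduceByKey starts a new stage, which you can see in the lineage printed by toDebugString.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object StageDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StageDemo").setMaster("local[*]")
    val sc   = new SparkContext(conf)

    val lines = sc.parallelize(Seq("a b a", "b c", "a c c"))

    // Three narrow transformations: flatMap, map, filter.
    // Spark pipelines them into a single stage; no intermediate copy of the
    // dataset is materialized between them.
    val pairs = lines
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .filter { case (word, _) => word.nonEmpty }

    // reduceByKey requires a shuffle, so it marks a stage boundary.
    val counts = pairs.reduceByKey(_ + _)

    // toDebugString prints the lineage; the indentation change at the
    // ShuffledRDD shows where the stage boundary falls.
    println(counts.toDebugString)

    counts.collect().foreach(println)
    sc.stop()
  }
}
```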

    When an action is triggered on the RDD, Spark evaluates the DAG. It applies all the transformations of a stage together until it hits the end of the stage, so it is unlikely for the memory pressure to be 10 times higher unless each transformation leads to a shuffle (in which case it is probably a badly written job).
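    The pipelining inside a stage behaves much like chained lazy iterators: each record flows through all of the stage's functions before the next record is read. Here is a rough plain-Scala analogy (this is not the Spark API itself, just an illustration of the fusion idea):

```scala
// Plain-Scala analogy for how transformations inside one stage are fused:
// Iterator's map/filter are lazy, so records stream through all three
// functions one at a time instead of producing three full copies of the data.
val records: Iterator[Int] = Iterator.range(0, 1000000)

val pipelined = records
  .map(_ * 2)          // "transformation 1"
  .filter(_ % 3 == 0)  // "transformation 2"
  .map(_ + 1)          // "transformation 3"

// Nothing has been computed yet. Only this terminal operation (the analogue
// of a Spark action) pulls records through the whole chain, one at a time.
println(pipelined.foldLeft(0L)(_ + _))
```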

    I would recommend watching this talk and going through the slides.
