Spark + Scala transformations, immutability & memory consumption overheads

星月不相逢 2020-12-09 06:40

I have gone through some videos on YouTube regarding Spark architecture.

Even though lazy evaluation, resilience in case of failures, and good functional programming constructs are clear advantages of RDDs, one thing is not clear to me: since RDDs are immutable, does each transformation create a new copy of the data? In other words, if a job has 10 transformations, does it need roughly 10 times the memory?

2 Answers
  •  误落风尘
    2020-12-09 07:25

    The memory requirements of Spark are not 10 times higher just because you have 10 transformations in your Spark job. When you specify the transformation steps of a job, Spark builds a DAG (directed acyclic graph) that allows it to execute all the steps in the job. It then breaks the job down into stages. A stage is a sequence of transformations that Spark can execute on the dataset without shuffling.
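    A minimal sketch of this (the object name, sample data, and local[*] master are my own assumptions, not from the answer): three narrow transformations are pipelined into one stage, and the shuffle introduced by reduceByKey starts a new stage, which you can see in the lineage printed by toDebugString.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object StageDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StageDemo").setMaster("local[*]")
    val sc   = new SparkContext(conf)

    val lines = sc.parallelize(Seq("a b a", "b c", "a c c"))

    // Three narrow transformations: flatMap, map, filter.
    // Spark pipelines them into a single stage; no intermediate copy of the
    // dataset is materialized between them.
    val pairs = lines
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .filter { case (word, _) => word.nonEmpty }

    // reduceByKey requires a shuffle, so it marks a stage boundary.
    val counts = pairs.reduceByKey(_ + _)

    // toDebugString prints the lineage; the indentation change at the
    // ShuffledRDD shows where the stage boundary falls.
    println(counts.toDebugString)

    counts.collect().foreach(println)
    sc.stop()
  }
}
```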

    When an action is triggered on the RDD, Spark evaluates the DAG. It applies all the transformations of a stage together until it hits the end of the stage, so it is unlikely for the memory pressure to be 10 times higher unless each transformation leads to a shuffle (in which case it is probably a badly written job).
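    The pipelining inside a stage behaves much like chained lazy iterators: each record flows through all of the stage's functions before the next record is read. Here is a rough plain-Scala analogy (this is not the Spark API itself, just an illustration of the fusion idea):

```scala
// Plain-Scala analogy for how transformations inside one stage are fused:
// Iterator's map/filter are lazy, so records stream through all three
// functions one at a time instead of producing three full copies of the data.
val records: Iterator[Int] = Iterator.range(0, 1000000)

val pipelined = records
  .map(_ * 2)          // "transformation 1"
  .filter(_ % 3 == 0)  // "transformation 2"
  .map(_ + 1)          // "transformation 3"

// Nothing has been computed yet. Only this terminal operation (the analogue
// of a Spark action) pulls records through the whole chain, one at a time.
println(pipelined.foldLeft(0L)(_ + _))
```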

    I would recommend watching this talk and going through the slides.
