Spark Transformation - Why is it lazy and what is the advantage?

我在风中等你 2020-11-27 06:29

Spark transformations are lazily evaluated: when we call an action, Spark executes all the transformations based on the lineage graph.

What is the advantage of having the transformations lazily evaluated?
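As a rough illustration of the mechanism described above, here is a toy Python model. It is not Spark's API; `LazyDataset`, `double`, and `calls` are hypothetical names. Transformations (`map`, `filter`) only append a step to a recorded lineage and return a new dataset, and nothing executes until the action `collect()` replays that lineage. Real Spark records a DAG of RDDs and splits it into stages across a cluster; this sketch keeps only a linear list of steps on one machine.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Tuple

# Hypothetical toy model of lazy evaluation; not Spark's RDD class.
Step = Tuple[str, Callable]


class LazyDataset:
    def __init__(self, source: Iterable, lineage: Optional[List[Step]] = None):
        self._source = source
        # Recorded lineage: ("map" | "filter", function) steps, applied in order.
        self.lineage: List[Step] = list(lineage or [])

    def map(self, fn: Callable) -> "LazyDataset":
        # Transformation: computes nothing, just returns a dataset with a longer lineage.
        return LazyDataset(self._source, self.lineage + [("map", fn)])

    def filter(self, pred: Callable) -> "LazyDataset":
        # Transformation: likewise only records the step.
        return LazyDataset(self._source, self.lineage + [("filter", pred)])

    def _iterate(self) -> Iterator:
        # Replay the recorded lineage element by element.
        for item in self._source:
            keep = True
            for kind, fn in self.lineage:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):
                    keep = False
                    break
            if keep:
                yield item

    def collect(self) -> list:
        # Action: only here do the recorded transformations actually run.
        return list(self._iterate())


calls = []  # records every input the hypothetical map function sees


def double(x: int) -> int:
    calls.append(x)
    return x * 2


ds = LazyDataset(range(5)).map(double).filter(lambda x: x > 4)
print(calls)  # [] : defining the transformations ran nothing
result = ds.collect()
print(result)  # [6, 8]
print(calls)  # [0, 1, 2, 3, 4]
```

Because the lineage is just data until an action arrives, the engine sees the whole chain of steps at once and can decide how much of the input it really needs to read.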

3 Answers
  •  广开言路
    2020-11-27 06:49

    Consider a 1 GB log file containing error, warning, and info messages, stored in HDFS as blocks of 64 or 128 MB (the block size doesn't matter in this context). You first create an RDD called "input" from this text file. Then you create another RDD called "errors" by applying a filter to "input" that keeps only the lines containing error messages, and finally call the action first() on "errors". Because first() needs only one element, Spark can stop as soon as it finds the first error line: it scans the first partition and launches work on additional partitions only if no error line turned up there. Under eager evaluation, the filter would already have run over every partition of the log file when "errors" was defined, even though you were only interested in the first error message.
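    The log-file scenario can be sketched in plain Python, assuming a small made-up set of partitions; `PARTITIONS`, `is_error`, `eager_first_error`, and `lazy_first_error` are hypothetical names, not Spark APIs. Both functions return the first error line plus how many lines were examined. The eager version builds the complete filtered list before taking its first element, while the lazy version chains generators so that pulling one element (the stand-in for first()) reads input only until a match appears. Real Spark's first() scans one partition and then estimates how many more to try; this sketch simplifies that to a sequential scan.

```python
from typing import Iterator, List, Optional, Tuple

# Hypothetical sample data: a log file split into three partitions.
PARTITIONS: List[List[str]] = [
    ["INFO start", "WARN disk 80%", "ERROR disk full", "INFO retry"],
    ["INFO ok", "ERROR timeout", "INFO ok"],
    ["WARN slow", "INFO done"],
]


def is_error(line: str) -> bool:
    return line.startswith("ERROR")


def eager_first_error(partitions: List[List[str]]) -> Tuple[Optional[str], int]:
    """Eager: filter every line of every partition, then take the first match."""
    examined = 0
    errors = []
    for part in partitions:
        for line in part:
            examined += 1
            if is_error(line):
                errors.append(line)
    return (errors[0] if errors else None), examined


def lazy_first_error(partitions: List[List[str]]) -> Tuple[Optional[str], int]:
    """Lazy: pull from a generator pipeline only until one error line appears."""
    examined = 0

    def lines() -> Iterator[str]:
        nonlocal examined
        for part in partitions:  # simplification: partitions scanned in order
            for line in part:
                examined += 1
                yield line

    errors = (line for line in lines() if is_error(line))  # builds the pipeline; reads nothing
    first = next(errors, None)  # the "action": reads just enough lines to find one error
    return first, examined


print(eager_first_error(PARTITIONS))  # ('ERROR disk full', 9)
print(lazy_first_error(PARTITIONS))  # ('ERROR disk full', 3)
```

    With this sample data the eager version examines all 9 lines while the lazy one stops after 3; the more data follows the first error, the larger that difference becomes.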
