What is an RDD in Spark?

傲寒 2020-12-12 19:20

The definition says:

An RDD is an immutable distributed collection of objects.

I don't quite understand what it means. Is it like da

9 answers
  •  感动是毒
    2020-12-12 19:48

    An RDD is a logical reference to a dataset that is partitioned across many machines in the cluster. RDDs are immutable, and they recover automatically from failures by recomputing lost partitions.

    The dataset can be data loaded from an external source by the user, such as a JSON file, a CSV file, or a text file with no specific data structure.

    UPDATE: Here is the paper that describes RDD internals:

    Hope this helps.
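
To make the three properties in the answer above (partitioned, immutable, self-recovering) more concrete, here is a minimal pure-Python sketch. It is a toy model only and does not use Spark's API: the `ToyRDD` class, its `from_records` helper, and the sample records are all hypothetical names invented for this illustration. A real RDD spreads its partitions across machines, whereas this sketch keeps everything in one process.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Callable, Optional


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class ToyRDD:
    """Hypothetical toy stand-in for an RDD (not Spark's real class)."""
    num_partitions: int
    compute: Callable[[int], tuple]    # rebuilds partition i on demand
    parent: Optional["ToyRDD"] = None  # lineage: the dataset this one came from

    @staticmethod
    def from_records(records, num_partitions):
        # Split the input into contiguous chunks, one per partition.
        data = tuple(records)
        size = -(-len(data) // num_partitions)  # ceiling division
        return ToyRDD(num_partitions, lambda i: data[i * size:(i + 1) * size])

    def map(self, fn):
        # A transformation never changes this object; it returns a new
        # ToyRDD that remembers how to derive each partition from its parent.
        return ToyRDD(
            self.num_partitions,
            lambda i: tuple(fn(x) for x in self.compute(i)),
            parent=self,
        )

    def collect(self):
        # Gather every partition into one local list.
        return [x for i in range(self.num_partitions) for x in self.compute(i)]


# Partitioned: five records split across two partitions.
lines = ToyRDD.from_records(["spark", "rdd", "hdfs", "yarn", "hive"], num_partitions=2)
print(lines.compute(0))  # ('spark', 'rdd', 'hdfs')

# Immutable: map() produces a new dataset and leaves the original untouched.
upper = lines.map(str.upper)
print(upper.collect())   # ['SPARK', 'RDD', 'HDFS', 'YARN', 'HIVE']
print(lines.collect())   # ['spark', 'rdd', 'hdfs', 'yarn', 'hive']

try:
    upper.num_partitions = 8
except FrozenInstanceError:
    print("ToyRDD objects cannot be modified")

# Self-recovering: if partition 1 of `upper` were lost, it can be rebuilt
# from the lineage (parent partition plus the recorded function).
rebuilt = upper.compute(1)
print(rebuilt)           # ('YARN', 'HIVE')
```

The design point the sketch tries to capture is that the object stores a recipe for each partition rather than only the data itself, which is why a lost partition can be recomputed instead of restored from a copy.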
