What is RDD in Spark?

傲寒 2020-12-12 19:20

Definition says:

RDD is an immutable distributed collection of objects

I don't quite understand what it means. Is it like da

9 answers

野趣味 (original poster)

2020-12-12 20:00

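RDD (Resilient Distributed Dataset) is Spark's basic abstraction for representing data. The source of the data can be JSON, CSV, text files, or some other source. An RDD is split into partitions that are spread across the nodes of the cluster (that is what "distributed" means), and it is immutable: a transformation such as map or filter never changes an existing RDD but returns a new one. RDDs are fault tolerant, but not, by default, because each partition is copied to several locations: Spark records the lineage of every RDD (the chain of transformations that produced it), so if a node fails, its lost partitions can be recomputed from the source data. However, the RDD API is low-level, harder to code against, and cannot benefit from Spark's Catalyst query optimizer, so for most workloads it has been superseded by the higher-level DataFrame and Dataset APIs.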
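To make "immutable", "distributed" and "fault tolerant" concrete, here is a toy, single-process Python model. It is not PySpark: the class ToyRDD and its methods (parallelize, map, filter, collect, lineage, lose_partition) are hypothetical names chosen for illustration, loosely mirroring the shape of Spark's own parallelize/map/filter/collect (lineage plays roughly the role of RDD.toDebugString). It assumes narrow transformations only (partition i depends only on partition i of its parent) and no real cluster; "losing" a partition just deletes it from an in-memory cache.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


class ToyRDD:
    """Single-process toy model of an RDD (hypothetical; NOT Spark's API)."""

    def __init__(self, name: str, num_partitions: int,
                 compute: Callable[[int], Tuple],
                 parent: Optional["ToyRDD"] = None) -> None:
        self.name = name                      # which transformation produced this RDD
        self.num_partitions = num_partitions
        self._compute = compute               # recipe that builds partition i from the parent
        self.parent = parent                  # lineage pointer (None for the source)
        self._cache: Dict[int, Tuple] = {}    # materialized partitions ("executor memory")
        self.compute_count = 0                # how many times a partition was (re)computed

    @staticmethod
    def parallelize(data: Iterable, num_partitions: int) -> "ToyRDD":
        items = tuple(data)                       # immutable snapshot of the source data
        size = -(-len(items) // num_partitions)   # ceiling division: items per partition
        return ToyRDD("parallelize", num_partitions,
                      lambda i: items[i * size:(i + 1) * size])

    def partition(self, i: int) -> Tuple:
        # Computed lazily on first access, then served from the cache.
        if i not in self._cache:
            self.compute_count += 1
            self._cache[i] = self._compute(i)
        return self._cache[i]

    # Transformations never modify self; they return a new ToyRDD whose
    # partition i is derived from partition i of its parent (self).
    def map(self, f: Callable) -> "ToyRDD":
        return ToyRDD("map", self.num_partitions,
                      lambda i: tuple(f(x) for x in self.partition(i)), parent=self)

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD("filter", self.num_partitions,
                      lambda i: tuple(x for x in self.partition(i) if pred(x)), parent=self)

    def collect(self) -> List:
        return [x for i in range(self.num_partitions) for x in self.partition(i)]

    def lineage(self) -> List[str]:
        # Walk parent pointers back to the source.
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names

    def lose_partition(self, i: int) -> None:
        """Simulate a failed node: the materialized partition is simply gone."""
        self._cache.pop(i, None)


numbers = ToyRDD.parallelize(range(10), num_partitions=3)
evens = numbers.filter(lambda x: x % 2 == 0)
squares = evens.map(lambda x: x * x)

print(squares.collect())   # [0, 4, 16, 36, 64]
print(numbers.collect())   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] (source RDD unchanged)
print(squares.lineage())   # ['map', 'filter', 'parallelize']

# A "node" holding partition 1 of both evens and squares fails:
evens.lose_partition(1)
squares.lose_partition(1)
print(squares.collect())   # [0, 4, 16, 36, 64], rebuilt from numbers via the lineage
```

The design choice worth noticing is that recovery reruns the parent's recipe instead of reading a replica, which is the idea behind RDD lineage. Real Spark additionally handles shuffles (wide dependencies), checkpointing and configurable persistence levels, all of which this sketch ignores.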
