Definition says:
RDD is immutable distributed collection of objects
I don\'t quite understand what does it mean. Is it like da
RDD (Resilient Distributed Datasets) are an abstraction for representing data. Formally they are a read-only, partitioned collection of records that provides a convenient API.
RDD provide a performant solution for processing large datasets on cluster computing frameworks such as MapReduce by addressing some key issues:
RDD's have two main limitations:
One nice conceptual advantage of RDD's is that they pack together data and code making it easier to reuse data pipelines.
Sources: Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing, An Architecture for Fast and General Data Processing on Large Clusters