The definition says:
An RDD is an immutable distributed collection of objects.
I don't quite understand what that means. Is it like da
An RDD is a logical reference to a dataset that is partitioned across many server machines in the cluster. RDDs are immutable, and they recover automatically from failures: a lost partition is recomputed from the lineage of transformations that produced it.
The dataset could be data loaded from an external source by the user, such as a JSON file, a CSV file, or a text file with no specific data structure.
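To make "immutable", "partitioned", and "self-recovering" more concrete, here is a minimal single-process sketch in plain Python. It is only a toy model under my own assumptions: `ToyRDD`, `from_list`, `partition`, and `lose_partition` are hypothetical names made up for illustration and are not part of Spark's API. In real Spark you would create an RDD with something like `sc.parallelize(...)` or `sc.textFile(...)` and the partitions would live on different machines.

```python
class ToyRDD:
    """Toy stand-in for an RDD (hypothetical; NOT Spark's API)."""

    def __init__(self, source=None, parent=None, fn=None):
        self._source = source   # list of tuples for a base dataset, else None
        self._parent = parent   # lineage: the RDD this one was derived from
        self._fn = fn           # lineage: function applied to each parent element
        self._cache = {}        # partition index -> already computed tuple

    @classmethod
    def from_list(cls, data, num_partitions):
        # Split the data into contiguous chunks; tuples cannot be modified in place.
        size = max(1, -(-len(data) // num_partitions))  # ceiling division
        chunks = [tuple(data[i:i + size]) for i in range(0, len(data), size)]
        return cls(source=chunks)

    def num_partitions(self):
        if self._parent is None:
            return len(self._source)
        return self._parent.num_partitions()

    def map(self, fn):
        # A transformation never changes this RDD; it returns a new one
        # that only remembers how to derive itself from its parent.
        return ToyRDD(parent=self, fn=fn)

    def partition(self, i):
        if self._parent is None:
            return self._source[i]
        if i not in self._cache:
            # Compute (or recompute after a loss) from the parent via the lineage.
            self._cache[i] = tuple(self._fn(x) for x in self._parent.partition(i))
        return self._cache[i]

    def lose_partition(self, i):
        # Simulate a machine failure by dropping one computed partition.
        self._cache.pop(i, None)

    def collect(self):
        return [x for i in range(self.num_partitions()) for x in self.partition(i)]


numbers = ToyRDD.from_list([1, 2, 3, 4, 5, 6], num_partitions=3)
doubled = numbers.map(lambda x: x * 2)

print(doubled.collect())     # [2, 4, 6, 8, 10, 12]
doubled.lose_partition(1)    # pretend the machine holding partition 1 died
print(doubled.partition(1))  # (6, 8), rebuilt from the parent and the map function
print(numbers.collect())     # [1, 2, 3, 4, 5, 6], the original is unchanged
```

The point of the sketch is that `map` leaves `numbers` untouched and that a lost partition of `doubled` can be rebuilt from its parent plus the recorded function, which is roughly what "self-recovered" means for a real RDD.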
UPDATE: Here is the paper that describes RDD internals:
Hope this helps.