I\'m just wondering what is the difference between an RDD and DataFrame (Spark 2.0.0 DataFrame is a mere type alias for Dataset[Row]
A DataFrame is equivalent to a table in RDBMS and can also be manipulated in similar ways to the "native" distributed collections in RDDs. Unlike RDDs, Dataframes keep track of the schema and support various relational operations that lead to more optimized execution. Each DataFrame object represents a logical plan but because of their "lazy" nature no execution occurs until the user calls a specific "output operation".