I\'m just wondering what is the difference between an RDD and DataFrame (Spark 2.0.0 DataFrame is a mere type alias for Dataset[Row]
A Dataframe is an RDD of Row objects, each representing a record. A Dataframe also knows the schema (i.e., data fields) of its rows. While Dataframes look like regular RDDs, internally they store data in a more efficient manner, taking advantage of their schema. In addition, they provide new operations not available on RDDs, such as the ability to run SQL queries. Dataframes can be created from external data sources, from the results of queries, or from regular RDDs.
Reference: Zaharia M., et al. Learning Spark (O'Reilly, 2015)