Spark: subtract two DataFrames
Question: In Spark 1.2.0, one could call subtract on two SchemaRDDs to keep only the content of the first one that does not appear in the second:

val onlyNewData = todaySchemaRDD.subtract(yesterdaySchemaRDD)

onlyNewData contains the rows in todaySchemaRDD that do not exist in yesterdaySchemaRDD.

How can this be achieved with DataFrames in Spark 1.3.0?

Answer 1: According to the API docs, calling

dataFrame1.except(dataFrame2)

returns a new DataFrame containing the rows in dataFrame1 that are not in dataFrame2.
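A minimal, self-contained sketch of the except approach from Answer 1. It assumes Spark 1.3 or later is on the classpath and uses a local master. The Record case class, the sample rows, and the SubtractDataFrames / newIds names are hypothetical and exist only for illustration. It uses the Spark 1.3-era SQLContext entry point, which later versions still ship (deprecated in favor of SparkSession).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical row type for this illustration; defined at top level so toDF() can derive a schema.
case class Record(id: Int, name: String)

object SubtractDataFrames {
  // Returns the sorted ids of rows that are in "today" but not in "yesterday".
  def newIds(): Seq[Int] = {
    val conf = new SparkConf().setAppName("subtract-dataframes").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val sqlContext = new SQLContext(sc)
      import sqlContext.implicits._ // brings rdd.toDF() into scope

      // Hypothetical sample data: today has one row more than yesterday.
      val yesterdayDF = sc.parallelize(Seq(Record(1, "a"), Record(2, "b"))).toDF()
      val todayDF = sc.parallelize(Seq(Record(1, "a"), Record(2, "b"), Record(3, "c"))).toDF()

      // DataFrame counterpart of SchemaRDD.subtract: rows of todayDF absent from yesterdayDF.
      val onlyNewData = todayDF.except(yesterdayDF)
      onlyNewData.show()

      // Column 0 is "id" (case class field order).
      onlyNewData.collect().map(_.getInt(0)).toSeq.sorted
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(newIds())
}
```

With this sample data only the row with id 3 survives. Note that for Spark 2.x and later the Scala API docs describe except as equivalent to SQL EXCEPT DISTINCT, so duplicate rows in the result are collapsed; exceptAll (Spark 2.4+) keeps duplicates. Both DataFrames are expected to have the same columns in the same order.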