Spark: subtract two DataFrames

江枫思渺然 提交于 2019-11-27 03:43:53

According to the api docs, doing:

dataFrame1.except(dataFrame2)

will return a new DataFrame containing rows in dataFrame1 but not in dataframe2.

In pyspark DOCS it would be subtract

df1.subtract(df2)

I tried subtract, but the result was not consistent. If I run df1.subtract(df2), not all lines of df1 are shown on the result dataframe, probably due distinct cited on the docs.

This solved my problem: df1.exceptAll(df2)

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!