Filter Spark DataFrame based on another DataFrame that specifies denylist criteria

南方客 2020-12-01 05:25

I have a largeDataFrame (multiple columns and billions of rows) and a smallDataFrame (single column and 10,000 rows).

I'd like to filter a

2 Answers
  •  臣服心动
    2020-12-01 05:48

    You'll need to use a left_anti join in this case.

    The left anti join is the opposite of a left semi join.

    It keeps only the rows of the left table whose key has no match in the right table:

    largeDataFrame
       .join(smallDataFrame, Seq("some_identifier"),"left_anti")
       .show
    // +---------------+----------+
    // |some_identifier|first_name|
    // +---------------+----------+
    // |            222|      mary|
    // |            111|       bob|
    // +---------------+----------+
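
Since the denylist here is small (about 10,000 rows), it may also help to hint Spark to broadcast it so the large table is not shuffled. Below is a minimal, self-contained sketch of that idea. The sample rows, the `DenylistFilter` object name, and the `local[*]` master are assumptions made for illustration and are not from the question; `broadcast` is only a hint, and Spark may already pick a broadcast join on its own for a table this small.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object DenylistFilter {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("denylist-filter")
      .master("local[*]") // assumption: local run, for illustration only
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for the real DataFrames.
    val largeDataFrame = Seq((111, "bob"), (222, "mary"), (333, "larry"))
      .toDF("some_identifier", "first_name")
    val smallDataFrame = Seq(333).toDF("some_identifier")

    // left_anti keeps rows of largeDataFrame whose key is absent from smallDataFrame.
    // broadcast(...) hints that the small side should be shipped to every executor.
    val kept = largeDataFrame
      .join(broadcast(smallDataFrame), Seq("some_identifier"), "left_anti")

    // Collect the surviving keys and check that the denylisted id is gone.
    val ids = kept.select("some_identifier").as[Int].collect().toSet
    assert(ids == Set(111, 222))

    spark.stop()
  }
}
```

Note that rows of the large table whose key is null are kept by a left anti join, because a null key never matches anything in the denylist.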
    
