I have this Spark DataFrame:
+---+-----+------+----+------------+------------+
| ID| ID2|Number|Name|Opening_Ho
Here is a way to do it without Window.
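A DataFrame with only the extra copies, i.e. every row beyond the one kept per (ID, ID2, Number) combination. Which row drop_duplicates keeps is not guaranteed, so the exact rows shown can vary: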
df.exceptAll(df.drop_duplicates(['ID', 'ID2', 'Number'])).show()
# +---+---+------+------------+------------+
# | ID|ID2|Number|Opening_Hour|Closing_Hour|
# +---+---+------+------------+------------+
# |ALT|QWA| 2| 08:53:00| 23:24:00|
# |ALT|QWA| 6| 08:55:00| 23:26:00|
# +---+---+------+------------+------------+
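A DataFrame with all rows whose (ID, ID2, Number) combination occurs more than once (using a left_anti join):
# Keys that occur exactly once; the anti join removes the rows matching them.
df.join(
    df.groupBy('ID', 'ID2', 'Number').count().where('count = 1').drop('count'),
    on=['ID', 'ID2', 'Number'],
    how='left_anti',
).show()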
# +---+---+------+------------+------------+
# | ID|ID2|Number|Opening_Hour|Closing_Hour|
# +---+---+------+------------+------------+
# |ALT|QWA| 2| 08:54:00| 23:25:00|
# |ALT|QWA| 2| 08:53:00| 23:24:00|
# |ALT|QWA| 6| 08:59:00| 23:30:00|
# |ALT|QWA| 6| 08:55:00| 23:26:00|
# +---+---+------+------------+------------+
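To see why the two results differ, here is a rough plain-Python model of both approaches (no Spark involved). It is only a sketch of the semantics: `exceptAll` is treated as a multiset difference and the anti join as a filter on per-key counts. The first four sample rows are taken from the output above; the fifth row and the helper name `key` are made up for illustration.

```python
from collections import Counter

# Sample rows: (ID, ID2, Number, Opening_Hour, Closing_Hour).
# The last row is hypothetical and stands for a key that occurs only once.
rows = [
    ("ALT", "QWA", 2, "08:54:00", "23:25:00"),
    ("ALT", "QWA", 2, "08:53:00", "23:24:00"),
    ("ALT", "QWA", 6, "08:59:00", "23:30:00"),
    ("ALT", "QWA", 6, "08:55:00", "23:26:00"),
    ("ALT", "QWA", 9, "09:00:00", "23:00:00"),
]

def key(row):
    """Hypothetical helper: the (ID, ID2, Number) part of a row."""
    return row[:3]

# Model of drop_duplicates(['ID', 'ID2', 'Number']): one row per key.
# Here the first row seen is kept; Spark makes no such promise.
one_per_key = {}
for row in rows:
    one_per_key.setdefault(key(row), row)

# Model of exceptAll: multiset difference, so only the extra copies remain.
extras = list((Counter(rows) - Counter(one_per_key.values())).elements())

# Model of groupBy + count = 1 + left_anti: drop rows whose key is unique.
counts = Counter(key(row) for row in rows)
all_duplicates = [row for row in rows if counts[key(row)] > 1]

print(extras)          # one row per duplicated key
print(all_duplicates)  # every row of every duplicated key
```

The first approach returns n - 1 rows for a key that appears n times, while the second returns all n of them.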