I have a DataFrame with bookingDt and arrivalDt columns. For each row, I need to find all the dates between these two dates.
Sample code:
Well, you can do the following.
Create a DataFrame with dates only:
dates_df # with all days between first bookingDt and last arrivalDt
and then join the two DataFrames with a between condition:
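from pyspark.sql.functions import col

# alias both DataFrames so the qualified column names below resolve
df.alias('df').join(dates_df.alias('dates_df'),
    on=col('dates_df.dates').between(col('df.bookingDt'), col('df.arrivalDt'))) \
  .select('df.*', 'dates_df.dates')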
It might even be faster than the solution with explode; however, you need to work out the start and end dates for this DataFrame yourself.
A dates DataFrame covering 10 years has only about 3,650 rows, which is not many to worry about.
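One possible way to build dates_df is sketched below. The helper names all_days and build_dates_df are made up for illustration, and the sketch assumes you already have a SparkSession (passed in as spark) and that start and end are Python date objects:

```python
from datetime import date, timedelta

def all_days(start, end):
    """Return every calendar day from start to end, both inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]

def build_dates_df(spark, start, end):
    # spark is assumed to be an existing SparkSession.
    # Produces one row per day in a single column named 'dates',
    # matching the column name used in the join.
    return spark.createDataFrame([(d,) for d in all_days(start, end)], ['dates'])
```

If bookingDt and arrivalDt are already date columns, the bounds could probably be taken from the data itself, e.g. start, end = df.agg(F.min('bookingDt'), F.max('arrivalDt')).first(), with F being pyspark.sql.functions. The list is built on the driver, which should be fine at this size: ten calendar years is 3,652 days once leap days are counted.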