Get all the dates between two dates in a Spark DataFrame

臣服心动 2020-12-06 17:49

I have a DF in which I have bookingDt and arrivalDt columns. I need to find all the dates between these two dates.

Sample code:

4 answers
  •  自闭症患者
    2020-12-06 18:25

    Well, you can do the following.

    Create a dataframe with dates only:

    dates_df # with all days between first bookingDt and last arrivalDt
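One way to build dates_df is sketched below. It assumes bookingDt and arrivalDt are date columns and that a SparkSession is passed in as `spark`; the helper names `date_range` and `build_dates_df` are hypothetical and not part of the original answer.

```python
from datetime import date, timedelta


def date_range(start, end):
    """Return every calendar day from start to end, inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def build_dates_df(spark, start, end):
    """Build a one-column DataFrame named `dates` (hypothetical helper).

    `spark` is assumed to be an existing SparkSession.
    """
    rows = [(d,) for d in date_range(start, end)]
    return spark.createDataFrame(rows, ["dates"])
```

The bounds could be taken from the data itself, for example with `df.agg(F.min('bookingDt'), F.max('arrivalDt')).first()`, where `F` is `pyspark.sql.functions`.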

    Then join the two DataFrames using a between condition:

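    from pyspark.sql.functions import col
    # Alias both DataFrames so the qualified column names below resolve.
    df.alias('df').join(dates_df.alias('dates_df'),
      on=col('dates_df.dates').between(col('df.bookingDt'), col('df.arrivalDt'))
    ).select('df.*', 'dates_df.dates')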
    

    It might even run faster than the solution with explode; however, you need to work out the start and end dates for this dates DataFrame. A dates DataFrame covering 10 years has only about 3,650 rows, which is not enough to worry about.
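For comparison, the explode-based approach mentioned above might look roughly like the sketch below. It assumes Spark 2.4 or later (for `sequence`), date-typed bookingDt and arrivalDt columns, and bookingDt on or before arrivalDt; the function name `expand_with_explode` is hypothetical.

```python
def expand_with_explode(df):
    """Return df with a `dates` column: one row per day from bookingDt to arrivalDt.

    Assumes both columns are of date type and bookingDt <= arrivalDt.
    """
    # Imported here so the sketch can be loaded without PySpark installed.
    from pyspark.sql import functions as F

    # sequence() builds an array of consecutive dates per row;
    # explode() turns each array element into its own row.
    return df.withColumn(
        "dates", F.explode(F.sequence(F.col("bookingDt"), F.col("arrivalDt")))
    )
```

This variant needs no separate dates DataFrame, so there are no global start and end dates to work out.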
