I have a dataset consisting of a timestamp column and a dollars column. I would like to find the average dollars over the week ending at the timestamp of each row.
It's worth noting that if you don't need an exact time window and only want the average over the most recent rows (for example, the last 7 days when there is one row per day), you can use the rowsBetween function as follows:
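from pyspark.sql import Window, functions as F
# Frame: the current row plus the 6 rows before it (7 rows in total)
w = Window.orderBy('timestampGMT').rowsBetween(-6, 0)
df = eurPrices.withColumn('rolling_average', F.avg('dollars').over(w))
Because the window is ordered by timestamp, each average covers the current row and the 6 rows before it, i.e. the last 7 occurrences. Both bounds of rowsBetween are inclusive, so rowsBetween(-7, 0) would cover 8 rows. This approach also saves you the casting of the timestamp to a number that a time-based rangeBetween window would typically need.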
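To make the trade-off concrete, here is a minimal plain-Python sketch (not Spark) of what a row-based frame computes compared with a time-based one. The function names `rolling_avg_rows` and `rolling_avg_days` and the sample data are hypothetical, chosen only for illustration. The sketch assumes rows are `(timestamp, dollars)` tuples.

```python
from datetime import datetime, timedelta

def rolling_avg_rows(rows, n_rows=7):
    """Average over the current row and the n_rows - 1 rows before it
    (a row-based frame: it counts rows, not elapsed time)."""
    rows = sorted(rows, key=lambda r: r[0])
    out = []
    for i, (ts, _) in enumerate(rows):
        window = [d for _, d in rows[max(0, i - n_rows + 1): i + 1]]
        out.append((ts, sum(window) / len(window)))
    return out

def rolling_avg_days(rows, days=7):
    """Average over rows whose timestamp falls within `days` days before
    the current row's timestamp, inclusive on both ends (a time-based frame)."""
    rows = sorted(rows, key=lambda r: r[0])
    out = []
    for ts, _ in rows:
        start = ts - timedelta(days=days)
        window = [d for t, d in rows if start <= t <= ts]
        out.append((ts, sum(window) / len(window)))
    return out

# Hypothetical sample: seven daily rows, then a 20-day gap before the last row.
sample = [(datetime(2017, 1, 1) + timedelta(days=i), float(i + 1)) for i in range(7)]
sample.append((datetime(2017, 1, 27), 100.0))

by_rows = rolling_avg_rows(sample)
by_days = rolling_avg_days(sample)
```

While the data is daily, both functions agree. After the gap, the row-based average for the last row still mixes in values from three weeks earlier, whereas the time-based one only sees the last row. If your data can have missing days or several rows per day, that difference is what "not caring about the exact dates" costs you.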