pyspark: count distinct over a window
I just tried doing a `countDistinct` over a window and got this error:

    AnalysisException: u'Distinct window functions are not supported: count(distinct color#1926)

Is there a way to do a distinct count over a window in PySpark? Here's some example code:

```python
from pyspark.sql.window import Window
from pyspark.sql import functions as F

# function to calculate number of seconds from number of days
days = lambda i: i * 86400

df = spark.createDataFrame(
    [(17, "2017-03-10T15:27:18+00:00", "orange"),
     (13, "2017-03-15T12:27:18+00:00", "red"),
     (25, "2017-03-18T11:27:18+00:00", "red")],
    ["dollars", "timestampGMT", "color"])
```
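One workaround I've seen suggested is to collect the distinct values with `collect_set` over the window and take the `size` of the resulting array, since `collect_set` is allowed as a window function where `count(distinct ...)` is not. Here's a sketch against the example data above; the 7-day trailing window spec is my own assumption, since I haven't shown the window I actually need:

```python
from pyspark.sql.window import Window
from pyspark.sql import functions as F

days = lambda i: i * 86400

# assumes an active SparkSession named `spark`, as in the snippet above
df = spark.createDataFrame(
    [(17, "2017-03-10T15:27:18+00:00", "orange"),
     (13, "2017-03-15T12:27:18+00:00", "red"),
     (25, "2017-03-18T11:27:18+00:00", "red")],
    ["dollars", "timestampGMT", "color"])

# hypothetical 7-day trailing window, ordered by the timestamp cast to
# epoch seconds (no partitionBy, matching the example data)
w = (Window
     .orderBy(F.col("timestampGMT").cast("timestamp").cast("long"))
     .rangeBetween(-days(7), 0))

# collect the set of colors seen in each row's window, then count them
result = df.withColumn("distinct_colors",
                       F.size(F.collect_set("color").over(w)))
```

If an approximate count were acceptable, I believe `F.approx_count_distinct("color").over(w)` would also work, since it isn't implemented as a DISTINCT aggregate, but I'm mainly after an exact count.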