Spark: need confirmation on approach for capturing first and last date in a dataset
Question

I have a data frame:

    A, B, C, D, 201701, 2020001
    A, B, C, D, 201801, 2020002
    A, B, C, D, 201901, 2020003

Expected output:

    col_A, col_B, col_C, col_D, min_week, max_week, min_month, max_month
    A, B, C, D, 201701, 201901, 2020001, 2020003

What I tried in PySpark:

    from pyspark.sql import Window
    import pyspark.sql.functions as psf

    # Partition by the key columns and order within each partition
    w1 = Window.partitionBy('A', 'B', 'C', 'D')\
        .orderBy('WEEK', 'MONTH')

    df_new = df_source\
        .withColumn("min_week", psf.first("WEEK").over(w1))\
        .withColumn("max_week", psf.last("WEEK").over(w1))\
        .withColumn("min_month", psf.first("MONTH").over(w1))\
        .withColumn("max_month", psf.last("MONTH").over(w1))
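For comparison, the same per-group first/last values can be obtained with a plain groupBy/agg instead of window functions, which also collapses the output to one row per key as in the expected output. The sketch below is only illustrative: the sample data, the column names WEEK and MONTH, and the name df_agg are assumptions based on the question, not confirmed details.

    from pyspark.sql import SparkSession
    import pyspark.sql.functions as psf

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical sample data matching the layout in the question;
    # the WEEK and MONTH column names are assumptions.
    df_source = spark.createDataFrame(
        [("A", "B", "C", "D", 201701, 2020001),
         ("A", "B", "C", "D", 201801, 2020002),
         ("A", "B", "C", "D", 201901, 2020003)],
        ["A", "B", "C", "D", "WEEK", "MONTH"],
    )

    # One row per (A, B, C, D) group with min/max of WEEK and MONTH.
    df_agg = df_source.groupBy("A", "B", "C", "D").agg(
        psf.min("WEEK").alias("min_week"),
        psf.max("WEEK").alias("max_week"),
        psf.min("MONTH").alias("min_month"),
        psf.max("MONTH").alias("max_month"),
    )
    df_agg.show()

If the window-function route is kept, note that `psf.last` over an ordered window uses the default frame ending at the current row, so it returns the current row's value rather than the partition's last value; widening the frame with `rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)` is usually needed to get the true maximum per partition.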