How to use different window specification per column values?

Question

This is my partitionBy condition, which I need to change based on a column value in the DataFrame:

val windowSpec = Window.partitionBy("col1", "col2", "col3").orderBy($"col5".desc)

If the value of one of the columns (col6) in the DataFrame is I, the above condition should apply.

But when col6 is O, the condition below should apply:

val windowSpec = Window.partitionBy("col1", "col3").orderBy($"col5".desc)

How can I implement this with a Spark DataFrame?

In other words, for each record it should check whether col6 is I or O and apply the corresponding partitionBy condition.
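One way to express this per-record rule (a sketch, not taken from the thread) is to evaluate both window functions and let `when` keep, for each row, the result of the spec matching its own col6. It assumes `spark-sql` is on the classpath; `PerRecordWindow`, `addRowNumber` and the `rn` column are hypothetical names, and `row_number` stands in for whatever window aggregation you actually need.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number, when}

// Hypothetical helper: picks the window specification per row based on col6.
object PerRecordWindow {
  // Spec for rows whose col6 is "I": partition on three columns.
  val specForIs = Window.partitionBy("col1", "col2", "col3").orderBy(col("col5").desc)
  // Spec for rows whose col6 is "O": partition on two columns.
  val specForOs = Window.partitionBy("col1", "col3").orderBy(col("col5").desc)

  // Both window functions are computed over the whole DataFrame;
  // `when` then keeps the result of the spec that matches each row's col6.
  def addRowNumber(df: DataFrame): DataFrame =
    df.withColumn(
      "rn",
      when(col("col6") === "I", row_number().over(specForIs))
        .otherwise(row_number().over(specForOs))
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("per-record-window").getOrCreate()
    import spark.implicits._

    val df = Seq(
      (0, 0, 0, 1, "I"),
      (0, 1, 0, 2, "I"),
      (0, 1, 0, 3, "O")
    ).toDF("col1", "col2", "col3", "col5", "col6")

    addRowNumber(df).orderBy("col5").show()
    spark.stop()
  }
}
```

With this sketch each spec still partitions the full dataset, so an I row is ranked against O rows that share its col1, col2 and col3 (and vice versa). If I rows and O rows should only be ranked among themselves, adding col6 to both partitionBy lists is one option.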

Answer 1:

Given the requirement to select the final window specification based on the values of the col6 column, I'd filter first and then apply the final window aggregation.

scala> dataset.show
+----+----+----+----+----+
|col1|col2|col3|col5|col6|
+----+----+----+----+----+
|   0|   0|   0|   0|   I| // <-- triggers 3 columns to use
|   0|   0|   0|   0|   O| // <-- the aggregation should use just 2 columns
+----+----+----+----+----+

With the above dataset, I'd filter to check whether there's at least one I in col6 and then choose the window specification accordingly.

val windowSpecForIs = Window.partitionBy("col1", "col2", "col3").orderBy($"col5".desc)
val windowSpecForOs = Window.partitionBy("col1", "col3").orderBy($"col5".desc)

val noIs = dataset.filter($"col6" === "I").take(1).isEmpty
val windowSpec = if (noIs) windowSpecForOs else windowSpecForIs
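The answer stops once `windowSpec` is chosen. A minimal end-to-end sketch of then using it might look like the following; it assumes `spark-sql` is on the classpath, and `WholeDatasetWindow`, `chooseSpec`, `addRowNumber` and the `rn` column are hypothetical names, with `row_number` standing in for the real aggregation. Note that, as in the answer, one spec is chosen for the whole dataset rather than per record.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.{Window, WindowSpec}
import org.apache.spark.sql.functions.{col, row_number}

// Hypothetical helper wrapping the answer's approach.
object WholeDatasetWindow {
  val windowSpecForIs: WindowSpec = Window.partitionBy("col1", "col2", "col3").orderBy(col("col5").desc)
  val windowSpecForOs: WindowSpec = Window.partitionBy("col1", "col3").orderBy(col("col5").desc)

  // Three-column spec if at least one row has col6 == "I", otherwise two-column spec.
  def chooseSpec(dataset: DataFrame): WindowSpec = {
    val noIs = dataset.filter(col("col6") === "I").take(1).isEmpty
    if (noIs) windowSpecForOs else windowSpecForIs
  }

  // Applies the single chosen spec to every row.
  def addRowNumber(dataset: DataFrame): DataFrame =
    dataset.withColumn("rn", row_number().over(chooseSpec(dataset)))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("whole-dataset-window").getOrCreate()
    import spark.implicits._

    val dataset = Seq((0, 0, 0, 1, "I"), (0, 1, 0, 2, "O")).toDF("col1", "col2", "col3", "col5", "col6")
    addRowNumber(dataset).orderBy("col5").show()
    spark.stop()
  }
}
```

The `take(1)` check triggers a separate Spark job before the window aggregation runs, which is the cost of deciding on the spec up front.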

Source: https://stackoverflow.com/questions/47387390/how-to-use-different-window-specification-per-column-values

Tags

apache-spark

spark-dataframe

apache-spark-dataset