Question
This is my partitionBy condition, which I need to change based on a column value in the data frame.
val windowSpec = Window.partitionBy("col1", "col2", "col3").orderBy($"Col5".desc)
If the value of one of the columns (col6) in the data frame is I, the above condition should apply.
But when the value of that column (col6) is O, the condition below should apply:
val windowSpec = Window.partitionBy("col1", "col3").orderBy($"Col5".desc)
How can I implement this in a Spark data frame?
For each record, it should check whether col6 is I or O and apply the corresponding partitionBy condition.
Answer 1:
Given the requirement to select the final window specification based on the values of the col6 column, I'd do a filter first, followed by the final window aggregation.
scala> dataset.show
+----+----+----+----+----+
|col1|col2|col3|col5|col6|
+----+----+----+----+----+
|   0|   0|   0|   0|   I| // <-- triggers 3 columns to use
|   0|   0|   0|   0|   O| // <-- the aggregation should use just 2 columns
+----+----+----+----+----+
With the above dataset, I'd filter to see if there's at least one I in col6 and apply the matching window specification.
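import org.apache.spark.sql.expressions.Window
val windowSpecForIs = Window.partitionBy("col1", "col2", "col3").orderBy($"Col5".desc)
val windowSpecForOs = Window.partitionBy("col1", "col3").orderBy($"Col5".desc)
// true when no row has col6 equal to "I"
val noIs = dataset.filter($"col6" === "I").take(1).isEmpty
// at least one I selects the 3-column window; otherwise the 2-column one is used
val windowSpec = if (noIs) windowSpecForOs else windowSpecForIs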
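The snippet above picks a single window specification for the whole dataset. If the choice really has to be made per record, as the question describes, one possible sketch is to evaluate the window function over both specifications and keep the matching result with when/otherwise. This is an assumption about the intent, not something the answer states. The object name PerRowWindowSketch, the helpers sample and withRowNumber, the output column rn, the use of row_number, and the sample values are all hypothetical and only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number, when}

object PerRowWindowSketch {
  // Hypothetical sample data; column names follow the dataset shown above.
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      (0, 0, 0, 40, "I"),
      (0, 1, 0, 20, "I"),
      (0, 0, 0, 30, "O"),
      (0, 1, 0, 10, "O")
    ).toDF("col1", "col2", "col3", "col5", "col6")
  }

  // Adds a hypothetical "rn" column: row_number over the 3-column window
  // for rows with col6 === "I", and over the 2-column window otherwise.
  def withRowNumber(dataset: DataFrame): DataFrame = {
    val windowSpecForIs =
      Window.partitionBy("col1", "col2", "col3").orderBy(col("col5").desc)
    val windowSpecForOs =
      Window.partitionBy("col1", "col3").orderBy(col("col5").desc)

    // Both window expressions are evaluated for every row;
    // when/otherwise only decides which result is kept.
    dataset.withColumn(
      "rn",
      when(col("col6") === "I", row_number().over(windowSpecForIs))
        .otherwise(row_number().over(windowSpecForOs))
    )
  }

  def main(args: Array[String]): Unit = {
    // Local session, for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("per-row-window-sketch")
      .getOrCreate()

    withRowNumber(sample(spark)).show()
    spark.stop()
  }
}
```

Note that each partition still contains both I and O rows, so an I row is numbered among all rows sharing its col1, col2 and col3 values. If I rows should only be compared with other I rows (and O rows with O rows), adding col6 to both partitionBy calls would be one way to express that.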
Source: https://stackoverflow.com/questions/47387390/how-to-use-different-window-specification-per-column-values