How to calculate the current row with the next one?

Submitted by 前提是你 on 2019-12-23 22:20:06

Question


In Spark-Sql version 1.6, using DataFrames, is there a way to calculate, for a specific column, the sum of the current row and the next one, for every row?

For example, if I have a table with one column, like so

Age
12
23
31
67

I'd like the following output

Sum
35
54
98

The last row is dropped because it has no "next row" to be added to.

Right now I am doing it by ranking the table and joining it with itself, where the rank equals rank+1.

Is there a better way to do this? Can this be done with a Window function?


Answer 1:


Yes, you can definitely do this with a Window function, using rowsBetween. I have added a person column for grouping purposes in the following example.

import sqlContext.implicits._
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val dataframe = Seq(
  ("A", 12),
  ("A", 23),
  ("A", 31),
  ("A", 67)
).toDF("person", "Age")

// Window covering the current row and the next row within each person group
val windowSpec = Window.partitionBy("person").orderBy("Age").rowsBetween(0, 1)

val newDF = dataframe.withColumn("sum", sum(dataframe("Age")).over(windowSpec))

// The last row of each group has no next row, so its windowed sum equals its
// own Age; filtering those rows out drops it, as the question requires.
newDF.filter(!(newDF("Age") === newDF("sum"))).show
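Alternatively, the same result can be obtained with the `lead` window function, which avoids the sum-equals-Age filter trick: add each row's `Age` to the next row's `Age`, and drop the rows where `lead` returns null (the last row of each partition). This is a sketch under the same setup as above; the `person` grouping column and the `dataframe` value are carried over from that example, and it assumes an active Spark session.

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val w = Window.partitionBy("person").orderBy("Age")

// lead("Age", 1) is null on the last row of each partition;
// adding null yields null, and isNotNull drops those rows.
val sums = dataframe
  .withColumn("sum", col("Age") + lead("Age", 1).over(w))
  .filter(col("sum").isNotNull)

sums.select("sum").show
// 35, 54, 98
```

Note that both versions order by `Age` within the window; if the table's original row order (rather than ascending Age) defines "next", order by an explicit ordering column instead.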


Source: https://stackoverflow.com/questions/44390004/how-to-calculate-the-current-row-with-the-next-one
