I created a dataframe in Spark, by groupby column1 and date and calculated the amount.
val table = df1.groupBy($\"column1\",$\"date\").sum(\"amount\")
Assumming those two dates belong to each group of your table
my imports :
import org.apache.spark.sql.functions.{concat_ws,collect_list,lit}
Perpare the dataframe
scala> val seqRow = Seq(
| ("A","1- jul",1000),
| ("A","1-june",2000),
| ("A","1-May",2000),
| ("A","1-dec",3000),
| ("B","1-jul",100),
| ("B","1-june",300),
| ("B","1-May",400),
| ("B","1-dec",300))
seqRow: Seq[(String, String, Int)] = List((A,1- jul,1000), (A,1-june,2000), (A,1-May,2000), (A,1-dec,3000), (B,1-jul,100), (B,1-june,300), (B,1-May,400), (B,1-dec,300))
scala> val input_df = sc.parallelize(seqRow).toDF("column1","date","amount")
input_df: org.apache.spark.sql.DataFrame = [column1: string, date: string ... 1 more field]
Now write a UDF for your case,
scala> def calc_diff = udf((list : Seq[String],startMonth : String,endMonth : String) => {
| //get the month and their values
| val monthMap = list.map{str =>
| val splitText = str.split("\\$")
| val month = splitText(0).split("-")(1).trim
|
| (month.toLowerCase,splitText(1).toInt)
| }.toMap
|
| val stMnth = monthMap(startMonth)
| val endMnth = monthMap(endMonth)
| endMnth - stMnth
|
| })
calc_diff: org.apache.spark.sql.expressions.UserDefinedFunction
Now, Preparing the output
scala> val (month1 : String,month2 : String) = ("jul","dec")
month1: String = jul
month2: String = dec
scala> val req_df = group_df.withColumn("diff",calc_diff('collect_val,lit(month1.toLowerCase),lit(month2.toLowerCase)))
req_df: org.apache.spark.sql.DataFrame = [column1: string, sum_amount: bigint ... 2 more fields]
scala> val req_df = group_df.withColumn("diff",calc_diff('collect_val,lit(month1.toLowerCase),lit(month2.toLowerCase))).drop('collect_val)
req_df: org.apache.spark.sql.DataFrame = [column1: string, sum_amount: bigint ... 1 more field]
scala> req_df.orderBy('column1).show
+-------+----------+----+
|column1|sum_amount|diff|
+-------+----------+----+
| A| 8000|2000|
| B| 1100| 200|
+-------+----------+----+
Hope, this is what you want.