I am new to spark/scala. I am trying to read some data from a hive table to a spark dataframe and then add a column based on some condition. Here is my code:
You can simply use datediff inbuilt function to check for the days difference between two columns. you don't need to write your function or udf function. And when function is also modified than yours
import org.apache.spark.sql.functions._
val finalDF = DF.withColumn("status",
when(col("past_due").equalTo(1) && col("item_due_date").isNotNull && !(lower(col("item_due_date")).equalTo("null")) && (datediff(col("partition_date"),col("item_due_date")) < 0) && col("item_decision").isNotNull && !(lower(col("item_decision")).equalTo("null")), "approved")
.otherwise(when(col("past_due").equalTo(1) && col("item_due_date").isNotNull && !(lower(col("item_due_date")).equalTo("null")) && (datediff(col("partition_date"),col("item_due_date")) < 0) && (col("item_decision").isNull || lower(col("item_decision")).equalTo("null")), "pending")
.otherwise(when(col("past_due").equalTo(1) && col("item_due_date").isNotNull && !(lower(col("item_due_date")).equalTo("null")) && (datediff(col("partition_date"),col("item_due_date")) >= 0), "expired")
.otherwise("null"))))
This logic will convert the dataframe
+--------+-------------+-------------+--------------+
|past_due|item_due_date|item_decision|partition_date|
+--------+-------------+-------------+--------------+
|1 |2017-12-14 |null |2017-11-22 |
|1 |2017-12-14 |Mitigate |2017-11-22 |
|1 |0001-01-14 |Mitigate |2017-11-22 |
|1 |0001-01-14 |Mitigate |2017-11-22 |
|0 |2018-03-18 |null |2017-11-22 |
|1 |2016-11-30 |null |2017-11-22 |
+--------+-------------+-------------+--------------+
with addition of status column as
+--------+-------------+-------------+--------------+--------+
|past_due|item_due_date|item_decision|partition_date|status |
+--------+-------------+-------------+--------------+--------+
|1 |2017-12-14 |null |2017-11-22 |pending |
|1 |2017-12-14 |Mitigate |2017-11-22 |approved|
|1 |0001-01-14 |Mitigate |2017-11-22 |expired |
|1 |0001-01-14 |Mitigate |2017-11-22 |expired |
|0 |2018-03-18 |null |2017-11-22 |null |
|1 |2016-11-30 |null |2017-11-22 |expired |
+--------+-------------+-------------+--------------+--------+
I hope the answer is helpful