Apache spark case with multiple when clauses on different columns

依然范特西╮ submitted on 2020-06-08 05:59:07

Question


Given the below structure:

val df = Seq("Color", "Shape", "Range","Size").map(Tuple1.apply).toDF("color")

val df1 = df.withColumn("Success", when($"color"<=> "white", "Diamond").otherwise(0))

I want to add one more WHEN condition to the above: when size > 10 and the Shape column value is "Rhombus", the value "Diamond" should be inserted into the column, otherwise 0. I tried the following, but it fails:

val df1 = df.withColumn("Success", when($"color" <=> "white", "Diamond").otherwise(0)).when($"size">10)

Please suggest a DataFrame-only solution in Scala; using Spark SQL through sqlContext is not helpful for me.

Thanks!


Answer 1:


You can chain when calls, as in the example at https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/Column.html#when-org.apache.spark.sql.Column-java.lang.Object- (available since Spark 1.4.0):

// Scala:
people.select(when(people("gender") === "male", 0)
 .when(people("gender") === "female", 1)
 .otherwise(2))
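The branches are evaluated in order, so the first condition that holds supplies the value, much like SQL CASE WHEN, and a chain without otherwise() yields null for rows that match no branch. A minimal self-contained sketch of both points, assuming a local SparkSession and a made-up gender column (the object name ChainedWhen and the method withCodes are hypothetical):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object ChainedWhen {
  // Adds two columns to a frame that has a string "gender" column:
  //  - "code": chained when/otherwise; branches are checked top to bottom
  //    and the first matching one supplies the value
  //  - "maleOnly": a single when() with no otherwise(), so rows that
  //    match no branch get null
  def withCodes(people: DataFrame): DataFrame =
    people
      .withColumn("code",
        when(col("gender") === "male", 0)
          .when(col("gender") === "female", 1)
          .otherwise(2))
      .withColumn("maleOnly", when(col("gender") === "male", 0))

  def main(args: Array[String]): Unit = {
    // Local session, only for trying the sketch out
    val spark = SparkSession.builder().master("local[*]").appName("chained-when").getOrCreate()
    import spark.implicits._

    // Made-up sample rows
    val people = Seq("male", "female", "unknown").toDF("gender")
    withCodes(people).show()
    spark.stop()
  }
}
```

With these rows, code should be 0, 1 and 2 for male, female and unknown, while maleOnly is 0 for the male row and null for the other two.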

Applied to your example (this assumes the DataFrame actually has shape and size columns):

val df1 = df.withColumn("Success",
  when($"color" <=> "white", "Diamond")
  .when($"size" > 10 && $"shape" === "Rhombus", "Diamond")
  .otherwise(0))
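A runnable version of the snippet above, as a sketch: it assumes a local SparkSession and invents a few rows with color, shape and size columns, because the DataFrame built in the question has only a single color column, so $"shape" and $"size" would not resolve against it. DiamondCase and withSuccess are hypothetical names. The fallback is the string "0" rather than the Int 0 from the question, so that every branch has the same type instead of leaving the column type to Spark's type coercion.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object DiamondCase {
  // "Diamond" if the color is white, or if the shape is Rhombus with size > 10;
  // otherwise the string "0" (kept as a string so all branches share one type)
  def withSuccess(df: DataFrame): DataFrame =
    df.withColumn("Success",
      when(col("color") <=> "white", "Diamond")
        .when(col("size") > 10 && col("shape") === "Rhombus", "Diamond")
        .otherwise("0"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("diamond-case").getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows with the three columns the conditions use
    val df = Seq(
      ("white", "Square", 5),
      ("red", "Rhombus", 12),
      ("red", "Rhombus", 8),
      ("blue", "Circle", 20)
    ).toDF("color", "shape", "size")

    withSuccess(df).show()
    spark.stop()
  }
}
```

With these rows the Success column should read Diamond, Diamond, 0, 0: the first row matches on color, the second on shape and size, and the last two match neither branch.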



Answer 2:


Have you tried writing a UDF? Try something like this:

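// Define the UDF
import org.apache.spark.sql.functions.udf

// size is an Int so that size > 10 compiles (a String cannot be compared with 10);
// a row is a Diamond if it is white, or if it is a Rhombus larger than 10
val isDiamond = udf((color: String, shape: String, size: Int) => {
  if (color == "white" || (shape == "Rhombus" && size > 10)) "Diamond"
  else "0"
})
// $"..." needs import spark.implicits._ in scope
val df2 = df.withColumn("Success", isDiamond($"color", $"shape", $"size"))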
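A UDF works, but Spark's optimizer treats it as a black box, so the built-in when chain is usually the better default. If you do want the UDF, here is a self-contained sketch, assuming a local SparkSession and made-up rows, that declares size as Int and checks the UDF against the equivalent when expression. DiamondUdf, compare, and the viaUdf/viaWhen column names are hypothetical.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf, when}

object DiamondUdf {
  // UDF version of the rule: white, or Rhombus with size > 10
  def isDiamond = udf((color: String, shape: String, size: Int) =>
    if (color == "white" || (shape == "Rhombus" && size > 10)) "Diamond" else "0")

  // Built-in equivalent; Catalyst can inspect this expression, unlike the UDF body
  def isDiamondExpr: Column =
    when(col("color") <=> "white", "Diamond")
      .when(col("size") > 10 && col("shape") === "Rhombus", "Diamond")
      .otherwise("0")

  // Puts both results side by side for comparison
  def compare(df: DataFrame): DataFrame =
    df.withColumn("viaUdf", isDiamond(col("color"), col("shape"), col("size")))
      .withColumn("viaWhen", isDiamondExpr)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("diamond-udf").getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows (no nulls; see the note below)
    val df = Seq(
      ("white", "Square", 5),
      ("red", "Rhombus", 12),
      ("red", "Rhombus", 8),
      ("blue", "Circle", 20)
    ).toDF("color", "shape", "size")

    compare(df).show()
    spark.stop()
  }
}
```

The two columns agree on these rows, but not necessarily on nulls: with a primitive Int parameter, Spark returns null from the UDF when size is null, whereas the when chain still yields Diamond for a white row.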

Regards.



Source: https://stackoverflow.com/questions/42349830/apache-spark-case-with-multiple-when-clauses-on-different-columns
