Scala UDF returning 'Schema for type Unit is not supported'

Posted by 北城余情 on 2019-12-02 10:16:25

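Your for loop returns Unit, hence the error message: a for loop without yield is evaluated only for its side effects, so the UDF's return type is Unit, which Spark cannot map to a column schema. You could use for-yield to return values, but since the Seq has to be updated successively, with each update applied on top of the previous one, a simple foldLeft works better: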

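import org.apache.spark.sql.functions._
// Needed for toDF and the $"col" syntax; assumes a SparkSession named `spark` (already imported in spark-shell)
import spark.implicits._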

val df = Seq(
  (Seq(101L, 102L), Seq("1", "2"), Seq(11, 12)),
  (Seq(201L, 202L, 203L), Seq("2", "3"), Seq(21, 22, 23))
).toDF("C1", "C2", "C3")
// +---------------+------+------------+
// |C1             |C2    |C3          |
// +---------------+------+------------+
// |[101, 102]     |[1, 2]|[11, 12]    |
// |[201, 202, 203]|[2, 3]|[21, 22, 23]|
// +---------------+------+------------+

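// C2 holds 1-based positions (as strings); for each one, copy the C1 value into C3 at that position.
// foldLeft passes the updated Seq along, so every update builds on the previous one.
val updateC3 = udf( (c1: Seq[Long], c2: Seq[String], c3: Seq[Int]) =>
  c2.foldLeft( c3 ){ (acc, i) =>
    val idx = i.toInt - 1   // convert the 1-based position to a 0-based index
    acc.updated(idx, c1(idx).toInt)
  }
)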

df.withColumn("C3", updateC3($"C1", $"C2", $"C3")).show(false)
// +---------------+------+--------------+
// |C1             |C2    |C3            |
// +---------------+------+--------------+
// |[101, 102]     |[1, 2]|[101, 102]    |
// |[201, 202, 203]|[2, 3]|[21, 202, 203]|
// +---------------+------+--------------+
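
To see why the for loop fails and why foldLeft fits better than for-yield, here is a minimal plain-Scala sketch that needs no Spark. The sample values are taken from the second row of the DataFrame above; the names loopResult, yielded and folded are only illustrative and are not part of the original answer.

```scala
// Values from the second row of the example DataFrame
val c1 = Seq(201L, 202L, 203L)
val c2 = Seq("2", "3")
val c3 = Seq(21, 22, 23)

// A for loop without `yield` runs only for its side effects, so its type is Unit.
// The updated copies of c3 created inside the body are simply discarded.
val loopResult: Unit = for (i <- c2) {
  val idx = i.toInt - 1
  c3.updated(idx, c1(idx).toInt)
}

// for-yield returns one value per element of c2: each is a separate copy of c3
// with a single position changed, and no copy contains both updates.
val yielded: Seq[Seq[Int]] = for (i <- c2) yield {
  val idx = i.toInt - 1
  c3.updated(idx, c1(idx).toInt)
}

// foldLeft threads the accumulator through, so each update is applied
// on top of the result of the previous one.
val folded: Seq[Int] = c2.foldLeft(c3) { (acc, i) =>
  val idx = i.toInt - 1
  acc.updated(idx, c1(idx).toInt)
}
```

Here yielded holds two partially updated sequences, while folded is the single fully updated one, which is what the UDF needs to return. Note that updated throws an IndexOutOfBoundsException if a position in C2 falls outside C3, so real data may need a bounds check.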