How to refer broadcast variable in dataframes

二次信任 提交于 2019-12-04 17:12:59

How do I refer the broadcast variable in the above combinedDF statement?

Use udf. If emp_id is Int

val f = udf((emp_id: Int) =>  bc.value.get(emp_id))

df_dept.withColumn("emp_name", f($"emp_id"))

How to handle if the lkp doesn't return any value?

Don't use get as shown above

Is there a way to return multiple records from the lkp

Use groupByKey:

val lkp = rdd.groupByKey.collectAsMap()

and explode:

df_dept.withColumn("emp_name", f($"emp_id")).withColumn("emp_name", explode($"emp_name"))

or just skip all the steps and broadcast:

import org.apache.spark.sql.functions._

df_emp.join(broadcast(df_dep), Seq("Emp Id"), "left")
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!