I need to compare two dataframes for type validation and send a nonzero value as output

丶灬走出姿态 posted on 2019-12-02 13:07:46

You can join the two DataFrames on the variable name and then compare the two type columns using a Map and a UDF. The code sample here does that. You need to complete the map with the right values for your types.

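import org.apache.spark.sql.{DataFrame, functions}
import org.apache.spark.sql.functions.col

// Assumes a SQLContext named sqlContext is already in scope; the implicits enable .toDF on a Seq
val sqlCtx = sqlContext
import sqlCtx.implicits._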


val metadata: DataFrame= Seq(
  (Some("1"), "DATETIME", "Num", "8", "DATETIME20", "DATETIME20"),
  (Some("3"), "SOURCEBANK", "Num", "1", "null", "null")
).toDF("SNo", "Variable", "Type", "Len", "Format", "Informat")

val metadataAdapted: DataFrame = metadata
  .withColumn("Name", functions.upper(col("Variable")))
  .withColumnRenamed("Type", "TypeHive")
val sasDF = Seq(("datetime", "TimestampType"),
  ("datetime", "TimestampType")
).toDF("variable", "type")
val sasDFAdapted = sasDF
  .withColumn("Name", functions.upper(col("variable")))
  .withColumnRenamed("type", "TypeSaS")

val res = sasDFAdapted.join(metadataAdapted, Seq("Name"), "inner")

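// Maps each Spark type name (TypeSaS) to the expected metadata type (TypeHive); extend as needed
val map = Map("TimestampType" -> "Num")
// Option-based lookup: a type missing from the map yields null instead of throwing NoSuchElementException
def udfType(dict: Map[String, String]) = functions.udf((typeVar: String) => Option(typeVar).flatMap(dict.get))
// correctMapping is true on a match, false on a mismatch, and null when the type is not in the map
val result = res.withColumn("correctMapping", udfType(map)(col("TypeSaS")) === col("TypeHive"))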
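
The question also asks for a nonzero value when the types do not match, which the sample above stops short of. One possible way to get there from the `correctMapping` column is sketched below. The names `TypeCheck`, `mismatchCount` and `exitCode` are hypothetical, the local `SparkSession` and the three sample rows exist only for illustration, and treating a null `correctMapping` as a mismatch is an assumption you may want to change.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical helper object, not part of the original answer.
object TypeCheck {
  // Counts rows whose mapping is false or null (assumption: null counts as a mismatch).
  def mismatchCount(result: DataFrame): Long =
    result.filter(col("correctMapping").isNull || !col("correctMapping")).count()

  // Translates the mismatch count into a process exit status: 0 means all types match.
  def exitCode(mismatches: Long): Int = if (mismatches > 0) 1 else 0

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in a real job, reuse the existing session.
    val spark = SparkSession.builder().master("local[*]").appName("type-check").getOrCreate()
    import spark.implicits._

    // Stand-in for the joined `result` DataFrame: one match, one mismatch, one unmapped type.
    val result = Seq[(String, Option[Boolean])](
      ("DATETIME", Some(true)),
      ("SOURCEBANK", Some(false)),
      ("OTHER", None)
    ).toDF("Name", "correctMapping")

    val code = exitCode(mismatchCount(result))
    spark.stop()
    // A nonzero status signals the failed validation to the calling shell or scheduler.
    if (code != 0) sys.exit(code)
  }
}
```

Counting on the driver keeps the decision in one place. If you need to know which variables failed, you could collect the `Name` column of the filtered rows instead of only counting them.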