How to pass whole Row to UDF - Spark DataFrame filter

遇见更好的自我 2020-11-30 08:28

I'm writing a filter function for a complex JSON dataset with lots of inner structures. Passing individual columns is too cumbersome.

So I declared the following UDF

2 Answers
  •  伪装坚强ぢ
    2020-11-30 09:28

    You have to use the struct() function to construct the row when calling the UDF. Follow these steps.

    Import Row and the SQL functions (struct and callUdf live in org.apache.spark.sql.functions),

    import org.apache.spark.sql._
    import org.apache.spark.sql.functions._   // struct, callUdf
    

    Define the UDF

    def myFilterFunction(r: Row): Boolean = r.get(0) == r.get(1)  // keep rows whose two fields match
    

    Register the UDF

    sqlContext.udf.register("myFilterFunction", myFilterFunction _)
    

    Create the dataFrame

    val records = sqlContext.createDataFrame(Seq(("sachin", "sachin"), ("aggarwal", "aggarwal1"))).toDF("text", "text2")
    

    Use the UDF

    records.filter(callUdf("myFilterFunction",struct($"text",$"text2"))).show
    

    When you want all columns to be passed to the UDF:

    records.filter(callUdf("myFilterFunction",struct(records.columns.map(records(_)) : _*))).show 
    

    Result:

    +------+------+
    |  text| text2|
    +------+------+
    |sachin|sachin|
    +------+------+
    
