How to pass whole Row to UDF - Spark DataFrame filter

Backend · Open · 2 answers · 1559 views

遇见更好的自我 2020-11-30 08:28

I'm writing a filter function for a complex JSON dataset with lots of inner structures. Passing individual columns is too cumbersome.

So I declared the following UDF:
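The UDF declaration itself is missing from the post. A minimal sketch of the usual approach, packing every column into a single struct and handing it to a `Row`-typed UDF, might look like this (the sample data and column names are assumptions matching the schema shown in the answer below):

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{col, struct, udf}

val spark = SparkSession.builder().master("local[*]").appName("row-udf").getOrCreate()
import spark.implicits._

// Sample data standing in for the complex JSON dataset (assumed schema)
val inputDF = Seq(
  ("a@example.com", "John", "Male", 1L, "Doe"),
  ("b@example.com", "Jane", "Female", 2L, "Roe")
).toDF("email", "first_name", "gender", "id", "last_name")

// UDF that receives the whole row as a single struct-typed argument
val isMale = udf { (r: Row) => r.getAs[String]("gender") == "Male" }

// Pack all columns into one struct column and pass it to the UDF
val males = inputDF.filter(isMale(struct(inputDF.columns.map(col): _*)))
males.show()
```

The key trick is `struct(inputDF.columns.map(col): _*)`, which bundles every column into one struct so the UDF sees the entire row without listing columns individually.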

2 Answers
  •  情深已故
    2020-11-30 09:15

    scala> inputDF
    res40: org.apache.spark.sql.DataFrame = [email: string, first_name: string ... 3 more fields]
    
    scala> inputDF.printSchema
    root
     |-- email: string (nullable = true)
     |-- first_name: string (nullable = true)
     |-- gender: string (nullable = true)
     |-- id: long (nullable = true)
     |-- last_name: string (nullable = true)
    

    Now, I would like to filter the rows based on the gender field. I can accomplish that with .filter($"gender" === "Male"), but I would like to do it with .filter(function).

    So, I defined my anonymous functions:

    import org.apache.spark.sql.Row

    val isMaleRow = (r: Row) => r.getAs[String]("gender") == "Male"
    
    val isFemaleRow = (r: Row) => r.getAs[String]("gender") == "Female"
    
    inputDF.filter(isMaleRow).show()
    
    inputDF.filter(isFemaleRow).show()
    

    I feel this requirement can be met in a better way, i.e. without declaring a UDF and invoking it. This works here because a DataFrame is a Dataset[Row], and Dataset.filter accepts a plain Row => Boolean function directly.
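    Another UDF-free option, assuming the printed schema, is to move to a typed Dataset and filter with an ordinary Scala predicate over a case class; the case class and sample rows below are assumptions mirroring that schema:

    ```scala
    import org.apache.spark.sql.SparkSession

    // Case class mirroring the printed schema (assumed field types)
    case class Person(email: String, first_name: String, gender: String, id: Long, last_name: String)

    val spark = SparkSession.builder().master("local[*]").appName("typed-filter").getOrCreate()
    import spark.implicits._

    val inputDS = Seq(
      Person("a@example.com", "John", "Male", 1L, "Doe"),
      Person("b@example.com", "Jane", "Female", 2L, "Roe")
    ).toDS()

    // Plain function over the whole record: no Row.getAs, no UDF registration
    val males = inputDS.filter(p => p.gender == "Male")
    males.show()
    ```

    The typed version trades a little setup (the case class) for compile-time field checking: a typo like p.gendr fails at compile time rather than at runtime.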
