How to pass whole Row to UDF - Spark DataFrame filter

Backend · Open · 2 answers · 1559 views

遇见更好的自我 2020-11-30 08:28

I'm writing a filter function for a complex JSON dataset with lots of inner structures. Passing individual columns is too cumbersome.

So I declared the following UDF:
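The UDF declaration itself is missing from the post. A minimal sketch of the usual approach, packing every column into a single struct and handing it to a `Row`-typed UDF, might look like this (the sample data and column names are assumptions matching the schema shown in the answer below):

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{col, struct, udf}

val spark = SparkSession.builder().master("local[*]").appName("row-udf").getOrCreate()
import spark.implicits._

// Sample data standing in for the complex JSON dataset (assumed schema)
val inputDF = Seq(
  ("a@example.com", "John", "Male", 1L, "Doe"),
  ("b@example.com", "Jane", "Female", 2L, "Roe")
).toDF("email", "first_name", "gender", "id", "last_name")

// UDF that receives the whole row as a single struct-typed argument
val isMale = udf { (r: Row) => r.getAs[String]("gender") == "Male" }

// Pack all columns into one struct column and pass it to the UDF
val males = inputDF.filter(isMale(struct(inputDF.columns.map(col): _*)))
males.show()
```

The key trick is `struct(inputDF.columns.map(col): _*)`, which bundles every column into one struct so the UDF sees the entire row without listing columns individually.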

2 Answers
  •  情深已故
    2020-11-30 09:15

    scala> inputDF
    res40: org.apache.spark.sql.DataFrame = [email: string, first_name: string ... 3 more fields]
    
    scala> inputDF.printSchema
    root
     |-- email: string (nullable = true)
     |-- first_name: string (nullable = true)
     |-- gender: string (nullable = true)
     |-- id: long (nullable = true)
     |-- last_name: string (nullable = true)
    

    Now, I would like to filter the rows based on the gender field. I can accomplish that with .filter($"gender" === "Male"), but I would like to do it with .filter(function).

    So, I defined my anonymous functions:

    import org.apache.spark.sql.Row

    val isMaleRow = (r: Row) => r.getAs[String]("gender") == "Male"
    
    val isFemaleRow = (r: Row) => r.getAs[String]("gender") == "Female"
    
    inputDF.filter(isMaleRow).show()
    
    inputDF.filter(isFemaleRow).show()
    

    I feel this requirement can be met in a better way, i.e. without declaring a UDF and invoking it. This works here because a DataFrame is a Dataset[Row], and Dataset.filter accepts a plain Row => Boolean function directly.
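    Another UDF-free option, assuming the printed schema, is to move to a typed Dataset and filter with an ordinary Scala predicate over a case class; the case class and sample rows below are assumptions mirroring that schema:

    ```scala
    import org.apache.spark.sql.SparkSession

    // Case class mirroring the printed schema (assumed field types)
    case class Person(email: String, first_name: String, gender: String, id: Long, last_name: String)

    val spark = SparkSession.builder().master("local[*]").appName("typed-filter").getOrCreate()
    import spark.implicits._

    val inputDS = Seq(
      Person("a@example.com", "John", "Male", 1L, "Doe"),
      Person("b@example.com", "Jane", "Female", 2L, "Roe")
    ).toDS()

    // Plain function over the whole record: no Row.getAs, no UDF registration
    val males = inputDS.filter(p => p.gender == "Male")
    males.show()
    ```

    The typed version trades a little setup (the case class) for compile-time field checking: a typo like p.gendr fails at compile time rather than at runtime.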
