Question
Below is my Java UDF code:
package com.udf;

import org.apache.spark.sql.api.java.UDF1;

public class SparkUDF implements UDF1<String, String> {
    @Override
    public String call(String arg) throws Exception {
        if (validateString(arg))
            return arg;
        return "INVALID";
    }

    public static boolean validateString(String arg) {
        // Use the short-circuit || operator here: with a single |, both
        // operands are always evaluated, so arg.length() would throw a
        // NullPointerException when arg is null.
        if (arg == null || arg.length() != 11)
            return false;
        else
            return true;
    }
}
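The validation rule itself (non-null and exactly 11 characters) can be exercised outside Spark, which makes the null-handling easy to verify before packaging the jar. A minimal standalone sketch, where the class name `ValidateDemo` is illustrative and not part of the original code:

```java
public class ValidateDemo {
    // Same rule as the UDF: accept only non-null strings of length 11.
    // && short-circuits, so length() is never called on a null argument.
    public static boolean validateString(String arg) {
        return arg != null && arg.length() == 11;
    }

    // Mirrors SparkUDF.call: pass the value through or flag it.
    public static String call(String arg) {
        return validateString(arg) ? arg : "INVALID";
    }

    public static void main(String[] args) {
        System.out.println(call(null));          // INVALID
        System.out.println(call("12345678901")); // 12345678901
        System.out.println(call("short"));       // INVALID
    }
}
```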
I am building the jar containing this class as SparkUdf-1.0-SNAPSHOT.jar.
I have a table named sample in Hive and want to run the SQL below on the spark-shell.
> select UDF(name) from sample ;
I start the spark-shell with the command below.
spark-shell --jars SparkUdf-1.0-SNAPSHOT.jar
Can anyone tell me how to register the UDF in the spark-shell so it can be used in Spark SQL?
Answer 1:
After some more searching, I found the answer. Below are the steps:
spark-shell --jars SparkUdf-1.0-SNAPSHOT.jar
scala> import com.udf.SparkUDF
scala> import org.apache.spark.sql.types.StringType
scala> spark.udf.register("myfunc", new SparkUDF(), StringType)
scala> val sql1 = """ select myfunc(name) from sample """
scala> spark.sql(sql1).show();
You will get the results.
Source: https://stackoverflow.com/questions/54771895/how-to-register-the-java-spark-udf-in-spark-shell