Question
Below is my Java UDF code:
package com.udf;

import org.apache.spark.sql.api.java.UDF1;

public class SparkUDF implements UDF1<String, String> {
    @Override
    public String call(String arg) throws Exception {
        if (validateString(arg))
            return arg;
        return "INVALID";
    }

    public static boolean validateString(String arg) {
        // Use the short-circuit || operator here: with a single |, both
        // operands are always evaluated, so arg.length() would throw a
        // NullPointerException when arg is null.
        if (arg == null || arg.length() != 11)
            return false;
        else
            return true;
    }
}
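The validation rule itself (non-null and exactly 11 characters) can be exercised outside Spark, which makes the null-handling easy to verify before packaging the jar. A minimal standalone sketch, where the class name `ValidateDemo` is illustrative and not part of the original code:

```java
public class ValidateDemo {
    // Same rule as the UDF: accept only non-null strings of length 11.
    // && short-circuits, so length() is never called on a null argument.
    public static boolean validateString(String arg) {
        return arg != null && arg.length() == 11;
    }

    // Mirrors SparkUDF.call: pass the value through or flag it.
    public static String call(String arg) {
        return validateString(arg) ? arg : "INVALID";
    }

    public static void main(String[] args) {
        System.out.println(call(null));          // INVALID
        System.out.println(call("12345678901")); // 12345678901
        System.out.println(call("short"));       // INVALID
    }
}
```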
I am building the jar containing this class as SparkUdf-1.0-SNAPSHOT.jar.
I have a table named sample in Hive and want to run the SQL below on the spark-shell.
> select UDF(name) from sample ;
I start the spark-shell with the command below.
spark-shell --jars SparkUdf-1.0-SNAPSHOT.jar
Can anyone tell me how to register the UDF in the spark-shell so it can be used in Spark SQL?
Answer 1:
After some more searching, I found the answer. Below are the steps:
spark-shell --jars SparkUdf-1.0-SNAPSHOT.jar
scala> import com.udf.SparkUDF
scala> import org.apache.spark.sql.types.StringType
scala> spark.udf.register("myfunc", new SparkUDF(), StringType)
scala> val sql1 = """ select myfunc(name) from sample """
scala> spark.sql(sql1).show();
You will get the results.
Source: https://stackoverflow.com/questions/54771895/how-to-register-the-java-spark-udf-in-spark-shell