udf

Schema for type Any is not supported

只愿长相守 submitted on 2019-12-01 09:25:13
I'm trying to create a Spark UDF to extract a Map of (key, value) pairs from a user-defined case class. The Scala function seems to work fine, but when I try to convert it to a UDF in Spark 2.0, I run into the "Schema for type Any is not supported" error.

case class myType(c1: String, c2: Int)

def getCaseClassParams(cc: Product): Map[String, Any] = {
  cc
    .getClass
    .getDeclaredFields          // all field names
    .map(_.getName)
    .zip(cc.productIterator.to) // zipped with all values
    .toMap
}

But when I try to instantiate a function value as a UDF it results in the following error:

val ccUDF = udf { (cc: Product) => getCaseClassParams(cc) }
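The error occurs because Catalyst has no schema for Any, so a UDF returning Map[String, Any] cannot be encoded. A minimal workaround sketch (my own, not from the post) is to narrow the map values to a supported type such as String; the case-class name MyType and the column wiring are illustrative assumptions:

```scala
import org.apache.spark.sql.functions.udf

case class MyType(c1: String, c2: Int)

// Map[String, String] has a supported Catalyst schema, unlike Map[String, Any].
def caseClassParams(cc: Product): Map[String, String] =
  cc.getClass.getDeclaredFields
    .map(_.getName)
    .zip(cc.productIterator.toSeq.map(_.toString)) // stringify each value
    .toMap

// Pass the fields in as ordinary columns and rebuild the case class inside the UDF.
val ccUDF = udf { (c1: String, c2: Int) => caseClassParams(MyType(c1, c2)) }
```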

5. Custom Hive UDF Functions

旧街凉风 submitted on 2019-12-01 08:00:51
1. Add the dependencies to pom.xml and package the project:

<dependencies>
  <dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>1.1.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.6.0</version>
  </dependency>
</dependencies>

<build>
  <plugins>
    <!-- Configure the Java compiler plugin and pin the version -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <configuration>
        <encoding>UTF-8</encoding>
        <source>1.8</source>
        <target>1.8</target>
        <showWarnings>true</showWarnings>
      </configuration>
    </plugin>
  </plugins>
</build>
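The post is cut off before the UDF class itself. As a sketch of what such a class typically looks like (written in Scala here rather than the usual Java, using the classic org.apache.hadoop.hive.ql.exec.UDF API that hive-exec 1.1.0 provides; the class and behavior are my own example, not the original post's):

```scala
import org.apache.hadoop.hive.ql.exec.UDF

// Hive locates the evaluate() method by reflection, so its signature is free-form.
class UpperCaseUDF extends UDF {
  def evaluate(input: String): String =
    if (input == null) null else input.toUpperCase
}
```

After packaging the jar, the function would be registered in Hive along the lines of ADD JAR /path/to/udf.jar followed by CREATE TEMPORARY FUNCTION my_upper AS 'UpperCaseUDF'.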

Need to stop UDFs recalculating when unrelated cells deleted

烂漫一生 submitted on 2019-12-01 03:10:23
Question: I've noticed that my UDFs recalculate whenever I delete cells. This causes massive delays when deleting entire columns, because the UDF gets called for each and every cell it is used in. So if you're using 1,000 UDFs, deleting a column or cell will call the function 1,000 times. By way of example, put the following UDF in a module, then call it from the worksheet a bunch of times with =HelloWorld():

Function HelloWorld()
    HelloWorld = "HelloWorld"
    Debug.Print Now()
End Function

Then delete a row. If

Redis as a MySQL cache server (read/write splitting)

拜拜、爱过 submitted on 2019-11-30 23:05:51
1. Introduction to Redis
Redis is a key-value storage system. Like Memcached, it caches all data in memory for performance. The difference is that Redis periodically writes updated data to disk, or appends modification operations to a log file, and on top of that implements master-slave replication. In some scenarios it is a good complement to a relational database. It provides clients for Java, C/C++ (hiredis), C#, PHP, JavaScript, Perl, Objective-C, Python, Ruby and more, and is easy to use.

2. Architecture diagram
The overall design is read/write splitting: data in MySQL is synchronized to Redis through triggers.

3. Installing the LNMP stack (using yum here to keep things simple)
1) Modify the yum sources:

[root@redis ~]# vim /etc/yum.repos.d/epel.repo    # add this file

[epel]
name=Extra Packages for Enterprise Linux 6 - $basearch
baseurl=http://download.fedoraproject.org/pub/epel/6/$basearch
failovermethod=priority
enabled=1
gpgcheck=0

[nginx]
name=nginx repo
baseurl=http://nginx.org/packages

In Spark SQL, how do you register and use a generic UDF?

一笑奈何 submitted on 2019-11-30 22:31:59
In my project, I want to implement an ADD (+) function, but my parameter may be LongType, DoubleType, or IntType. I use sqlContext.udf.register("add", XXX), but I don't know how to write XXX so that the function is generic.

You can create a generic UDF by creating a StructType with struct($"col1", $"col2") that holds your values and having your UDF work off of this. It gets passed into your UDF as a Row object, so you can do something like this:

val multiAdd = udf[Double, Row](r => {
  var n = 0.0
  r.toSeq.foreach(n1 => n = n + (n1 match {
    case l: Long   => l.toDouble
    case i: Int    => i.toDouble
    case d: Double => d
  }))
  n
})
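For completeness, a self-contained sketch of how such a struct-based UDF can be applied (the SparkSession setup, column names col1/col2, and sample data are my own assumptions, not from the post):

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{struct, udf}

object GenericAddExample extends App {
  val spark = SparkSession.builder().master("local[*]").appName("generic-udf").getOrCreate()
  import spark.implicits._

  // Hypothetical data: one Long column and one Double column.
  val df = Seq((1L, 2.5), (3L, 4.5)).toDF("col1", "col2")

  // The UDF receives the struct as a Row and matches on each field's runtime type.
  val multiAdd = udf[Double, Row] { r =>
    r.toSeq.map {
      case l: Long   => l.toDouble
      case i: Int    => i.toDouble
      case d: Double => d
    }.sum
  }

  df.withColumn("added", multiAdd(struct($"col1", $"col2"))).show()
}
```

Because the struct's fields arrive untyped inside the Row, the same UDF handles any mix of numeric column types at the cost of a runtime match.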

Unable to pass pig tuple to python UDF

懵懂的女人 submitted on 2019-11-30 21:04:10
Question: I have master.txt, which has 10K records, so each line of it will be a tuple, and the whole file needs to be passed to a Python UDF. Since it has multiple records, storing p2preportmap gives the following error. Please help.

The error is as follows:

Unable to open iterator for alias p2preportmap. Backend error: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Scalar has more than one row in the output. 1st : (010301,MTS,MM), 2nd : (010B06,MTS,TN) (common cause: "JOIN" then "FOREACH

BigQuery User Defined Aggregation Function?

微笑、不失礼 submitted on 2019-11-30 20:17:23
I know I can define a user-defined function in order to perform a custom calculation. I also know I can use the out-of-the-box aggregation functions to reduce a collection of values to a single value when using a GROUP BY clause. Is it possible to define a custom user-defined aggregation function to use with a GROUP BY clause?

It turns out that this IS possible (as long as the groups we seek to aggregate are of a reasonable size in memory) with a little bit of glue, namely the ARRAY_AGG function. The steps are as follows: create a UDF with an input parameter of type ARRAY<T>, where T is

How to deal with Spark UDF input/output of primitive nullable type

混江龙づ霸主 submitted on 2019-11-30 18:54:43
Question: The issues:

1) Spark doesn't call the UDF if the input is a column of primitive type that contains null:

inputDF.show()

+-----+
|    x|
+-----+
| null|
|  1.0|
+-----+

inputDF
  .withColumn("y", udf { (x: Double) => 2.0 }.apply($"x")) // will not be invoked if $"x" == null
  .show()

+-----+-----+
|    x|    y|
+-----+-----+
| null| null|
|  1.0|  2.0|
+-----+-----+

2) Can't produce null from a UDF as a column of primitive type:

udf { (x: String) => null: Double } // compile error

Answer 1: According to the docs: Note
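The truncated answer points at Spark's documented handling of primitives. A minimal sketch of the standard workaround (my own, assuming the same inputDF with a nullable double column x): use the boxed java.lang.Double instead of the primitive scala.Double, so null can both enter and leave the UDF:

```scala
import org.apache.spark.sql.functions.udf

// java.lang.Double is a nullable reference type, unlike primitive scala.Double,
// so Spark invokes the UDF even when x is null, and the UDF itself may return null.
val timesTwo = udf { (x: java.lang.Double) =>
  if (x == null) null else java.lang.Double.valueOf(x * 2.0)
}

// Hypothetical usage:
// inputDF.withColumn("y", timesTwo($"x")).show()
```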
