hive-udf

Find median in spark SQL for multiple double datatype columns

时间秒杀一切 submitted on 2019-12-30 09:26:11
Question: I need to find the median for multiple columns of double datatype and would appreciate suggestions on the correct approach. Below is my sample dataset with one column. I expect the median returned for this sample to be 1.

    scala> sqlContext.sql("select num from test").show();
    +---+
    |num|
    +---+
    |0.0|
    |0.0|
    |1.0|
    |1.0|
    |1.0|
    |1.0|
    +---+

I tried the following options: 1) Hive UDAF percentile, which worked only for BigInt. 2) Hive UDAF percentile_approx, but it does not work as expected

How to convert a Date String from UTC to Specific TimeZone in HIVE?

怎甘沉沦 submitted on 2019-12-18 06:06:28
Question: My Hive table has a date column with UTC date strings. I want to get all rows for a specific EST date. I am trying to do something like the below:

    Select * from TableName T where TO_DATE(ConvertToESTTimeZone(T.date)) = "2014-01-12"

I want to know if there is a function for ConvertToESTTimeZone, or how I can achieve that. I tried the following but it doesn't work (my default timezone is CST):

    TO_DATE(from_utc_timestamp(T.Date) = "2014-01-12"
    TO_DATE( from_utc_timestamp(to_utc_timestamp (unix

Hive gives SemanticException [Error 10014]: when Running my UDF

强颜欢笑 submitted on 2019-12-13 18:39:44
Question: I have a Hive UDF that does a GeoIP lookup.

    public static Text evaluate(Text inputFieldName, Text option, Text databaseFileName) {
        String inputField, fieldOption, dbFileName, result = null;
        inputField = inputFieldName.toString();
        fieldOption = option.toString();
        dbFileName = databaseFileName.toString();
        ExtractData eed = new ExtractData();
        try {
            result = eed.ExtractDB(inputField, fieldOption, dbFileName);
        } catch (IOException e) {
            e.printStackTrace();
        } catch (GeoIp2Exception e) {
            e

How to create view for struct fields in hive

走远了吗. submitted on 2019-12-12 03:45:52
Question: STEP 1: I have written a UDF which forms 2 or more struct columns such as cars, bikes, and buses. The UDF also takes some info from another view called 'details'.

    cars struct form is:  ARRAY<STRUCT<name:string, mfg:string, year:int>>
    bikes struct form is: ARRAY<STRUCT<name: string, mfg:string, year: int, price: double>>
    buses struct form is: ARRAY<STRUCT<name: string, mfg:string, year: int, price: double>>

I am creating a view 'vehicles' using this UDF as below:

    ADD JAR s3://test/StructFV-0.1.jar;

HiveUDF + saxon 9.1.0.8 + Java8 = failed to create an XPathFactory

China☆狼群 submitted on 2019-12-02 08:46:11
Question: My Spark job with HiveContext and Saxon works fine as long as no UDFs are defined in the code. Once a UDF is implemented, HiveContext initialization fails with an error. I have heard there is a Saxon/Java 8 incompatibility that is solved in Saxon 9.5.1.5, which is not yet released as a free version in the central Maven repository:

    Caused by: java.lang.RuntimeException: XPathFactory#newInstance() failed to create an XPathFactory for the default object model: http://java.sun.com/jaxp/xpath/dom with the

HiveUDF + saxon 9.1.0.8 + Java8 = failed to create an XPathFactory

廉价感情. submitted on 2019-12-02 05:52:22
My Spark job with HiveContext and Saxon works fine as long as no UDFs are defined in the code. Once a UDF is implemented, HiveContext initialization fails with an error. I have heard there is a Saxon/Java 8 incompatibility that is solved in Saxon 9.5.1.5, which is not yet released as a free version in the central Maven repository:

    Caused by: java.lang.RuntimeException: XPathFactory#newInstance() failed to create an XPathFactory for the default object model: http://java.sun.com/jaxp/xpath/dom with the XPathFactoryConfigurationException: javax.xml.xpath.XPathFactoryConfigurationException: java.util.ServiceConfigurationError:
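
For readers hitting the same error, here is a minimal sketch of one workaround that is sometimes used when a third-party XPathFactory provider on the classpath fails to load: pin the JDK's built-in factory through the JAXP system property before anything calls XPathFactory.newInstance(). This is an assumption about the cause, not a confirmed fix for the job above; the class name XPathFactoryPinSketch is hypothetical, and com.sun.org.apache.xpath.internal.jaxp.XPathFactoryImpl is the internal implementation shipped with Oracle/OpenJDK builds, so other JVMs may differ.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper: forces JAXP to use the JDK's bundled XPathFactory
// for the default (DOM) object model instead of a provider discovered
// through META-INF/services.
final class XPathFactoryPinSketch {
    // JAXP looks up "javax.xml.xpath.XPathFactory:<object model URI>" first.
    static final String PROPERTY =
            "javax.xml.xpath.XPathFactory:" + XPathFactory.DEFAULT_OBJECT_MODEL_URI;

    // Assumption: an Oracle/OpenJDK runtime, where this internal class exists.
    static final String JDK_IMPL =
            "com.sun.org.apache.xpath.internal.jaxp.XPathFactoryImpl";

    static XPathFactory pinnedFactory() {
        System.setProperty(PROPERTY, JDK_IMPL);
        return XPathFactory.newInstance();
    }

    public static void main(String[] args) throws Exception {
        XPathFactory factory = pinnedFactory();
        System.out.println(factory.getClass().getName());
        // Evaluate a trivial expression to confirm the factory is usable.
        String sum = factory.newXPath()
                .evaluate("1 + 1", new InputSource(new StringReader("<a/>")));
        System.out.println(sum);
    }
}
```

In a Spark job the same property could presumably be passed as a -D JVM option to the driver and executors rather than set in code, so that it is in place before HiveContext is initialized. Code that needs Saxon itself would then have to request the Saxon factory explicitly.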

Find median in spark SQL for multiple double datatype columns

我的梦境 submitted on 2019-12-01 05:13:00
I need to find the median for multiple columns of double datatype and would appreciate suggestions on the correct approach. Below is my sample dataset with one column. I expect the median returned for this sample to be 1.

    scala> sqlContext.sql("select num from test").show();
    +---+
    |num|
    +---+
    |0.0|
    |0.0|
    |1.0|
    |1.0|
    |1.0|
    |1.0|
    +---+

I tried the following options: 1) Hive UDAF percentile, which worked only for BigInt. 2) Hive UDAF percentile_approx, but it does not work as expected (returns 0.25 vs 1).

    sqlContext.sql("select percentile_approx(num,0.5) from test").show();
    +----+
    | _c0|
    +
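
As a reference point for what the expected answer looks like, the exact median can be sketched in plain Java. This is not a Spark or Hive solution, only an illustration of the definition the question relies on (sort, then take the middle value or the mean of the two middle values). The class and method names (MedianSketch, median, medians) are hypothetical, and each column is assumed to fit in memory as a double[].

```java
import java.util.Arrays;

// Hypothetical helper: exact median over in-memory double columns.
final class MedianSketch {
    // Median of one column; the input array is not modified.
    static double median(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("median of empty input is undefined");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        // Odd count: middle element. Even count: mean of the two middle elements.
        return (sorted.length % 2 == 1)
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    // Median of each column, for the "multiple columns" case.
    static double[] medians(double[][] columns) {
        double[] result = new double[columns.length];
        for (int i = 0; i < columns.length; i++) {
            result[i] = median(columns[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        double[] num = {0.0, 0.0, 1.0, 1.0, 1.0, 1.0};
        System.out.println(median(num)); // prints 1.0
    }
}
```

For the sample column the two middle values after sorting are both 1.0, which is why 1 is the expected result rather than 0.25.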

How to convert a Date String from UTC to Specific TimeZone in HIVE?

好久不见. submitted on 2019-12-01 04:09:39
My Hive table has a date column with UTC date strings. I want to get all rows for a specific EST date. I am trying to do something like the below:

    Select * from TableName T where TO_DATE(ConvertToESTTimeZone(T.date)) = "2014-01-12"

I want to know if there is a function for ConvertToESTTimeZone, or how I can achieve that. I tried the following but it doesn't work (my default timezone is CST):

    TO_DATE(from_utc_timestamp(T.Date) = "2014-01-12"
    TO_DATE( from_utc_timestamp(to_utc_timestamp (unix_timestamp (T.date), 'CST'),'EST'))

Thanks in advance. Update: Strange behavior. When I do this: select
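
To make the intended conversion concrete, here is a small Java sketch of the logic that a ConvertToESTTimeZone step would need: interpret the string as UTC, shift it to US Eastern time, and keep only the date. It is independent of Hive and of the JVM's default timezone. The class name UtcToEasternSketch, the "yyyy-MM-dd HH:mm:ss" input pattern, and the choice of the America/New_York zone are assumptions, since the question does not show the actual format of the date column.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: maps a UTC date-time string to the calendar date
// it falls on in US Eastern time.
final class UtcToEasternSketch {
    // Assumed input layout; adjust to the real column format.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // A region ID rather than a fixed offset, so daylight saving is applied.
    private static final ZoneId EASTERN = ZoneId.of("America/New_York");

    static LocalDate easternDate(String utcText) {
        return LocalDateTime.parse(utcText, FORMAT)
                .atOffset(ZoneOffset.UTC)      // the text is declared to be UTC
                .atZoneSameInstant(EASTERN)    // same instant, Eastern wall clock
                .toLocalDate();
    }

    public static void main(String[] args) {
        // 03:30 UTC on Jan 13 is 22:30 on Jan 12 in Eastern Standard Time.
        System.out.println(easternDate("2014-01-13 03:30:00")); // prints 2014-01-12
    }
}
```

Because the input is tagged as UTC explicitly, the result does not depend on the machine being set to CST, which is the part the attempts quoted above appear to struggle with.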