hive-udf | 易学教程

Hive gives SemanticException [Error 10014] when running my UDF

I have a Hive UDF that does a GeoIP lookup:

    public static Text evaluate(Text inputFieldName, Text option, Text databaseFileName) {
        String inputField, fieldOption, dbFileName, result = null;
        inputField = inputFieldName.toString();
        fieldOption = option.toString();
        dbFileName = databaseFileName.toString();
        ExtractData eed = new ExtractData();
        try {
            result = eed.ExtractDB(inputField, fieldOption, dbFileName);
        } catch (IOException e) {
            e.printStackTrace();
        } catch (GeoIp2Exception e) {
            e…
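
The excerpt is cut off before the error details, so the cause of Error 10014 cannot be confirmed from it. One defensive pattern for an evaluate() method like this is to return NULL for NULL arguments and for lookup failures rather than letting an exception escape. The sketch below is plain Java so it runs without Hive on the classpath: String stands in for Hadoop's Text, and GeoLookup is a hypothetical interface standing in for the question's ExtractData class. Real code would also catch GeoIp2Exception, as the original does.

```java
import java.io.IOException;

// Sketch only: String replaces org.apache.hadoop.io.Text, and GeoLookup is a
// hypothetical stand-in for ExtractData (it is not part of Hive or GeoIP2).
public class NullSafeUdfSketch {

    /** Hypothetical lookup abstraction used only for this illustration. */
    interface GeoLookup {
        String lookup(String ip, String option, String dbFile) throws IOException;
    }

    /** Same shape as the UDF's evaluate(): never throws, returns null on bad input or failure. */
    static String evaluate(GeoLookup geo, String ip, String option, String dbFile) {
        // Hive passes null for SQL NULL values; calling toString() on them would throw.
        if (ip == null || option == null || dbFile == null) {
            return null;
        }
        try {
            return geo.lookup(ip, option, dbFile);
        } catch (IOException e) {
            // Turning a failed lookup into NULL keeps one bad row (or a missing
            // database file) from failing the whole query.
            return null;
        }
    }

    public static void main(String[] args) {
        // Made-up lookup for demonstration: fails unless the file name ends in ".mmdb".
        GeoLookup fake = (ip, option, dbFile) -> {
            if (!dbFile.endsWith(".mmdb")) {
                throw new IOException("cannot open " + dbFile);
            }
            return option + ":" + ip;
        };
        System.out.println(evaluate(fake, "1.2.3.4", "country", "geo.mmdb"));  // country:1.2.3.4
        System.out.println(evaluate(fake, null, "country", "geo.mmdb"));       // null
        System.out.println(evaluate(fake, "1.2.3.4", "country", "geo.dat"));   // null
    }
}
```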

How to create a view for struct fields in Hive

STEP 1: I have written a UDF that produces two or more struct columns, such as cars, bikes, and buses. The UDF also takes some information from another view called 'details'. The column types are:

    cars:  ARRAY<STRUCT<name:string, mfg:string, year:int>>
    bikes: ARRAY<STRUCT<name:string, mfg:string, year:int, price:double>>
    buses: ARRAY<STRUCT<name:string, mfg:string, year:int, price:double>>

I am creating a view 'vehicles' using this UDF as below:

    ADD JAR s3://test/StructFV-0.1.jar;

HiveUDF + saxon 9.1.0.8 + Java8 = failed to create an XPathFactory

My Spark job with HiveContext and Saxon works fine as long as no UDFs are defined in the code. When a UDF is implemented, HiveContext initialization fails with the error below. I heard that a Saxon/Java 8 incompatibility was fixed in Saxon 9.5.1.5, which has not yet been released as a free version in the Maven Central repository:

    Caused by: java.lang.RuntimeException: XPathFactory#newInstance() failed to create an XPathFactory for the default object model: http://java.sun.com/jaxp/xpath/dom with the XPathFactoryConfigurationException: javax.xml.xpath.XPathFactoryConfigurationException: java.util.ServiceConfigurationError:
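
The trace shows XPathFactory.newInstance() failing inside its service lookup (ServiceConfigurationError). JAXP checks the system property "javax.xml.xpath.XPathFactory:" + object-model URI before it consults META-INF/services, so one possible workaround, assuming the Saxon service registration is what breaks, is to pin the DOM object model to the JDK's built-in implementation. The class name used below is JDK-internal (present in OpenJDK/Oracle JDK 8 and later) and is an assumption about your JVM; the same property can also be passed as a JVM option (-Djavax.xml.xpath.XPathFactory:http://java.sun.com/jaxp/xpath/dom=com.sun.org.apache.xpath.internal.jaxp.XPathFactoryImpl). This is a sketch, not a confirmed fix for the HiveContext failure.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathFactoryPinSketch {

    // Built-in JDK implementation (internal class; assumed to exist on your JDK).
    static final String JDK_XPATH_FACTORY =
            "com.sun.org.apache.xpath.internal.jaxp.XPathFactoryImpl";

    /** Pins the default (DOM) object model to the JDK's own XPathFactory. */
    static void pinJdkXPathFactory() {
        // Key is "javax.xml.xpath.XPathFactory:http://java.sun.com/jaxp/xpath/dom".
        // newInstance() reads it before the ServiceLoader lookup that fails in the trace.
        System.setProperty(
                XPathFactory.DEFAULT_PROPERTY_NAME + ":" + XPathFactory.DEFAULT_OBJECT_MODEL_URI,
                JDK_XPATH_FACTORY);
    }

    /** Which implementation newInstance() currently resolves to. */
    static String factoryClassName() {
        return XPathFactory.newInstance().getClass().getName();
    }

    /** Evaluates an XPath expression against an XML string, returning the string result. */
    static String evaluate(String expression, String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        pinJdkXPathFactory();  // do this before anything (e.g. HiveContext) asks for a factory
        System.out.println(factoryClassName());
        System.out.println(evaluate("/a/b", "<a><b>hi</b></a>"));  // hi
    }
}
```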

Find median in spark SQL for multiple double datatype columns

I have a requirement to find the median of multiple DOUBLE columns and would appreciate suggestions on the correct approach. Below is my sample dataset with one column. I expect the median returned for this sample to be 1.

    scala> sqlContext.sql("select num from test").show();
    +---+
    |num|
    +---+
    |0.0|
    |0.0|
    |1.0|
    |1.0|
    |1.0|
    |1.0|
    +---+

I tried the following options:

1) The Hive UDAF percentile, but it worked only for BigInt.
2) The Hive UDAF percentile_approx, but it does not work as expected (it returns 0.25 instead of 1).

    sqlContext.sql("select percentile_approx(num,0.5) from test").show();
    +----+
    | _c0|
    +…
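
For reference, the value the question expects is the exact median: sort the column, take the middle value when the count is odd, and average the two middle values when it is even. For the sample column (0.0, 0.0, 1.0, 1.0, 1.0, 1.0) the two middle values are both 1.0, so the median is 1.0. The plain-Java sketch below states that definition for one or several columns. It is not a Hive or Spark API (the class and method names are made up), and because it sorts each column in memory it only pins down the target semantics; it is not a distributed aggregate. NULLs are assumed to be filtered out before the values reach it.

```java
import java.util.Arrays;

public class MedianSketch {

    /** Exact median: middle value for odd n, mean of the two middle values for even n. */
    static double median(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("median of an empty column is undefined");
        }
        double[] sorted = values.clone();  // leave the caller's array untouched
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return (sorted.length % 2 == 1)
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    /** Medians of several columns, where columns[i] holds every value of column i. */
    static double[] medians(double[][] columns) {
        double[] result = new double[columns.length];
        for (int i = 0; i < columns.length; i++) {
            result[i] = median(columns[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        double[] num = {0.0, 0.0, 1.0, 1.0, 1.0, 1.0};  // the sample column from the question
        double[] other = {3.0, 1.0, 2.0};               // a made-up second column
        System.out.println(median(num));                                        // 1.0
        System.out.println(Arrays.toString(medians(new double[][] {num, other})));  // [1.0, 2.0]
    }
}
```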

How to convert a Date String from UTC to Specific TimeZone in HIVE?

My Hive table has a date column containing UTC date strings. I want to get all rows for a specific EST date. I am trying to do something like the following:

    Select * from TableName T where TO_DATE(ConvertToESTTimeZone(T.date)) = "2014-01-12"

Is there a function like ConvertToESTTimeZone, or how else can I achieve this? I tried the following, but it doesn't work (my default time zone is CST):

    TO_DATE(from_utc_timestamp(T.Date) = "2014-01-12"
    TO_DATE( from_utc_timestamp(to_utc_timestamp (unix_timestamp (T.date), 'CST'),'EST'))

Thanks in advance. Update: Strange behavior. When I do this: select…
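
Hive's from_utc_timestamp takes two arguments (a timestamp and a time zone name), so the first attempt above is missing the zone. The intended logic is: read the string as a UTC instant, shift it to US Eastern time, then take the calendar date. A plausible Hive counterpart, not verified against the asker's Hive version, would be to_date(from_utc_timestamp(T.date, 'America/New_York')) = '2014-01-12'. The Java sketch below only illustrates the conversion itself; it assumes the strings look like "yyyy-MM-dd HH:mm:ss" (the question does not say), and it uses the region ID America/New_York rather than a fixed EST offset so that daylight saving time is applied.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class UtcToEasternDateSketch {

    // Assumed format of the UTC strings stored in the table.
    static final DateTimeFormatter INPUT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Region ID (not the "EST" abbreviation), so EST/EDT switch automatically.
    static final ZoneId EASTERN = ZoneId.of("America/New_York");

    /** Treats the string as UTC wall-clock time and returns the calendar date in US Eastern time. */
    static LocalDate easternDate(String utcString) {
        LocalDateTime utc = LocalDateTime.parse(utcString, INPUT);
        return utc.atOffset(ZoneOffset.UTC)     // attach the UTC offset to get an instant
                  .atZoneSameInstant(EASTERN)   // same instant, Eastern wall-clock time
                  .toLocalDate();
    }

    public static void main(String[] args) {
        // 03:30 UTC on Jan 13 is 22:30 EST (UTC-5) on Jan 12.
        System.out.println(easternDate("2014-01-13 03:30:00"));  // 2014-01-12
        // 05:30 UTC on Jan 13 is 00:30 EST on Jan 13.
        System.out.println(easternDate("2014-01-13 05:30:00"));  // 2014-01-13
    }
}
```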