hiveql

How to convert a JSON string datatype column to a map datatype column in Hive?

Submitted on 2019-12-11 04:12:31
Question: I need to get all the unique key values from all the rows. Each row has different keys and values. Please find the above image of the column. E.g., one row looks like {"START_TIME":1549002807568,"PARSING.QUERY_FORMED":1549002807586,"CUBES_WITH_PERMISSIONS":1549002807568,"PARSING.CUBE_MATCH_SELECTED":1549002807586,"POTENTIAL_COMPLETIONS_ADDED":1549002807587,"QUERY_PARSED":1549002807586,"SUGGESTIONS_FORMED":1549002807606,"PARSING.SEQUENCES_GENERATED":1549002807568,"PARSING.NGRAM_MATCHES_CACHED"
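The excerpt cuts off before any answer. For flat key-value JSON like the example, one common approach is to strip the braces and quotes, hand the result to str_to_map, and then explode the map keys to collect the distinct keys the question asks for. A minimal HiveQL sketch; the column name json_col and table name events are assumptions, not from the question:

    -- Convert the flat JSON string into a map<string,string>; this only works
    -- for unnested objects like the example (no nested braces or embedded commas).
    SELECT str_to_map(regexp_replace(json_col, '[{}"]', ''), ',', ':') AS json_map
    FROM events;

    -- Distinct keys across all rows, which is what the question is after:
    SELECT DISTINCT json_key
    FROM events
    LATERAL VIEW explode(map_keys(str_to_map(regexp_replace(json_col, '[{}"]', ''), ',', ':'))) k AS json_key;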

Configuring Hive to run in Local Mode

Submitted on 2019-12-11 04:06:30
Question: Hi, I am trying to run Hive in local mode. I have set the HIVE_OPTS environment variable: export HIVE_OPTS='-hiveconf mapred.job.tracker=local -hiveconf fs.default.name=file:////<myhomedir>/hivelocal/tmp -hiveconf hive.metastore.warehouse.dir=file:////<myhomedir>/hivelocal/warehouse -hiveconf javax.jdo.option.ConnectionURL=jdbc:derby:;databaseName=/<myhomedir>/hivelocal/metastore_db;create=true' and connected to Hive using the Hive client. When I create the table (named demo), I still see the table
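The excerpt is cut off before describing what exactly goes wrong. Independent of HIVE_OPTS, local execution can also be toggled per session from inside the Hive CLI; a minimal sketch, hedged because the relevant property differs between Hadoop 1 (mapred.job.tracker, as used in the question) and Hadoop 2/YARN (mapreduce.framework.name):

    -- Force local execution for this session (pick the property that matches
    -- the Hadoop version in use):
    SET mapred.job.tracker=local;          -- Hadoop 1
    SET mapreduce.framework.name=local;    -- Hadoop 2 / YARN
    -- Or let Hive decide on its own to run small jobs locally:
    SET hive.exec.mode.local.auto=true;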

Hive - Select count(*) not working with Tez but works with MR

Submitted on 2019-12-11 02:22:42
Question: I have a Hive external table with Parquet data. When I run select count(*) from table1, it fails with Tez, but when the execution engine is changed to MR it works. Any idea why it's failing with Tez? I'm getting the following error with Tez: Error: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380
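The stack trace is truncated before the root cause, so the reason for the Tez failure is not recoverable from this excerpt. The workaround the question itself describes (switching engines) can be applied per session; a minimal sketch:

    -- Fall back to MapReduce for this session, which the question reports works:
    SET hive.execution.engine=mr;
    SELECT count(*) FROM table1;

    -- Switch back to Tez afterwards if desired:
    SET hive.execution.engine=tez;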

How to sample for each group in Hive?

Submitted on 2019-12-11 00:55:03
Question: I have a large table in Hive that has 1.5 billion+ values. One of the columns is category_id, which has ~20 distinct values. I want to sample the table such that I have 1 million values for each category. I checked out "Random sample table with Hive, but including matching rows" and "Hive: Creating smaller table from big table", and I figured out how to get a random sample from the entire table, but I'm still unable to figure out how to get a sample for each category_id. Answer 1: I understand you want to
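The answer is truncated, but a standard way to take a fixed-size random sample per group is to rank rows randomly within each category_id and keep the first N of each. A minimal HiveQL sketch; category_id comes from the question, while the table name big_table is an assumption:

    -- Keep (up to) 1 million randomly chosen rows per category_id.
    SELECT *
    FROM (
      SELECT t.*,
             row_number() OVER (PARTITION BY category_id ORDER BY rand()) AS rn
      FROM big_table t
    ) ranked
    WHERE rn <= 1000000;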

Better HiveQL syntax to explode a column of structs into a table with one column per struct member?

Submitted on 2019-12-10 23:19:26
Question: I was looking for an argmax() type function in HiveQL and found an almost undocumented feature in their bug tracker (https://issues.apache.org/jira/browse/HIVE-1128) which does what I want by taking max() of a struct, which finds the maximum based on the first element and returns the whole struct. (Actually, maybe the max() would break ties by looking at subsequent elements? I don't know.) Anyway, if I essentially want to select the whole row that contains the max value of some column, I can
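A minimal sketch of the max()-of-a-struct trick the question refers to (HIVE-1128), with the table, group, and column names assumed for illustration; struct() compares by its first field, and the resulting fields are named col1, col2, and so on:

    -- For each grp, return other_col from the row where some_col is maximal (argmax).
    SELECT grp,
           mx.col2 AS other_col_at_max
    FROM (
      SELECT grp,
             max(struct(some_col, other_col)) AS mx   -- comparison driven by some_col
      FROM some_table
      GROUP BY grp
    ) s;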

Need to add an auto-increment column in a table using Hive

Submitted on 2019-12-10 19:54:31
Question: I have to create a table using Hive, but I want to create that table with an auto-increment column. I have googled but was not able to find the exact answer. If anybody knows the syntax for it, please share it. Thanks in advance. Answer 1: You need to use a UDF (user-defined function) for it. I have successfully used the UDF in this link: http://svn.apache.org/repos/asf/hive/trunk/contrib/src/java/org/apache/hadoop/hive/contrib/udf/UDFRowSequence.java Further, you can learn the use of UDFs in Hive by this
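A minimal sketch of how the UDF mentioned in the answer is typically wired up; the jar path and the table names are placeholders, and the class name comes from the linked file:

    -- Register the contrib row-sequence UDF (jar path is hypothetical):
    ADD JAR /path/to/hive-contrib.jar;
    CREATE TEMPORARY FUNCTION row_sequence AS 'org.apache.hadoop.hive.contrib.udf.UDFRowSequence';

    -- Populate a target table with a generated sequence column:
    INSERT OVERWRITE TABLE target_table
    SELECT row_sequence() AS id, col1, col2
    FROM source_table;

One caveat often raised about this approach: the sequence is generated per task, so with more than one mapper the values may not be globally unique.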

Obtain date from timestamp

Submitted on 2019-12-10 17:48:18
Question: I have a date field like this: 2017-03-22 11:09:55 (column name: install_date). I have another date field like this: 2017-04-20 (column name: test_date). I would like to obtain only the date part from the former (2017-03-22) so that I can perform a DATEDIFF between install_date and test_date. Answer 1: Assuming you are looking for this in Hive, you can use the TO_DATE function. TO_DATE('2000-01-01 10:20:30') returns '2000-01-01'. NOTE: the input to TO_DATE is a string. Source: https://stackoverflow
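Putting the answer together with the question's columns, a minimal sketch; the table name events is an assumption:

    -- Days between install_date (timestamp-like string) and test_date (date string).
    SELECT DATEDIFF(test_date, TO_DATE(install_date)) AS days_between
    FROM events;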

SQL - find all instances where two columns are the same

Submitted on 2019-12-10 17:33:27
Question: So I have a simple table that holds comments from a user that pertain to a specific blog post.

id | user           | post_id | comment
---|----------------|---------|-----------------------
0  | john@test.com  | 1001    | great article
1  | bob@test.com   | 1001    | nice post
2  | john@test.com  | 1002    | I agree
3  | john@test.com  | 1001    | thats cool
4  | bob@test.com   | 1002    | thanks for sharing
5  | bob@test.com   | 1002    | really helpful
6  | steve@test.com | 1001    | spam post about pills

I want to get all
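The question body is cut off, but given the title ("find all instances where two columns are the same"), one natural reading is: find cases where the same user commented on the same post more than once. A minimal SQL sketch under that assumption, with the table name comments assumed:

    -- (user, post_id) pairs that appear more than once, i.e. repeat comments
    -- by the same user on the same post.
    SELECT `user`, post_id, COUNT(*) AS comment_count
    FROM comments
    GROUP BY `user`, post_id
    HAVING COUNT(*) > 1;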

Spark 2: how does it work when SparkSession enableHiveSupport() is invoked

Submitted on 2019-12-10 15:47:56
Question: My question is rather simple, but somehow I cannot find a clear answer by reading the documentation. I have Spark 2 running on a CDH 5.10 cluster. There is also Hive and a metastore. I create a session in my Spark program as follows: SparkSession spark = SparkSession.builder().appName("MyApp").enableHiveSupport().getOrCreate() Suppose I have the following HiveQL query: spark.sql("SELECT someColumn FROM someTable") I would like to know whether: under the hood this query is translated into Hive

How to unnest array with keys to join on afterwards?

Submitted on 2019-12-10 15:15:09
Question: I have two tables, namely table1 and table2. table1 is big, whereas table2 is small. Also, I have a UDF whose interface is defined as below:

--table1--
id
1
2
3

--table2--
category
a
b
c
d
e
f
g

UDF: foo(id: Int): List[String]

I intend to call the UDF first to get the corresponding categories: foo(table1.id), which will return a WrappedArray. Then I want to join every category in table2 to do some more manipulation. The expected result should look like this:

--view--
id,category
1,a
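The excerpt ends before any answer. One common way to get from the array that foo returns to the expected one-row-per-category view is a lateral view explode, after which table2 can be joined; a minimal HiveQL sketch, assuming foo is registered as a Hive UDF and that the join is on the category value:

    -- One output row per (id, category) pair produced by foo(), restricted to
    -- categories present in table2.
    SELECT t1.id, e.category
    FROM table1 t1
    LATERAL VIEW explode(foo(t1.id)) e AS category
    JOIN table2 t2
      ON t2.category = e.category;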