hiveql

How to convert a JSON string datatype column to a map datatype column in Hive?

Submitted on 2019-12-11 04:12:31
Question: I need to get all the unique key values from all the rows. Each row has different keys and values. Please find the above image of the column. E.g., one row looks like {"START_TIME":1549002807568,"PARSING.QUERY_FORMED":1549002807586,"CUBES_WITH_PERMISSIONS":1549002807568,"PARSING.CUBE_MATCH_SELECTED":1549002807586,"POTENTIAL_COMPLETIONS_ADDED":1549002807587,"QUERY_PARSED":1549002807586,"SUGGESTIONS_FORMED":1549002807606,"PARSING.SEQUENCES_GENERATED":1549002807568,"PARSING.NGRAM_MATCHES_CACHED"
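The excerpt cuts off before any answer. For flat key-value JSON like the example, one common approach is to strip the braces and quotes, hand the result to str_to_map, and then explode the map keys to collect the distinct keys the question asks for. A minimal HiveQL sketch; the column name json_col and table name events are assumptions, not from the question:

    -- Convert the flat JSON string into a map<string,string>; this only works
    -- for unnested objects like the example (no nested braces or embedded commas).
    SELECT str_to_map(regexp_replace(json_col, '[{}"]', ''), ',', ':') AS json_map
    FROM events;

    -- Distinct keys across all rows, which is what the question is after:
    SELECT DISTINCT json_key
    FROM events
    LATERAL VIEW explode(map_keys(str_to_map(regexp_replace(json_col, '[{}"]', ''), ',', ':'))) k AS json_key;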

Configuring Hive to run in Local Mode

Submitted on 2019-12-11 04:06:30
Question: Hi, I am trying to run Hive in local mode. I have set the HIVE_OPTS environment variable: export HIVE_OPTS='-hiveconf mapred.job.tracker=local -hiveconf fs.default.name=file:////<myhomedir>/hivelocal/tmp -hiveconf hive.metastore.warehouse.dir=file:////<myhomedir>/hivelocal/warehouse -hiveconf javax.jdo.option.ConnectionURL=jdbc:derby:;databaseName=/<myhomedir>/hivelocal/metastore_db;create=true' and connected to Hive using the Hive client. When I create the table (named demo), I still see the table
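The excerpt is cut off before describing what exactly goes wrong. Independent of HIVE_OPTS, local execution can also be toggled per session from inside the Hive CLI; a minimal sketch, hedged because the relevant property differs between Hadoop 1 (mapred.job.tracker, as used in the question) and Hadoop 2/YARN (mapreduce.framework.name):

    -- Force local execution for this session (pick the property that matches
    -- the Hadoop version in use):
    SET mapred.job.tracker=local;          -- Hadoop 1
    SET mapreduce.framework.name=local;    -- Hadoop 2 / YARN
    -- Or let Hive decide on its own to run small jobs locally:
    SET hive.exec.mode.local.auto=true;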

Hive - Select count(*) not working with Tez but works with MR

Submitted on 2019-12-11 02:22:42
Question: I have a Hive external table with Parquet data. When I run select count(*) from table1, it fails with Tez, but when the execution engine is changed to MR it works. Any idea why it's failing with Tez? I'm getting the following error with Tez: Error: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380
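The stack trace is truncated before the root cause, so the reason for the Tez failure is not recoverable from this excerpt. The workaround the question itself describes (switching engines) can be applied per session; a minimal sketch:

    -- Fall back to MapReduce for this session, which the question reports works:
    SET hive.execution.engine=mr;
    SELECT count(*) FROM table1;

    -- Switch back to Tez afterwards if desired:
    SET hive.execution.engine=tez;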

How to sample for each group in Hive?

Submitted on 2019-12-11 00:55:03
Question: I have a large table in Hive that has 1.5 billion+ values. One of the columns is category_id, which has ~20 distinct values. I want to sample the table such that I have 1 million values for each category. I checked out "Random sample table with Hive, but including matching rows" and "Hive: Creating smaller table from big table", and I figured out how to get a random sample from the entire table, but I'm still unable to figure out how to get a sample for each category_id. Answer 1: I understand you want to
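The answer is truncated, but a standard way to take a fixed-size random sample per group is to rank rows randomly within each category_id and keep the first N of each. A minimal HiveQL sketch; category_id comes from the question, while the table name big_table is an assumption:

    -- Keep (up to) 1 million randomly chosen rows per category_id.
    SELECT *
    FROM (
      SELECT t.*,
             row_number() OVER (PARTITION BY category_id ORDER BY rand()) AS rn
      FROM big_table t
    ) ranked
    WHERE rn <= 1000000;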

Better HiveQL syntax to explode a column of structs into a table with one column per struct member?

Submitted on 2019-12-10 23:19:26
Question: I was looking for an argmax() type function in HiveQL and found an almost undocumented feature in their bug tracker (https://issues.apache.org/jira/browse/HIVE-1128) which does what I want by taking max() of a struct, which finds the maximum based on the first element and returns the whole struct. (Actually, maybe the max() would break ties by looking at subsequent elements? I don't know.) Anyway, if I essentially want to select the whole row that contains the max value of some column, I can
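A minimal sketch of the max()-of-a-struct trick the question refers to (HIVE-1128), with the table, group, and column names assumed for illustration; struct() compares by its first field, and the resulting fields are named col1, col2, and so on:

    -- For each grp, return other_col from the row where some_col is maximal (argmax).
    SELECT grp,
           mx.col2 AS other_col_at_max
    FROM (
      SELECT grp,
             max(struct(some_col, other_col)) AS mx   -- comparison driven by some_col
      FROM some_table
      GROUP BY grp
    ) s;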

Need to add an auto-increment column in a table using Hive

Submitted on 2019-12-10 19:54:31
Question: I have to create a table using Hive, but I want to create that table with an auto-increment column. I have googled but was not able to find the exact answer. If anybody knows the syntax for it, please share it. Thanks in advance. Answer 1: You need to use a UDF (user-defined function) for it. I have successfully used the UDF in this link: http://svn.apache.org/repos/asf/hive/trunk/contrib/src/java/org/apache/hadoop/hive/contrib/udf/UDFRowSequence.java Further, you can learn the use of UDFs in Hive by this
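A minimal sketch of how the UDF mentioned in the answer is typically wired up; the jar path and the table names are placeholders, and the class name comes from the linked file:

    -- Register the contrib row-sequence UDF (jar path is hypothetical):
    ADD JAR /path/to/hive-contrib.jar;
    CREATE TEMPORARY FUNCTION row_sequence AS 'org.apache.hadoop.hive.contrib.udf.UDFRowSequence';

    -- Populate a target table with a generated sequence column:
    INSERT OVERWRITE TABLE target_table
    SELECT row_sequence() AS id, col1, col2
    FROM source_table;

One caveat often raised about this approach: the sequence is generated per task, so with more than one mapper the values may not be globally unique.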

Obtain date from timestamp

Submitted on 2019-12-10 17:48:18
Question: I have a date field like this: 2017-03-22 11:09:55 (column name: install_date). I have another date field like this: 2017-04-20 (column name: test_date). I would like to obtain only the date part from the former (2017-03-22) so that I can perform a DATEDIFF between install_date and test_date. Answer 1: Assuming you are looking for this in Hive, you can use the TO_DATE function. TO_DATE('2000-01-01 10:20:30') returns '2000-01-01'. NOTE: the input to TO_DATE is a string. Source: https://stackoverflow
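Putting the answer together with the question's columns, a minimal sketch; the table name events is an assumption:

    -- Days between install_date (timestamp-like string) and test_date (date string).
    SELECT DATEDIFF(test_date, TO_DATE(install_date)) AS days_between
    FROM events;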

SQL - find all instances where two columns are the same

Submitted on 2019-12-10 17:33:27
Question: So I have a simple table that holds comments from a user that pertain to a specific blog post.

id | user           | post_id | comment
---|----------------|---------|-----------------------
0  | john@test.com  | 1001    | great article
1  | bob@test.com   | 1001    | nice post
2  | john@test.com  | 1002    | I agree
3  | john@test.com  | 1001    | thats cool
4  | bob@test.com   | 1002    | thanks for sharing
5  | bob@test.com   | 1002    | really helpful
6  | steve@test.com | 1001    | spam post about pills

I want to get all
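The question body is cut off, but given the title ("find all instances where two columns are the same"), one natural reading is: find cases where the same user commented on the same post more than once. A minimal SQL sketch under that assumption, with the table name comments assumed:

    -- (user, post_id) pairs that appear more than once, i.e. repeat comments
    -- by the same user on the same post.
    SELECT `user`, post_id, COUNT(*) AS comment_count
    FROM comments
    GROUP BY `user`, post_id
    HAVING COUNT(*) > 1;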

Spark 2: how does it work when SparkSession enableHiveSupport() is invoked

Submitted on 2019-12-10 15:47:56
Question: My question is rather simple, but somehow I cannot find a clear answer by reading the documentation. I have Spark 2 running on a CDH 5.10 cluster. There is also Hive and a metastore. I create a session in my Spark program as follows: SparkSession spark = SparkSession.builder().appName("MyApp").enableHiveSupport().getOrCreate() Suppose I have the following HiveQL query: spark.sql("SELECT someColumn FROM someTable") I would like to know whether: under the hood this query is translated into Hive

How to unnest array with keys to join on afterwards?

Submitted on 2019-12-10 15:15:09
Question: I have two tables, namely table1 and table2. table1 is big, whereas table2 is small. Also, I have a UDF whose interface is defined as below:

--table1--
id
1
2
3

--table2--
category
a
b
c
d
e
f
g

UDF: foo(id: Int): List[String]

I intend to call the UDF first to get the corresponding categories: foo(table1.id), which will return a WrappedArray. Then I want to join every category in table2 to do some more manipulation. The expected result should look like this:

--view--
id,category
1,a
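The excerpt ends before any answer. One common way to get from the array that foo returns to the expected one-row-per-category view is a lateral view explode, after which table2 can be joined; a minimal HiveQL sketch, assuming foo is registered as a Hive UDF and that the join is on the category value:

    -- One output row per (id, category) pair produced by foo(), restricted to
    -- categories present in table2.
    SELECT t1.id, e.category
    FROM table1 t1
    LATERAL VIEW explode(foo(t1.id)) e AS category
    JOIN table2 t2
      ON t2.category = e.category;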