Hive

Hive - How to display Hive query results in the Command Line along with column names

回眸只為那壹抹淺笑 submitted on 2021-02-08 06:18:22
Question: I have been working in Hive for quite a while. Please note that I don't use Hue at all; I use the Hive shell all the time, and now I have a weird but useful question. Whenever we execute a query in the Hive shell, we can see the relevant results on screen, but we cannot tell which column names correspond to the data unless we run "desc formatted table_name" or a similar command and scroll up and down the screen to match the results against the table structure. We do this all the time most

Hive - Where and OR clause error

情到浓时终转凉″ submitted on 2021-02-08 05:56:07
Question: Hi, I am trying to run this query in Hive, but I get error 10249 (Unsupported query expression - only 1 subquery is supported...):
select count(*)
from ( select * from tableA union all select * from tableB ) a
where a.field1 in (select fieldA from tableC)
   or a.field2 in (select fieldA from tableC)
   or a.field3 in (select fieldA from tableC);
Would anybody know how I can rewrite this so that Hive supports it? (It works fine in SQL Server.)
Answer 1: Since you do not need fields from tableC, you can use
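One common way around the single-subquery limit is to turn each IN into an equi-join against the distinct lookup values and then test for a match in the WHERE clause. The sketch below checks that this rewrite counts the same rows as the original OR-of-IN query. It runs on Python's built-in sqlite3 module (SQLite, not Hive), the table contents are made up, and it is not necessarily the approach Answer 1 goes on to describe.

```python
import sqlite3

# Hypothetical toy data: tableA and tableB share a schema, tableC holds lookup values.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE tableA (field1 TEXT, field2 TEXT, field3 TEXT);
CREATE TABLE tableB (field1 TEXT, field2 TEXT, field3 TEXT);
CREATE TABLE tableC (fieldA TEXT);
INSERT INTO tableA VALUES ('x', 'p', 'q'), ('p', 'y', 'q'), ('p', 'q', 'r');
INSERT INTO tableB VALUES ('p', 'q', 'z'), ('x', 'y', 'z'), ('p', 'p', 'p');
INSERT INTO tableC VALUES ('x'), ('y'), ('z'), ('x');
""")

# Original form: three IN subqueries combined with OR (the shape Hive rejects).
ORIGINAL = """
SELECT count(*) FROM (
  SELECT * FROM tableA UNION ALL SELECT * FROM tableB
) a
WHERE a.field1 IN (SELECT fieldA FROM tableC)
   OR a.field2 IN (SELECT fieldA FROM tableC)
   OR a.field3 IN (SELECT fieldA FROM tableC)
"""

# Rewrite: one LEFT JOIN per field against the DISTINCT lookup values
# (DISTINCT stops duplicate lookup rows from multiplying the count),
# then keep the rows where at least one of the joins found a match.
REWRITE = """
SELECT count(*) FROM (
  SELECT * FROM tableA UNION ALL SELECT * FROM tableB
) a
LEFT JOIN (SELECT DISTINCT fieldA FROM tableC) c1 ON a.field1 = c1.fieldA
LEFT JOIN (SELECT DISTINCT fieldA FROM tableC) c2 ON a.field2 = c2.fieldA
LEFT JOIN (SELECT DISTINCT fieldA FROM tableC) c3 ON a.field3 = c3.fieldA
WHERE c1.fieldA IS NOT NULL OR c2.fieldA IS NOT NULL OR c3.fieldA IS NOT NULL
"""

original_count = cur.execute(ORIGINAL).fetchone()[0]
rewrite_count = cur.execute(REWRITE).fetchone()[0]
print(original_count, rewrite_count)  # prints: 4 4
```

Only equality conditions appear in the ON clauses, which keeps the joins within what Hive's join syntax accepts; the OR moves to the WHERE clause, where it only compares columns.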

How to sort an array and return the indices in Hive?

Deadly submitted on 2021-02-08 05:28:51
Question: In Hive, I wish to sort an array from largest to smallest and get the index array. For example, the table looks like this:

id | value_array
1  | {30, 40, 10, 20}
2  | {10, 30, 40, 20}

I wish to get this:

id | value_array
1  | {1, 0, 3, 2}
2  | {2, 1, 3, 0}

The arrays in the result are the indices of the original elements. How can I achieve this?
Answer 1: Explode the array using posexplode to get the index and value, sort by value, and collect the array of indices: select id, collect_list(pos) as result_array from ( select s.id
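The posexplode, sort, collect_list pipeline from Answer 1 can be mimicked in plain Python to check the expected output. This is a sketch of the logic only, and `sorted_indices_desc` is a hypothetical helper, not a Hive function. In Hive itself, collect_list only returns the positions in sorted order if the rows reach it in that order, which is usually arranged by sorting inside the subquery (for example with DISTRIBUTE BY and SORT BY); verify that behaviour on your Hive version.

```python
from typing import Dict, List


def sorted_indices_desc(values: List[int]) -> List[int]:
    """Hypothetical helper mimicking posexplode -> sort by value desc -> collect_list(pos)."""
    # posexplode: one (pos, value) pair per element, pos is 0-based
    exploded = list(enumerate(values))
    # sort by value, largest first; Python's sort is stable even with reverse=True,
    # so equal values keep their original relative order
    exploded.sort(key=lambda pair: pair[1], reverse=True)
    # collect_list(pos): gather the positions in sorted order
    return [pos for pos, _ in exploded]


table: Dict[int, List[int]] = {1: [30, 40, 10, 20], 2: [10, 30, 40, 20]}
result = {row_id: sorted_indices_desc(vals) for row_id, vals in table.items()}
print(result)  # {1: [1, 0, 3, 2], 2: [2, 1, 3, 0]}
```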

Hive advanced aggregation functions: GROUPING SETS, CUBE, ROLLUP

三世轮回 submitted on 2021-02-08 05:24:51
-- GROUPING SETS is a clause of GROUP BY that lets developers list several grouping combinations after the GROUP BY statement; it can be understood simply as several GROUP BY queries whose results are combined with UNION ALL.
select device_id, os_id, app_id, count(user_id)
from test_xinyan_reg
group by device_id, os_id, app_id
grouping sets ((device_id), (os_id), (device_id, os_id), ())
-- is equivalent to
SELECT device_id, null, null, count(user_id) FROM test_xinyan_reg group by device_id
UNION ALL
SELECT null, os_id, null, count(user_id) FROM test_xinyan_reg group by os_id
UNION ALL
SELECT device_id, os_id, null, count(user_id) FROM test_xinyan_reg group by device_id, os_id
UNION ALL
SELECT null, null, null, count
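To make the UNION ALL equivalence concrete, here is a minimal Python sketch (hypothetical helpers `grouping_sets`, `rollup` and `cube`, made-up rows) that runs one GROUP BY per grouping set and reports the group columns outside the set as NULL (None). It also lists the grouping sets that Hive's `WITH ROLLUP` and `WITH CUBE` expand to: ROLLUP drops columns from the right one at a time, while CUBE takes every subset. It does not model GROUPING__ID, so here a genuine NULL in a grouping column cannot be told apart from a rolled-up one.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Optional[str]]


def rollup(cols: Sequence[str]) -> List[Tuple[str, ...]]:
    # Hypothetical helper: WITH ROLLUP over (a, b, c) -> (a, b, c), (a, b), (a), ()
    return [tuple(cols[:n]) for n in range(len(cols), -1, -1)]


def cube(cols: Sequence[str]) -> List[Tuple[str, ...]]:
    # Hypothetical helper: WITH CUBE over n columns -> every subset, 2**n grouping sets
    return [c for n in range(len(cols), -1, -1) for c in combinations(cols, n)]


def grouping_sets(rows: List[Row], group_cols: Sequence[str],
                  sets: Sequence[Tuple[str, ...]], count_col: str) -> List[tuple]:
    """Hypothetical helper: one GROUP BY per grouping set, results concatenated
    (UNION ALL). Group columns outside the current set are reported as None."""
    out = []
    for gset in sets:
        counts: Counter = Counter()
        for row in rows:
            key = tuple(row[c] if c in gset else None for c in group_cols)
            # count(col) skips NULLs, but the group itself still appears (count 0)
            counts[key] += 0 if row[count_col] is None else 1
        out.extend(key + (n,) for key, n in counts.items())
    return out


# Made-up rows shaped like test_xinyan_reg
rows = [
    {"device_id": "d1", "os_id": "ios", "app_id": "a1", "user_id": "u1"},
    {"device_id": "d1", "os_id": "ios", "app_id": "a2", "user_id": "u2"},
    {"device_id": "d2", "os_id": "android", "app_id": "a1", "user_id": None},
]
sets = [("device_id",), ("os_id",), ("device_id", "os_id"), ()]
result = grouping_sets(rows, ["device_id", "os_id", "app_id"], sets, "user_id")
for r in result:
    print(r)
```

Because app_id is in the GROUP BY list but in none of the grouping sets, it is always None in the output, matching the `null` in the third column of every UNION ALL branch above.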

Cannot create Hive external table using JdbcStorageHandler

天涯浪子 submitted on 2021-02-08 04:45:30
Question: I am running a small cluster on Amazon EMR in order to experiment with Apache Hive 2.3.5. It is my understanding that Apache Hive can import data from a remote database and have the cluster run queries on it. I was following an example provided in the Apache Hive documentation (https://cwiki.apache.org/confluence/display/Hive/JdbcStorageHandler) and created the following code: CREATE EXTERNAL TABLE hive_table ( col1 int, col2 string, col3 date ) STORED BY 'org.apache.hive.storage.jdbc

Could anyone please explain what c000 means in c000.snappy.parquet or c000.snappy.orc?

时间秒杀一切 submitted on 2021-02-07 20:30:26
Question: I have searched through all the documentation and still haven't found why there is a prefix, or what c000 is, in the file naming convention below: file:/Users/stephen/p/spark/f1/part-00000-445036f9-7a40-4333-8405-8451faa44319-c000.snappy.parquet
Answer 1: You should use the "Talk is cheap, show me the code." methodology. Not everything is documented, and one way to go is just the code. Consider part-1-2_3-4.parquet: Split/Partition number. Random UUID to prevent collision between different (appending)
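As a sketch of that code-reading approach, the regular expression below splits the example name into its parts. The layout (`part-`, a 5-digit task/split number, a job UUID, `-c` plus a 3-digit file counter, then the compression-codec and format extensions) is an assumption based on Spark's file-writer code and commonly observed output directories. The counter is believed to increase when a single task rolls over to a new file, for example when `spark.sql.files.maxRecordsPerFile` is set, so check it against your Spark version. `parse_part_file` is a hypothetical helper.

```python
import re
from typing import Dict

# Assumed layout (not an official spec):
#   part-<5-digit split/partition number>-<job UUID>-c<3-digit file counter><extensions>
SPARK_PART_FILE = re.compile(
    r"part-(?P<split>\d{5})"
    r"-(?P<uuid>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
    r"-c(?P<file_counter>\d{3})"
    r"(?P<ext>(?:\.\w+)+)$"
)


def parse_part_file(name: str) -> Dict[str, str]:
    """Hypothetical helper: split a Spark-style part-file name into its fields."""
    m = SPARK_PART_FILE.search(name)
    if m is None:
        raise ValueError(f"unrecognised part-file name: {name}")
    return m.groupdict()


info = parse_part_file(
    "file:/Users/stephen/p/spark/f1/"
    "part-00000-445036f9-7a40-4333-8405-8451faa44319-c000.snappy.parquet"
)
print(info)
```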

Hive/Impala performance with string partition key vs Integer partition key

梦想的初衷 submitted on 2021-02-07 19:54:36
Question: Are numeric columns recommended as partition keys? Will there be any performance difference when we run a select query on numeric partition columns versus string partition columns?
Answer 1: No, there is no such recommendation. Consider this: in Hive, a partition is represented as a folder with a name like 'key=value', or just 'value', but either way it is a string folder name. So the value is stored as a string and cast during read/write. Partition key value is not packed
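To illustrate the answer's point that a partition value lives in a directory name, here is a minimal Python sketch that reads the 'key=value' segments of a made-up path. `parse_partition_path` is a hypothetical helper. Every value comes back as a string and has to be cast to be used as a number, which is the read-time conversion the answer refers to. It does not model how Hive or Impala plan queries, and it ignores Hive's escaping of special characters in partition values.

```python
from typing import Dict


def parse_partition_path(path: str) -> Dict[str, str]:
    """Hypothetical helper: split a Hive-style partition path such as
    'year=2021/month=02' into key/value pairs. Every value is a string,
    because it is just part of a directory name."""
    spec = {}
    for segment in path.strip("/").split("/"):
        key, sep, value = segment.partition("=")
        if sep:  # only 'key=value' segments describe partitions
            spec[key] = value
    return spec


spec = parse_partition_path("/warehouse/sales/year=2021/month=02")
print(spec)                 # {'year': '2021', 'month': '02'}
month = int(spec["month"])  # an INT partition column needs a cast back from the string
```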