Hive | 易学教程 (Yixue Tutorial)

Hive - How to display Hive query results in the Command Line along with column names

Question: I have been working in Hive for quite a while. Please note that I don't use Hue at all; I use the Hive shell all the time, and now I have a weird but useful question. Whenever we execute a query in the Hive shell, we can see the relevant results on screen, but we cannot tell which column names correspond to the data unless we run "desc formatted table_name" or a similar command and scroll up and down the screen to match the results with the table structure. We do this all the time most
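
For the Hive CLI, a minimal sketch of one commonly used approach is to turn on the hive.cli.print.header property for the session. The second setting is optional; as we understand it, it stops Hive from prefixing each header with the table name (table.column). Both are session-level set commands, so they need to be re-run in each new shell.

```sql
-- Print column names as a header row above query results in the Hive CLI.
set hive.cli.print.header=true;
-- Optional: show "col" instead of "table.col" in the header row.
set hive.resultset.use.unique.column.names=false;
```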

Hive - Where and OR clause error

Question: Hi, I am trying to run this query in Hive, but I get error 10249 (Unsupported query expression - only 1 subquery is supported...):

select count(*)
from (
  select * from tableA
  union all
  select * from tableB
) a
where a.field1 in (select fieldA from tableC)
   or a.field2 in (select fieldA from tableC)
   or a.field3 in (select fieldA from tableC);

Does anybody know how I can rewrite this so that Hive supports it? (It works fine in SQL Server.)

Answer 1: Since you do not need fields from tableC, you can use
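
A minimal sketch of one possible rewrite, assuming the table and column names from the question: the three IN subqueries are replaced by one LEFT JOIN per compared column against a deduplicated copy of tableC, and a row is counted when at least one of the joins finds a match. The DISTINCT matters: without it, duplicate fieldA values in tableC would multiply rows and inflate the count. The CTE names c and u are illustrative only.

```sql
-- One LEFT JOIN per compared column replaces the three IN subqueries.
-- c holds the distinct lookup keys, so a join can match at most one row.
-- A row of u is counted if any of field1, field2, field3 found a match.
WITH c AS (
  SELECT DISTINCT fieldA FROM tableC
),
u AS (
  SELECT * FROM tableA
  UNION ALL
  SELECT * FROM tableB
)
SELECT count(*)
FROM u a
LEFT JOIN c c1 ON a.field1 = c1.fieldA
LEFT JOIN c c2 ON a.field2 = c2.fieldA
LEFT JOIN c c3 ON a.field3 = c3.fieldA
WHERE c1.fieldA IS NOT NULL
   OR c2.fieldA IS NOT NULL
   OR c3.fieldA IS NOT NULL;
```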

How to sort an array and return the index in Hive?

Question: In Hive, I wish to sort an array from largest to smallest and get the index array. For example, the table is like this:

id | value_array
1  | {30, 40, 10, 20}
2  | {10, 30, 40, 20}

I wish to get this:

id | value_array
1  | {1, 0, 3, 2}
2  | {2, 1, 3, 0}

The arrays in the result hold the indexes of the original elements. How can I achieve this?

Answer 1: Explode the array using posexplode to get the index and value, sort by value, and collect an array of indexes: select id, collect_list(pos) as result_array from ( select s.id
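
A self-contained sketch of the posexplode approach, using the sample rows from the question as an inline CTE (my_table is a hypothetical name; swap in the real table). Each element is exploded with its position, the rows of each id are sorted by value descending, and the positions are collected. Caveat: this relies on collect_list keeping the order produced by DISTRIBUTE BY / SORT BY, which holds in common practice but is not a documented guarantee. If that ordering holds, id 1 yields [1,0,3,2] and id 2 yields [2,1,3,0], as in the question.

```sql
-- Sample data from the question; replace the CTE with the real table.
WITH my_table AS (
  SELECT 1 AS id, array(30, 40, 10, 20) AS value_array
  UNION ALL
  SELECT 2 AS id, array(10, 30, 40, 20) AS value_array
)
SELECT id, collect_list(pos) AS result_array
FROM (
  -- posexplode emits one row per element: its 0-based position and value.
  SELECT s.id, e.pos, e.val
  FROM my_table s
  LATERAL VIEW posexplode(s.value_array) e AS pos, val
  -- Send each id to one reducer and order its elements largest first.
  DISTRIBUTE BY s.id
  SORT BY s.id, e.val DESC
) t
GROUP BY id;
```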

Hive advanced aggregation functions: GROUPING SETS, CUBE, ROLLUP

-- GROUPING SETS is a sub-clause of GROUP BY that lets developers list several grouping combinations after the GROUP BY. It can be thought of simply as several GROUP BY queries whose results are combined with UNION ALL.
select device_id, os_id, app_id, count(user_id)
from test_xinyan_reg
group by device_id, os_id, app_id
grouping sets ((device_id), (os_id), (device_id, os_id), ())
-- is equivalent to
SELECT device_id, null, null, count(user_id) FROM test_xinyan_reg group by device_id
UNION ALL
SELECT null, os_id, null, count(user_id) FROM test_xinyan_reg group by os_id
UNION ALL
SELECT device_id, os_id, null, count(user_id) FROM test_xinyan_reg group by device_id, os_id
UNION ALL
SELECT null, null, null, count
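
CUBE and ROLLUP, named in the heading, are shorthands for common GROUPING SETS lists. A short sketch against the same test_xinyan_reg table: WITH ROLLUP produces hierarchical subtotals from left to right, while WITH CUBE produces every combination of the listed columns, including the grand total.

```sql
-- ROLLUP: same as GROUPING SETS ((device_id, os_id), (device_id), ())
SELECT device_id, os_id, count(user_id)
FROM test_xinyan_reg
GROUP BY device_id, os_id WITH ROLLUP;

-- CUBE: same as GROUPING SETS ((device_id, os_id), (device_id), (os_id), ())
SELECT device_id, os_id, count(user_id)
FROM test_xinyan_reg
GROUP BY device_id, os_id WITH CUBE;
```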

Cannot create Hive external table using jdbcStorageHandler

Question: I am running a small cluster on Amazon EMR in order to experiment with Apache Hive 2.3.5. My understanding is that Apache Hive can import data from a remote database and have the cluster run queries on it. I was following an example provided in the Apache Hive documentation (https://cwiki.apache.org/confluence/display/Hive/JdbcStorageHandler) and wrote the following code:

CREATE EXTERNAL TABLE hive_table (
  col1 int,
  col2 string,
  col3 date
)
STORED BY 'org.apache.hive.storage.jdbc

Could anyone please explain what c000 means in c000.snappy.parquet or c000.snappy.orc?

Question: I have searched through all the documentation and still haven't found why there is a prefix, or what c000 is, in the following file naming convention:

file:/Users/stephen/p/spark/f1/part-00000-445036f9-7a40-4333-8405-8451faa44319-c000.snappy.parquet

Answer 1: You should use the "Talk is cheap, show me the code." methodology. Not everything is documented, and one way to go is simply to read the code. Consider part-1-2_3-4.parquet: Split/Partition number. Random UUID to prevent collision between different (appending)
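
As far as we can tell from Spark's file writer, the cNNN part is a per-task file counter: a task normally writes a single file (c000) and only moves on to c001, c002, and so on when it rolls over to a new file, for example when a per-file record cap is set. A minimal sketch of that setting in a Spark SQL session (this is a Spark property, not a Hive one; the value is an arbitrary example):

```sql
-- Spark SQL session setting (not a Hive property).
-- Caps the number of records written to a single output file; a task that
-- exceeds it starts a new file whose counter suffix advances (c000, c001, ...).
-- 1000000 is an arbitrary example value; 0 (the default) means no cap.
SET spark.sql.files.maxRecordsPerFile=1000000;
```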

Hive/Impala performance with string partition key vs integer partition key

Question: Are numeric columns recommended for partition keys? Will there be any performance difference when we run a select query on numeric partition columns versus string partition columns?

Answer 1: No, there is no such recommendation. Consider this: in Hive, a partition is represented as a folder with a name like 'key=value' (or just 'value'), but either way it is a string folder name. So the value is stored as a string and cast during reads and writes. Partition key value is not packed
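
A small sketch of the point about folder names, using a hypothetical table events whose partition column dt is declared INT: after a static-partition insert, Hive lists (and stores on disk) the partition as the directory-style string dt=20200101, the same form a STRING key would take.

```sql
-- Hypothetical table whose partition key is numeric.
CREATE TABLE events (user_id STRING)
PARTITIONED BY (dt INT);

-- Static-partition insert (INSERT ... VALUES needs Hive 0.14 or later).
INSERT INTO TABLE events PARTITION (dt = 20200101)
VALUES ('u1');

-- Lists the partition by its directory-style name, dt=20200101,
-- even though dt is declared INT.
SHOW PARTITIONS events;
```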