hiveql

get latest data from hive table with multiple partition columns

风流意气都作罢 posted on 2021-02-19 05:36:06
Question: I have a Hive table with the following structure: ID string, Value string, year int, month int, day int, hour int, minute int. The table is refreshed every 15 minutes and is partitioned by the year/month/day/hour/minute columns. Sample partitions: year=2019/month=12/day=29/hour=19/minute=15 year=2019/month=12/day=30/hour=00/minute=45 year=2019/month=12/day=30/hour=08/minute=45 year=2019/month=12/day=30/hour=09/minute=30 year=2019/month=12/day=30/hour=09/minute=45 I want to select
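
One possible way to find the newest partition is to list the partition specs (for example from the output of `SHOW PARTITIONS`) and compare them numerically in a small script. The sketch below is only an illustration under assumptions: the partition strings are the samples from the question, the five partition columns are taken to be ordered year/month/day/hour/minute, and the helper names `parse_partition`, `latest_partition`, and `where_clause` are hypothetical.

```python
# Hypothetical helpers; the partition layout follows the samples in the question.
PARTITION_COLUMNS = ("year", "month", "day", "hour", "minute")


def parse_partition(spec):
    """Turn 'year=2019/month=12/...' into a tuple of ints in column order."""
    values = dict(part.split("=", 1) for part in spec.strip().split("/"))
    return tuple(int(values[col]) for col in PARTITION_COLUMNS)


def latest_partition(specs):
    """Return the newest partition as a tuple, compared numerically."""
    return max(parse_partition(spec) for spec in specs)


def where_clause(partition):
    """Build a predicate on the partition columns that selects one partition."""
    return " AND ".join(
        f"{col}={value}" for col, value in zip(PARTITION_COLUMNS, partition)
    )


# Sample partitions copied from the question.
samples = [
    "year=2019/month=12/day=29/hour=19/minute=15",
    "year=2019/month=12/day=30/hour=00/minute=45",
    "year=2019/month=12/day=30/hour=08/minute=45",
    "year=2019/month=12/day=30/hour=09/minute=30",
    "year=2019/month=12/day=30/hour=09/minute=45",
]

newest = latest_partition(samples)
print(where_clause(newest))
# prints: year=2019 AND month=12 AND day=30 AND hour=9 AND minute=45
```

Comparing the values as integers rather than strings avoids ordering surprises such as "9" sorting after "10". The resulting predicate could then be appended to a `SELECT` against the table so that only that partition is read.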

Optimizing Hive GROUP BY when rows are sorted

北城余情 posted on 2021-02-19 05:31:39
Question: I have the following (very simple) Hive query: select user_id, event_id, min(time) as start, max(time) as end, count(*) as total, count(interaction == 1) as clicks from events_all group by user_id, event_id; The table has the following structure: user_id event_id time interaction Ex833Lli36nxTvGTA1Dv juCUv6EnkVundBHSBzQevw 1430481530295 0 Ex833Lli36nxTvGTA1Dv juCUv6EnkVundBHSBzQevw 1430481530295 1 n0w4uQhOuXymj5jLaCMQ G+Oj6J9Q1nI1tuosq2ZM/g 1430512179696 0 n0w4uQhOuXymj5jLaCMQ G
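
A side note on the query in the question: `count(expr)` counts the rows for which the expression is non-NULL, so `count(interaction == 1)` also counts rows where the comparison is false. The sketch below illustrates the difference using an in-memory SQLite database as a stand-in for Hive (an assumption made only so the example runs locally). The three rows are the complete sample rows from the question; the aliases `start_time`, `end_time`, `count_expr`, and `clicks` are illustrative.

```python
import sqlite3

# The first three (complete) sample rows from the question.
rows = [
    ("Ex833Lli36nxTvGTA1Dv", "juCUv6EnkVundBHSBzQevw", 1430481530295, 0),
    ("Ex833Lli36nxTvGTA1Dv", "juCUv6EnkVundBHSBzQevw", 1430481530295, 1),
    ("n0w4uQhOuXymj5jLaCMQ", "G+Oj6J9Q1nI1tuosq2ZM/g", 1430512179696, 0),
]

# SQLite is used here only as a local stand-in for Hive.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events_all "
    "(user_id TEXT, event_id TEXT, time INTEGER, interaction INTEGER)"
)
conn.executemany("INSERT INTO events_all VALUES (?, ?, ?, ?)", rows)

query = """
SELECT user_id,
       event_id,
       MIN(time) AS start_time,
       MAX(time) AS end_time,
       COUNT(*) AS total,
       -- counts every row where the comparison is non-NULL (true or false)
       COUNT(interaction = 1) AS count_expr,
       -- counts only the rows where interaction is 1
       SUM(CASE WHEN interaction = 1 THEN 1 ELSE 0 END) AS clicks
FROM events_all
GROUP BY user_id, event_id
ORDER BY user_id
"""

results = conn.execute(query).fetchall()
for row in results:
    print(row)
```

For the first group, `count_expr` equals `total` (2), while the conditional sum reports a single click.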

Does Hive preserve file order when selecting data

不打扰是莪最后的温柔 posted on 2021-02-19 04:05:44
Question: If I run select * from table1; in which order will the data be returned: file order or random order? Answer 1: Without ORDER BY, the order is not guaranteed. The data is read in parallel by many processes (mappers). After the splits are calculated, each process starts reading a piece of a file or a few files, depending on the splits. The parallel processes can handle different volumes of data and run on different nodes, and the load is not the same each time, so they start returning rows and finishing

Reg : Efficiency among query optimizers in hive

做~自己de王妃 posted on 2021-02-18 18:13:30
Question: After reading about query optimization techniques, I learned about the following: 1. Indexing (bitmap and BTree) 2. Partitioning 3. Bucketing. I understand the difference between partitioning and bucketing and when to use each, but I'm still confused about how indexes actually work. Where is the index metadata stored? Is it stored by the namenode? That is, when creating partitions or buckets we can see multiple directories in HDFS, which explains the query performance

Check if a hive table is partitioned on a given column

元气小坏坏 posted on 2021-02-17 04:45:04
Question: I have a list of Hive tables, some of which are partitioned. Given a column, I need to check whether a particular table is partitioned on that column. I searched and found that desc formatted tablename returns all the details of the table. Since I have to iterate over all the tables and build the list, desc formatted does not help. Is there any other way to do this? Answer 1: You can connect directly to the metastore and query it: metastore=# select d."NAME" as DATABASE, t."TBL
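
The metastore approach can be sketched as a join between the tables that hold databases, tables, and partition keys. The example below is only an illustration and is not the truncated query from the answer: it builds a tiny in-memory SQLite mock using the names `DBS`, `TBLS`, and `PARTITION_KEYS` as they commonly appear in the Hive metastore schema (an assumption to verify against your metastore version). The sample database and table names and the helper `is_partitioned_on` are hypothetical.

```python
import sqlite3

# In-memory mock of three metastore tables; names and columns are assumed
# to mirror the usual Hive metastore schema and may differ by version.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE DBS (DB_ID INTEGER PRIMARY KEY, NAME TEXT);
    CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, DB_ID INTEGER, TBL_NAME TEXT);
    CREATE TABLE PARTITION_KEYS (TBL_ID INTEGER, PKEY_NAME TEXT, INTEGER_IDX INTEGER);
    INSERT INTO DBS VALUES (1, 'default');
    INSERT INTO TBLS VALUES (10, 1, 'events'), (11, 1, 'users');
    INSERT INTO PARTITION_KEYS VALUES (10, 'year', 0), (10, 'month', 1);
    """
)


def is_partitioned_on(connection, database, table, column):
    """Return True if the table lists the column among its partition keys."""
    row = connection.execute(
        """
        SELECT COUNT(*)
        FROM DBS d
        JOIN TBLS t ON t.DB_ID = d.DB_ID
        JOIN PARTITION_KEYS k ON k.TBL_ID = t.TBL_ID
        WHERE d.NAME = ? AND t.TBL_NAME = ? AND k.PKEY_NAME = ?
        """,
        (database, table, column),
    ).fetchone()
    return row[0] > 0


print(is_partitioned_on(conn, "default", "events", "year"))  # partitioned table
print(is_partitioned_on(conn, "default", "users", "year"))   # unpartitioned table
```

Against a real metastore, the same join would run over a connection to the metastore database instead of the mock, and a single query without the table filter could return the partition columns of every table at once.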
