hiveql

How to subtract months from date in HIVE

五迷三道 submitted on 2019-12-04 15:36:16
I am looking for a method that helps me subtract months from a date in Hive. I have a date 2015-02-01. Now I need to subtract 2 months from this date, so that the result is 2014-12-01. Can you guys help me out here?

Manoj R:

select add_months('2015-02-01', -2);

If you need to go back to the first day of the resulting month:

select add_months(trunc('2015-02-01', 'MM'), -2);

Please try the add_months date function and pass -2 as the months argument. Internally, add_months uses the Java Calendar.add method, which supports adding or subtracting (by passing a negative integer). https://cwiki.apache.org/confluence/display/Hive
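As a usage sketch, the same calls applied to a column rather than a literal (orders and order_date are made-up names, assuming a Hive version that ships both add_months and trunc):

-- two months before each order date
SELECT order_date, add_months(order_date, -2) AS two_months_earlier
FROM orders;

-- first day of the month two months before each order date
SELECT add_months(trunc(order_date, 'MM'), -2)
FROM orders;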

Creating hive partitions for multiple months using one script

不羁的心 submitted on 2019-12-04 15:02:53
I have data for 4 years, like '2011 2012 2013 2014'. I have to run queries based on one month's data, so I am creating partitions as below:

ALTER TABLE table1_2010Jan ADD PARTITION (year='2010', month='01', day='01') LOCATION 'path';
ALTER TABLE table1_2010Jan ADD PARTITION (year='2010', month='01', day='02') LOCATION 'path';
ALTER TABLE table1_2010Jan ADD PARTITION (year='2010', month='01', day='03') LOCATION 'path';

I am creating individual partitions like the above for every day of every month. I want to know if we can write a script (in any language) and run it one time to create these partitions for
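The excerpt is cut off, but since the statements only differ in the partition values, one sketch of a pure-HiveQL alternative (assuming the HDFS directories already sit under the table location and follow the year=/month=/day= naming Hive expects) is to let Hive discover the partitions itself, or to batch several specs into a single ALTER TABLE:

-- registers every year=/month=/day= directory found under the table location
MSCK REPAIR TABLE table1_2010Jan;

-- alternatively, one ALTER TABLE statement can add several partitions at once
-- ('path' is the same placeholder used in the question)
ALTER TABLE table1_2010Jan ADD
  PARTITION (year='2010', month='01', day='01') LOCATION 'path'
  PARTITION (year='2010', month='01', day='02') LOCATION 'path';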

Hive query performance is slow when using Hive date functions instead of hardcoded date strings?

自古美人都是妖i submitted on 2019-12-04 13:33:10
I have a transaction table table_A that gets updated every day. Every day I insert new data into table_A from external table table_B, using the file_date field to filter the necessary data from table_B to insert into table_A. However, there's a huge performance difference if I use a hardcoded date vs. using the Hive date functions:

-- Fast version (~20 minutes)
SET date_ingest = '2016-12-07';
SET hive.exec.dynamic.partition.mode = nonstrict;
SET hive.exec.dynamic.partition = TRUE;
INSERT INTO TABLE table_A PARTITION (FILE_DATE)
SELECT id, eventtime, CONCAT_WS('-', substr(eventtime, 0, 4
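The excerpt ends mid-query, but a common explanation for this pattern (an assumption here, not stated in the excerpt) is that a hardcoded string lets Hive prune partitions of table_B at compile time, while wrapping the filter in date functions forces a scan of every partition. One workaround sketch is to resolve the date outside the query and pass it in as a substitution variable; load_table_a.hql and ingest_date are made-up names:

-- invoked as: hive -hivevar ingest_date=2016-12-07 -f load_table_a.hql
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

INSERT INTO TABLE table_A PARTITION (file_date)
SELECT id, eventtime, file_date
FROM table_B
WHERE file_date = '${hivevar:ingest_date}';  -- a literal after substitution, so pruning can apply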

Pandas DataFrame to Hive Table

拟墨画扇 submitted on 2019-12-04 10:51:45
I'm new to Python and Hive. I was hoping I might get some advice. Does anyone have any tips on how to turn a Python pandas dataframe into a Hive table?

Your script should run on a machine where Hive can load data using the "load data local inpath" method.
1. Query the pandas dataframe to create a list of column name / datatype pairs.
2. Compose a valid HQL (DDL) create table statement using Python string operations (basically concatenations).
3. Issue the create table statement in Hive.
4. Write the pandas dataframe as a CSV separated by "\t", turning headers off and index off (check the parameters of to_csv()).
5. From
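The numbered answer is truncated, but the Hive side of steps 2 to 5 roughly amounts to the sketch below. The table name, columns, and file path are invented for illustration; in practice the DDL would be generated from the dataframe's dtypes as the answer describes.

-- steps 2/3: DDL generated from the dataframe's columns and dtypes
CREATE TABLE pandas_import (id INT, name STRING, amount DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- step 5: load the tab-separated file written by
-- df.to_csv('/tmp/pandas_import.tsv', sep='\t', header=False, index=False)
LOAD DATA LOCAL INPATH '/tmp/pandas_import.tsv' INTO TABLE pandas_import;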

What is the replacement of NULLIF in Hive?

可紊 submitted on 2019-12-04 09:51:44
I would like to know what the replacement for NULLIF is in Hive. I am using COALESCE but it's not serving my requirement. My query statement is something like: COALESCE(A, B, C) AS D. COALESCE will return the first NOT NULL value, but my A/B/C contain blank values, so COALESCE is not assigning the right value to D, since it considers a blank as NOT NULL. But I want the correct value to get assigned to D. In SQL I could have used COALESCE(NULLIF(A, '')......) so it would check for blanks as well. I tried CASE but it's not working.

Just use case: select (case when A is null or A = '' then . . . end) This is
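The answer is cut short, but spelled out in full the CASE-based idea looks something like the sketch below (A, B, C, D are the names from the question; some_table is a placeholder). If your Hive version already ships a built-in nullif(), the original COALESCE(NULLIF(A, ''), ...) form works directly.

-- treat empty strings as NULL, then fall through to the next column
SELECT COALESCE(
         CASE WHEN A = '' THEN NULL ELSE A END,
         CASE WHEN B = '' THEN NULL ELSE B END,
         CASE WHEN C = '' THEN NULL ELSE C END) AS D
FROM some_table;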

Map-Reduce Logs on Hive-Tez

社会主义新天地 submitted on 2019-12-04 09:11:51
I want to get the interpretation of the Map-Reduce logs after running a query on Hive-Tez. What do the lines after INFO : convey? Here I have attached a sample:

INFO : Session is already open
INFO : Dag name: SELECT a.Model...)
INFO : Tez session was closed. Reopening...
INFO : Session re-established.
INFO :
INFO : Status: Running (Executing on YARN cluster with App id application_14708112341234_1234)
INFO : Map 1: -/- Map 3: -/- Map 4: -/- Map 7: -/- Reducer 2: 0/15 Reducer 5: 0/26 Reducer 6: 0/13
INFO : Map 1: -/- Map 3: 0/118 Map 4: 0/118 Map 7: 0/1 Reducer 2: 0/15 Reducer 5: 0/26 Reducer 6: 0/13

Looping using Hiveql

一笑奈何 submitted on 2019-12-04 06:18:50
I'm trying to merge 2 datasets, say A and B. Dataset A has a variable "Flag" which takes 2 values. Rather than just merging both datasets together, I was trying to merge the 2 datasets based on the "Flag" variable. The merging code is the following:

create table new_data as
select a.*, b.y
from A as a
left join B as b on a.x = b.x

Since I'm running the Hive code through the CLI, I'm calling it through the following command:

hive -f new_data.hql

The looping part of the code I'm calling to merge the data based on the "Flag" variable is the following:

for flag in 1 2; do hive -hivevar flag=$flag -f new_data.hql; done

I put
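For context, here is a sketch of what new_data.hql might look like so that the shell loop above produces a separate merge per flag value. The WHERE clause and the table-name suffix are assumptions, since the question's script is not shown in full:

-- new_data.hql, parameterised by the flag passed via -hivevar
CREATE TABLE new_data_${hivevar:flag} AS
SELECT a.*, b.y
FROM A AS a
LEFT JOIN B AS b
  ON a.x = b.x
WHERE a.flag = ${hivevar:flag};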

The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx------ (on Linux)

纵饮孤独 submitted on 2019-12-04 05:58:19
The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx------

Hi, I was executing the following Spark code in Eclipse on CDH 5.8 and getting the above RuntimeException:

public static void main(String[] args) {
    final SparkConf sparkConf = new SparkConf().setMaster("local").setAppName("HiveConnector");
    final JavaSparkContext sparkContext = new JavaSparkContext(sparkConf);
    SQLContext sqlContext = new HiveContext(sparkContext);
    DataFrame df = sqlContext.sql("SELECT * FROM test_hive_table1");
    //df.show();
    df.count();
}

According to the exception, /tmp/hive on HDFS should be

How to see the date when the table was created?

怎甘沉沦 submitted on 2019-12-04 05:06:08
I created a table a couple of months ago. Is there any way in Hive that I can see when the table was created? show tables doesn't give the creation date of the table.

Phani Kumar: Execute the command desc formatted <database>.<table_name> on the Hive CLI. It will show detailed table information similar to:

Detailed Table Information
Database:
Owner:
CreateTime:
LastAccessTime:

You need to run the following command: describe formatted <your_table_name>; Or, if you need this information about a particular partition: describe formatted <your_table_name> partition (<partition_field>=<value>);

Difference between 'Stored as InputFormat, OutputFormat' and 'Stored as' in Hive

我只是一个虾纸丫 submitted on 2019-12-04 05:04:53
There is an issue when executing a show create table and then executing the resulting create table statement if the table is ORC. Using show create table, you get this:

STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'

But if you create the table with those clauses, you will then get a casting error when selecting. The error looks like:

Failed with exception java.io.IOException: java.lang.ClassCastException: org.apache.hadoop.hive.ql.io.orc.OrcStruct cannot be cast to org.apache.hadoop.io.BinaryComparable

To fix this, just
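The answer is truncated at "To fix this, just", but a common reading (my assumption, not confirmed by the excerpt) is that the INPUTFORMAT/OUTPUTFORMAT clauses alone leave the table with the default text SerDe, while the STORED AS ORC shorthand also sets the ORC SerDe. A sketch of the two forms, with made-up table names:

-- shorthand: sets the ORC SerDe plus the input and output formats in one clause
CREATE TABLE orc_table_short (id INT, name STRING)
STORED AS ORC;

-- roughly equivalent long form, naming the SerDe explicitly
CREATE TABLE orc_table_long (id INT, name STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';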