hiveql

Adding/Defining Jars in Hive permanently

Submitted by 断了今生、忘了曾经 on 2019-12-10 14:57:14
Question: I was trying to add a jar to the Hive classpath using the ADD JAR command:

    hive> ADD JAR myjar.jar;

but every time I log in to Hive, I need to add myjar.jar again. Is there any way I can add it to the Hive classpath permanently? Regards, Mohammed Niaz

Answer 1: Add this to your .hiverc file:

    add jar myjar.jar

Have a look at this if you require further info: http://hadooped.blogspot.in/2013/08/hive-hiverc-file.html

Answer 2: In order to add them permanently, the recommended ways are as follows: add in hive-site …
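A minimal sketch of the .hiverc approach from Answer 1 (the jar path is an assumption; Hive's CLI runs the commands in this file at startup, so the jar is re-added on every session without manual intervention):

```sql
-- Contents of ~/.hiverc (hypothetical path to the jar)
-- Hive executes this file each time the CLI starts, so the jar
-- is on the classpath for every new session.
ADD JAR /opt/hive/aux-jars/myjar.jar;
```

An alternative mentioned in Answer 2 is cluster-wide configuration (e.g. an auxiliary-jars directory referenced from hive-site), which applies to all users rather than just your own CLI sessions.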

updating Hive external table with HDFS changes

Submitted by 大城市里の小女人 on 2019-12-10 14:40:00
Question: Let's say I created a Hive external table "myTable" from the file myFile.csv (located in HDFS). myFile.csv changes every day, so I want "myTable" to be updated once a day too. Is there any HiveQL query that tells Hive to update the table every day? Thank you. P.S. I would like to know if it works the same way with directories: let's say I create a Hive partition from the HDFS directory "myDir" when "myDir" contains 10 files. The next day "myDir" contains 20 files (10 files were added). Should I update …
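A short sketch of how this usually resolves (table name taken from the question, everything else assumed): an external table reads whatever files sit in its HDFS location at query time, so replacing or adding files needs no HiveQL at all. Only new partition directories need to be registered:

```sql
-- No action needed when files change inside the table's existing location:
-- the next SELECT reads the current contents of the directory.

-- When new PARTITION directories appear under the table's location,
-- ask Hive to rediscover them:
MSCK REPAIR TABLE myTable;

-- Or register a single new directory explicitly (partition column assumed):
ALTER TABLE myTable ADD IF NOT EXISTS PARTITION (dt='2019-12-11');
```

The distinction is between files (picked up automatically) and partitions (metadata that must be added to the metastore).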

How to convert timestamp (with dot between second and millisecond) to date(yyyyMMdd) in Hive?

Submitted by ∥☆過路亽.° on 2019-12-10 13:38:46
Question: I want to convert a timestamp, 1490198341.705 for example, to the date 20170323 and to the hour 11 (GMT+8:00). Are there any functions that solve this?

Answer 1: Try this:

    select date_format(from_utc_timestamp(1490198341.705, 'GMT+8:00'), 'yyyyMMdd HH:mm:ss');

Source: https://stackoverflow.com/questions/42975333/how-to-convert-timestamp-with-dot-between-second-and-millisecond-to-dateyyyym
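A variant of Answer 1 that produces the two values the question asks for separately (a sketch; it assumes the number is seconds since the epoch with a millisecond fraction, which is how Hive treats a decimal cast to TIMESTAMP):

```sql
-- Cast epoch seconds to a timestamp, shift it to GMT+8, then format.
SELECT date_format(from_utc_timestamp(cast(1490198341.705 AS timestamp), 'GMT+8:00'), 'yyyyMMdd') AS dt,
       date_format(from_utc_timestamp(cast(1490198341.705 AS timestamp), 'GMT+8:00'), 'HH')       AS hr;
```

If the source column were epoch milliseconds as a BIGINT instead, it would need dividing by 1000 before the cast.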

Hadoop-Hive | Convert single row columns into multiple rows in Hive

Submitted by 微笑、不失礼 on 2019-12-10 12:25:14
Question: I have a Hive table like this:

    Created_date  ID1  Name1  Age1  Gender1  Name2  ID2  Age2  Gender2  ID3  Name3  Age3  Gender3 ...
    2014-02-01    1    ABC    21    M        MNP    2    22    F        3    XYZ    25    M
    2015-06-06    11   LMP    31    F        PLL    12   42    M        13   UIP    37    F

This table may have any number of repeated sets of the 4-column group. The sequence of these 4 columns is also not fixed, and there may be 1 or 2 more columns that are not repeated, like created_date. I need to convert the above table into a new Hive table having only 4 columns: ID, Name, Age and …
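One common way to unpivot repeated column groups like this is Hive's stack() UDTF. A sketch, assuming exactly three groups and the column names shown above (the count and column list must be spelled out to match the real schema):

```sql
-- stack(3, ...) emits three rows per input row, one per ID/Name/Age/Gender group.
SELECT created_date, t.id, t.name, t.age, t.gender
FROM wide_table
LATERAL VIEW stack(3,
    id1, name1, age1, gender1,
    id2, name2, age2, gender2,
    id3, name3, age3, gender3) t AS id, name, age, gender;
```

Because the question says the group order varies and the number of groups is not fixed, the column list here would have to be generated per table (e.g. from DESCRIBE output) rather than written once.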

Random sample table with Hive, but including matching rows

Submitted by 蓝咒 on 2019-12-10 11:55:42
Question: I have a large table containing a userID column and other user variable columns, and I would like to use Hive to extract a random sample of users based on their userID. Furthermore, these users will sometimes be on multiple rows, and if a randomly selected userID appears in other parts of the table, I would like to extract those rows too. I had a look at the Hive sampling documentation, and I see that something like this can be done to extract a 1% sample:

    SELECT * FROM source TABLESAMPLE …
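TABLESAMPLE samples rows, not users, so a sampled user's other rows are not guaranteed to come along. A sketch of the usual workaround (table and column names taken from the question, the 1% rate assumed): sample the distinct userIDs first, then join back to pull every row for each sampled user:

```sql
-- Step 1: sample ~1% of distinct users; step 2: keep all of their rows.
WITH sampled_users AS (
  SELECT userID
  FROM (SELECT DISTINCT userID FROM source) u
  WHERE rand() <= 0.01
)
SELECT s.*
FROM source s
JOIN sampled_users su
  ON s.userID = su.userID;
```

This guarantees that a user is either fully in the sample or fully out of it, which is what the question asks for.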

Hive: How to explode table with map column

Submitted by 痴心易碎 on 2019-12-10 10:54:28
Question: I have a table like this:

    +-----+------------------------+
    | id  | mapCol                 |
    +-----+------------------------+
    | id1 | {key1:val1, key2:val2} |
    | id2 | {key1:val3, key2:val4} |
    +-----+------------------------+

so I can easily perform a query like

    select explode(mapCol) as (key, val) from myTab where id='id1'

and I get

    +------+------+
    | key  | val  |
    +------+------+
    | key1 | val1 |
    | key2 | val2 |
    +------+------+

I want to generate a table like this:

    +-----+------+-----+
    | id  | key …
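The limitation here is that explode() in the SELECT list cannot be combined with other columns. LATERAL VIEW lifts that restriction by joining each row to its exploded output, which keeps id alongside key and val (a sketch using the names from the question):

```sql
-- LATERAL VIEW pairs every (key, val) from the map with its source row's id.
SELECT id, m.key, m.val
FROM myTab
LATERAL VIEW explode(mapCol) m AS key, val;
```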

How to insert/copy one partition's data to multiple partitions in hive?

Submitted by 核能气质少年 on 2019-12-10 10:19:52
Question: I have data for day='2019-01-01' in my Hive table, and I want to copy the same data to the whole of Jan-2019 (i.e. to '2019-01-02', '2019-01-03', ..., '2019-01-31'). I'm trying the following, but data is only inserted into '2019-01-02' and not into '2019-01-03':

    INSERT OVERWRITE TABLE db_t.students PARTITION (dt='2019-01-02', dt='2019-01-03')
    SELECT id, name, marks FROM db_t.students WHERE dt='2019-01-01';

Answer 1: Cross join all your data with calendar dates for the required date range. Use dynamic partitioning: …
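A sketch of the cross-join approach Answer 1 describes (table and columns taken from the question; the date-generation trick with space()/posexplode is an assumption): generate the 30 target dates, cross join them with the source day's rows, and let dynamic partitioning write each date into its own partition in one statement.

```sql
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- space(29) split on ' ' yields 30 elements; posexplode numbers them 0..29,
-- giving the dates 2019-01-02 .. 2019-01-31.
INSERT OVERWRITE TABLE db_t.students PARTITION (dt)
SELECT s.id, s.name, s.marks, d.dt
FROM db_t.students s
CROSS JOIN (
  SELECT date_add('2019-01-02', n) AS dt
  FROM (SELECT posexplode(split(space(29), ' ')) AS (n, x)) nums
) d
WHERE s.dt = '2019-01-01';
```

The asker's original statement fails because a static PARTITION clause names one value per partition key; listing dt twice does not fan the data out to two partitions.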

what's SparkSQL SQL query to write into JDBC table?

Submitted by 房东的猫 on 2019-12-10 06:25:03
Question: For SQL queries in Spark: for reads, we can read from JDBC with

    CREATE TEMPORARY TABLE jdbcTable USING org.apache.spark.sql.jdbc OPTIONS (dbtable ...);

For writes, what is the query that writes data to a remote JDBC table using SQL? NOTE: I want it to be a SQL query. Please provide the pure "SQL query" that can write to JDBC when using HiveContext.sql(...) in SparkSQL.

Answer 1: You can write the DataFrame with JDBC similar to the following:

    df.write.jdbc(url, "TEST.BASICCREATETEST", new Properties)

Answer 2: An INSERT …
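A pure-SQL sketch along the lines Answer 2 begins (the JDBC URL and table names are assumptions): once the remote table is registered the same way the question registers it for reading, a plain INSERT writes through it.

```sql
-- Register the remote table, then write to it with ordinary SQL.
CREATE TEMPORARY TABLE jdbcTable
USING org.apache.spark.sql.jdbc
OPTIONS (
  url 'jdbc:mysql://dbhost:3306/testdb',
  dbtable 'TEST.BASICCREATETEST'
);

INSERT INTO TABLE jdbcTable
SELECT * FROM someHiveTable;
```

Both statements can be issued through HiveContext.sql(...), which is what the question asks for; the DataFrame route in Answer 1 is the more common idiom but is not SQL.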

How to alter Hive partition column name

Submitted by 六月ゝ 毕业季﹏ on 2019-12-10 01:58:56
Question: I have to change a partition column name (not the partition spec). I looked for commands in the Hive wiki and some Google pages, and I can find options for altering the partition spec: for example, in /table/country='US' I can change US to USA. But I want to change country to continent. It feels like the only option available for changing a partition column name is dropping and re-creating the table. If there is any other option available, please help me. Thanks in advance.

Answer 1: You can change …
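A sketch of the drop-and-recreate route the asker suspects is necessary (the column list is hypothetical; Hive has no ALTER statement that renames a partition column itself):

```sql
-- Make sure dropping the table definition does not delete the HDFS data.
ALTER TABLE mytable SET TBLPROPERTIES ('EXTERNAL'='TRUE');
DROP TABLE mytable;

-- Recreate over the same location with the new partition column name.
CREATE EXTERNAL TABLE mytable (id INT, name STRING)
PARTITIONED BY (continent STRING)
LOCATION '/table';

-- The HDFS directories still say country=...; they must be renamed to
-- continent=... before this will rediscover the partitions:
MSCK REPAIR TABLE mytable;
```

Only the metastore definition is rebuilt; the data files stay where they are, which is why the directory rename step matters.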

Both left and right aliases encountered in Hive JOIN; without any inequality clause

Submitted by 别等时光非礼了梦想. on 2019-12-09 14:56:55
Question: I am using the following query:

    SELECT S.MDSE_ITEM_I, S.CO_LOC_I,
           MAX(S.SLS_D) AS MAX_SLS_D,
           MIN(S.SLS_D) AS MIN_SLS_D,
           SUM(S.SLS_UNIT_Q) AS SLS_UNIT_Q,
           MIN(PRSMN_VAL_STRT_D) AS PRSMN_VAL_STRT_D,
           MIN(PRSMN_VAL_END_D) AS PRSMN_VAL_END_D,
           MIN(RC.FRST_RCPT_D) AS FRST_RCPT_D,
           MIN(RC.CURR_ACTV_FRST_OH_D) AS CURR_ACTV_FRST_OH_D,
           MIN(H.GREG_D) AS OH_GREG_D
    FROM eefe_lstr4.SLS_TBL AS S
    LEFT OUTER JOIN eefe_lstr4.PRS_TBL P
      ON S.MDSE_ITEM_I = P.MDSE_ITEM_I
     AND S.CO_LOC_I = P.CO_LOC_I
     AND S.SLS_D BETWEEN …
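The query is cut off in the source, but the trailing BETWEEN inside the ON clause is the likely culprit: older Hive versions reject any non-equality predicate in a join's ON clause with exactly this "Both left and right aliases encountered in JOIN" error. A simplified sketch of the usual fix, moving the range condition to WHERE while preserving left-outer semantics (table and column names from the question, the IS NULL guard an assumption):

```sql
-- Equality conditions stay in ON; the range moves to WHERE.
-- The IS NULL branch keeps unmatched left rows, as the outer join intends.
SELECT S.MDSE_ITEM_I, S.CO_LOC_I,
       MAX(S.SLS_D) AS MAX_SLS_D,
       MIN(S.SLS_D) AS MIN_SLS_D,
       SUM(S.SLS_UNIT_Q) AS SLS_UNIT_Q
FROM eefe_lstr4.SLS_TBL S
LEFT OUTER JOIN eefe_lstr4.PRS_TBL P
  ON S.MDSE_ITEM_I = P.MDSE_ITEM_I
 AND S.CO_LOC_I = P.CO_LOC_I
WHERE P.MDSE_ITEM_I IS NULL
   OR S.SLS_D BETWEEN P.PRSMN_VAL_STRT_D AND P.PRSMN_VAL_END_D
GROUP BY S.MDSE_ITEM_I, S.CO_LOC_I;
```

Newer Hive releases accept non-equi conditions in ON directly, so upgrading is the other way out.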