hiveql

Adding/Defining Jars in Hive permanently

Submitted by 断了今生、忘了曾经 on 2019-12-10 14:57:14
Question: I was trying to add a jar to the Hive classpath using the ADD JAR command:

    hive> ADD JAR myjar.jar;

but every time I log in to Hive, I need to add myjar.jar again. Is there any way I can add it to the Hive classpath permanently? Regards, Mohammed Niaz

Answer 1: Add this to your .hiverc file:

    add jar myjar.jar

Have a look at this if you require further info: http://hadooped.blogspot.in/2013/08/hive-hiverc-file.html

Answer 2: In order to add them permanently, the recommended ways are as follows: add in hive-site …
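A minimal sketch of the .hiverc approach from Answer 1 (the jar path is an assumption; Hive's CLI runs the commands in this file at startup, so the jar is re-added on every session without manual intervention):

```sql
-- Contents of ~/.hiverc (hypothetical path to the jar)
-- Hive executes this file each time the CLI starts, so the jar
-- is on the classpath for every new session.
ADD JAR /opt/hive/aux-jars/myjar.jar;
```

An alternative mentioned in Answer 2 is cluster-wide configuration (e.g. an auxiliary-jars directory referenced from hive-site), which applies to all users rather than just your own CLI sessions.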

updating Hive external table with HDFS changes

Submitted by 大城市里の小女人 on 2019-12-10 14:40:00
Question: Let's say I created a Hive external table "myTable" from the file myFile.csv (located in HDFS). myFile.csv changes every day, so I want "myTable" to be updated once a day too. Is there any HiveQL query that tells Hive to update the table every day? Thank you. P.S. I would like to know if it works the same way with directories: let's say I create a Hive partition from the HDFS directory "myDir" when "myDir" contains 10 files. The next day "myDir" contains 20 files (10 files were added). Should I update …
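A short sketch of how this usually resolves (table name taken from the question, everything else assumed): an external table reads whatever files sit in its HDFS location at query time, so replacing or adding files needs no HiveQL at all. Only new partition directories need to be registered:

```sql
-- No action needed when files change inside the table's existing location:
-- the next SELECT reads the current contents of the directory.

-- When new PARTITION directories appear under the table's location,
-- ask Hive to rediscover them:
MSCK REPAIR TABLE myTable;

-- Or register a single new directory explicitly (partition column assumed):
ALTER TABLE myTable ADD IF NOT EXISTS PARTITION (dt='2019-12-11');
```

The distinction is between files (picked up automatically) and partitions (metadata that must be added to the metastore).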

How to convert timestamp (with dot between second and millisecond) to date(yyyyMMdd) in Hive?

Submitted by ∥☆過路亽.° on 2019-12-10 13:38:46
Question: I want to convert a timestamp, 1490198341.705 for example, to the date 20170323 and to the hour 11 (GMT+8:00). Are there any functions that solve this?

Answer 1: Try this:

    select date_format(from_utc_timestamp(1490198341.705, 'GMT+8:00'), 'yyyyMMdd HH:mm:ss');

Source: https://stackoverflow.com/questions/42975333/how-to-convert-timestamp-with-dot-between-second-and-millisecond-to-dateyyyym
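A variant of Answer 1 that produces the two values the question asks for separately (a sketch; it assumes the number is seconds since the epoch with a millisecond fraction, which is how Hive treats a decimal cast to TIMESTAMP):

```sql
-- Cast epoch seconds to a timestamp, shift it to GMT+8, then format.
SELECT date_format(from_utc_timestamp(cast(1490198341.705 AS timestamp), 'GMT+8:00'), 'yyyyMMdd') AS dt,
       date_format(from_utc_timestamp(cast(1490198341.705 AS timestamp), 'GMT+8:00'), 'HH')       AS hr;
```

If the source column were epoch milliseconds as a BIGINT instead, it would need dividing by 1000 before the cast.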

Hadoop-Hive | Convert single row columns into multiple rows in Hive

Submitted by 微笑、不失礼 on 2019-12-10 12:25:14
Question: I have a Hive table like this:

    Created_date  ID1  Name1  Age1  Gender1  Name2  ID2  Age2  Gender2  ID3  Name3  Age3  Gender3 ...
    2014-02-01    1    ABC    21    M        MNP    2    22    F        3    XYZ    25    M
    2015-06-06    11   LMP    31    F        PLL    12   42    M        13   UIP    37    F

This table may have any number of repeated sets of the 4-column group. The sequence of these 4 columns is also not fixed, and there may be 1 or 2 more columns that are not repeated, like created_date. I need to convert the above table into a new Hive table having only 4 columns: ID, Name, Age and …
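One common way to unpivot repeated column groups like this is Hive's stack() UDTF. A sketch, assuming exactly three groups and the column names shown above (the count and column list must be spelled out to match the real schema):

```sql
-- stack(3, ...) emits three rows per input row, one per ID/Name/Age/Gender group.
SELECT created_date, t.id, t.name, t.age, t.gender
FROM wide_table
LATERAL VIEW stack(3,
    id1, name1, age1, gender1,
    id2, name2, age2, gender2,
    id3, name3, age3, gender3) t AS id, name, age, gender;
```

Because the question says the group order varies and the number of groups is not fixed, the column list here would have to be generated per table (e.g. from DESCRIBE output) rather than written once.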

Random sample table with Hive, but including matching rows

Submitted by 蓝咒 on 2019-12-10 11:55:42
Question: I have a large table containing a userID column and other user variable columns, and I would like to use Hive to extract a random sample of users based on their userID. Furthermore, these users will sometimes be on multiple rows, and if a randomly selected userID appears in other parts of the table, I would like to extract those rows too. I had a look at the Hive sampling documentation, and I see that something like this can be done to extract a 1% sample:

    SELECT * FROM source TABLESAMPLE …
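TABLESAMPLE samples rows, not users, so a sampled user's other rows are not guaranteed to come along. A sketch of the usual workaround (table and column names taken from the question, the 1% rate assumed): sample the distinct userIDs first, then join back to pull every row for each sampled user:

```sql
-- Step 1: sample ~1% of distinct users; step 2: keep all of their rows.
WITH sampled_users AS (
  SELECT userID
  FROM (SELECT DISTINCT userID FROM source) u
  WHERE rand() <= 0.01
)
SELECT s.*
FROM source s
JOIN sampled_users su
  ON s.userID = su.userID;
```

This guarantees that a user is either fully in the sample or fully out of it, which is what the question asks for.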

Hive: How to explode table with map column

Submitted by 痴心易碎 on 2019-12-10 10:54:28
Question: I have a table like this:

    +-----+------------------------+
    | id  | mapCol                 |
    +-----+------------------------+
    | id1 | {key1:val1, key2:val2} |
    | id2 | {key1:val3, key2:val4} |
    +-----+------------------------+

so I can easily perform a query like

    select explode(mapCol) as (key, val) from myTab where id='id1'

and I get

    +------+------+
    | key  | val  |
    +------+------+
    | key1 | val1 |
    | key2 | val2 |
    +------+------+

I want to generate a table like this:

    +-----+------+-----+
    | id  | key …
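The limitation here is that explode() in the SELECT list cannot be combined with other columns. LATERAL VIEW lifts that restriction by joining each row to its exploded output, which keeps id alongside key and val (a sketch using the names from the question):

```sql
-- LATERAL VIEW pairs every (key, val) from the map with its source row's id.
SELECT id, m.key, m.val
FROM myTab
LATERAL VIEW explode(mapCol) m AS key, val;
```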

How to insert/copy one partition's data to multiple partitions in hive?

Submitted by 核能气质少年 on 2019-12-10 10:19:52
Question: I have data for day='2019-01-01' in my Hive table, and I want to copy the same data to the whole of Jan-2019 (i.e. to '2019-01-02', '2019-01-03', ..., '2019-01-31'). I'm trying the following, but data is only inserted into '2019-01-02' and not into '2019-01-03':

    INSERT OVERWRITE TABLE db_t.students PARTITION (dt='2019-01-02', dt='2019-01-03')
    SELECT id, name, marks FROM db_t.students WHERE dt='2019-01-01';

Answer 1: Cross join all your data with calendar dates for the required date range. Use dynamic partitioning: …
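A sketch of the cross-join approach Answer 1 describes (table and columns taken from the question; the date-generation trick with space()/posexplode is an assumption): generate the 30 target dates, cross join them with the source day's rows, and let dynamic partitioning write each date into its own partition in one statement.

```sql
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- space(29) split on ' ' yields 30 elements; posexplode numbers them 0..29,
-- giving the dates 2019-01-02 .. 2019-01-31.
INSERT OVERWRITE TABLE db_t.students PARTITION (dt)
SELECT s.id, s.name, s.marks, d.dt
FROM db_t.students s
CROSS JOIN (
  SELECT date_add('2019-01-02', n) AS dt
  FROM (SELECT posexplode(split(space(29), ' ')) AS (n, x)) nums
) d
WHERE s.dt = '2019-01-01';
```

The asker's original statement fails because a static PARTITION clause names one value per partition key; listing dt twice does not fan the data out to two partitions.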

what's SparkSQL SQL query to write into JDBC table?

Submitted by 房东的猫 on 2019-12-10 06:25:03
Question: For SQL queries in Spark: for reads, we can read from JDBC with

    CREATE TEMPORARY TABLE jdbcTable USING org.apache.spark.sql.jdbc OPTIONS (dbtable ...);

For writes, what is the query that writes data to a remote JDBC table using SQL? NOTE: I want it to be a SQL query. Please provide the pure "SQL query" that can write to JDBC when using HiveContext.sql(...) in SparkSQL.

Answer 1: You can write the DataFrame with JDBC similar to the following:

    df.write.jdbc(url, "TEST.BASICCREATETEST", new Properties)

Answer 2: An INSERT …
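A pure-SQL sketch along the lines Answer 2 begins (the JDBC URL and table names are assumptions): once the remote table is registered the same way the question registers it for reading, a plain INSERT writes through it.

```sql
-- Register the remote table, then write to it with ordinary SQL.
CREATE TEMPORARY TABLE jdbcTable
USING org.apache.spark.sql.jdbc
OPTIONS (
  url 'jdbc:mysql://dbhost:3306/testdb',
  dbtable 'TEST.BASICCREATETEST'
);

INSERT INTO TABLE jdbcTable
SELECT * FROM someHiveTable;
```

Both statements can be issued through HiveContext.sql(...), which is what the question asks for; the DataFrame route in Answer 1 is the more common idiom but is not SQL.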

How to alter Hive partition column name

Submitted by 六月ゝ 毕业季﹏ on 2019-12-10 01:58:56
Question: I have to change a partition column name (not the partition spec). I looked for commands in the Hive wiki and some Google pages, and I can find options for altering the partition spec: for example, in /table/country='US' I can change US to USA. But I want to change country to continent. It feels like the only option available for changing a partition column name is dropping and re-creating the table. If there is any other option available, please help me. Thanks in advance.

Answer 1: You can change …
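A sketch of the drop-and-recreate route the asker suspects is necessary (the column list is hypothetical; Hive has no ALTER statement that renames a partition column itself):

```sql
-- Make sure dropping the table definition does not delete the HDFS data.
ALTER TABLE mytable SET TBLPROPERTIES ('EXTERNAL'='TRUE');
DROP TABLE mytable;

-- Recreate over the same location with the new partition column name.
CREATE EXTERNAL TABLE mytable (id INT, name STRING)
PARTITIONED BY (continent STRING)
LOCATION '/table';

-- The HDFS directories still say country=...; they must be renamed to
-- continent=... before this will rediscover the partitions:
MSCK REPAIR TABLE mytable;
```

Only the metastore definition is rebuilt; the data files stay where they are, which is why the directory rename step matters.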

Both left and right aliases encountered in Hive JOIN; without any inequality clause

Submitted by 别等时光非礼了梦想. on 2019-12-09 14:56:55
Question: I am using the following query:

    SELECT S.MDSE_ITEM_I, S.CO_LOC_I,
           MAX(S.SLS_D) AS MAX_SLS_D,
           MIN(S.SLS_D) AS MIN_SLS_D,
           SUM(S.SLS_UNIT_Q) AS SLS_UNIT_Q,
           MIN(PRSMN_VAL_STRT_D) AS PRSMN_VAL_STRT_D,
           MIN(PRSMN_VAL_END_D) AS PRSMN_VAL_END_D,
           MIN(RC.FRST_RCPT_D) AS FRST_RCPT_D,
           MIN(RC.CURR_ACTV_FRST_OH_D) AS CURR_ACTV_FRST_OH_D,
           MIN(H.GREG_D) AS OH_GREG_D
    FROM eefe_lstr4.SLS_TBL AS S
    LEFT OUTER JOIN eefe_lstr4.PRS_TBL P
      ON S.MDSE_ITEM_I = P.MDSE_ITEM_I
     AND S.CO_LOC_I = P.CO_LOC_I
     AND S.SLS_D BETWEEN …
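The query is cut off in the source, but the trailing BETWEEN inside the ON clause is the likely culprit: older Hive versions reject any non-equality predicate in a join's ON clause with exactly this "Both left and right aliases encountered in JOIN" error. A simplified sketch of the usual fix, moving the range condition to WHERE while preserving left-outer semantics (table and column names from the question, the IS NULL guard an assumption):

```sql
-- Equality conditions stay in ON; the range moves to WHERE.
-- The IS NULL branch keeps unmatched left rows, as the outer join intends.
SELECT S.MDSE_ITEM_I, S.CO_LOC_I,
       MAX(S.SLS_D) AS MAX_SLS_D,
       MIN(S.SLS_D) AS MIN_SLS_D,
       SUM(S.SLS_UNIT_Q) AS SLS_UNIT_Q
FROM eefe_lstr4.SLS_TBL S
LEFT OUTER JOIN eefe_lstr4.PRS_TBL P
  ON S.MDSE_ITEM_I = P.MDSE_ITEM_I
 AND S.CO_LOC_I = P.CO_LOC_I
WHERE P.MDSE_ITEM_I IS NULL
   OR S.SLS_D BETWEEN P.PRSMN_VAL_STRT_D AND P.PRSMN_VAL_END_D
GROUP BY S.MDSE_ITEM_I, S.CO_LOC_I;
```

Newer Hive releases accept non-equi conditions in ON directly, so upgrading is the other way out.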