hiveql

hive: cast array<struct<key:string,value:array<string>>> into map<string,array<string>>

醉酒当歌 submitted on 2019-12-08 07:52:24
Question: I have a Hive table like
name string
address string
timezone string
one_key_value array<struct<key:string,value:array<string>>>
two_key_value array<struct<key:string,value:array<string>>>
and want to convert it to
name string
address string
timezone string
one_key_value map<string,array<string>>
two_key_value map<string,array<string>>
There is explode(array), but it doesn't return the entire table in the format I want.
Answer 1: Use a lateral view with inline, then map the resulting keys and values.
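The answer's approach can be modeled in plain Python to show what happens to the data. This is not HiveQL: LATERAL VIEW inline(...) produces one row per struct, and those rows are then regrouped per original row into a map. The helper names `inline_rows` and `to_map`, the sample values, and the use of `name` as the row key are assumptions for illustration; a real query should group by whatever uniquely identifies a row.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# One struct from array<struct<key:string,value:array<string>>>, as a (key, value) pair
KeyValue = Tuple[str, List[str]]


def inline_rows(name: str, pairs: List[KeyValue]) -> Iterator[Tuple[str, str, List[str]]]:
    """Mimic LATERAL VIEW inline(...): emit one (name, key, value) row per struct."""
    for key, value in pairs:
        yield name, key, value


def to_map(rows: Iterable[Tuple[str, str, List[str]]]) -> Dict[str, Dict[str, List[str]]]:
    """Regroup exploded rows by name, rebuilding a map<string,array<string>> per name."""
    result: Dict[str, Dict[str, List[str]]] = {}
    for name, key, value in rows:
        # In this sketch a repeated key simply overwrites the earlier value.
        result.setdefault(name, {})[key] = value
    return result


if __name__ == "__main__":
    pairs = [("k1", ["a", "b"]), ("k2", ["c"])]  # hypothetical sample data
    print(to_map(inline_rows("alice", pairs)))  # {'alice': {'k1': ['a', 'b'], 'k2': ['c']}}
```

A row whose array is empty yields no rows from inline and so drops out of the result; in Hive, LATERAL VIEW OUTER is the usual way to keep such rows.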

Retrieve 3rd MAX salary in Hive

可紊 submitted on 2019-12-08 07:06:33
Question: I'm a novice. I have the following Employee table: ID, Name, Country, Salary, ManagerID. I retrieved the 3rd max salary using the following:
select name, salary from (select name, salary from employee sort by salary desc limit 3) result sort by salary limit 1;
How do I do the same to display the 3rd max salary for each country? Can we use OVER (PARTITION BY country)? I tried looking in the LanguageManual section on Windowing and Analytics, but I'm finding it difficult to understand. Please help!
Answer 1: You're
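One way to use OVER (PARTITION BY country) here is DENSE_RANK() OVER (PARTITION BY country ORDER BY salary DESC) and then keep rank 3. The plain-Python model below is not HiveQL; the function name, tuple layout, and sample rows are hypothetical. It ranks distinct salaries within each country and picks the third. Dense-rank semantics, where tied salaries share a rank, are an assumption; ROW_NUMBER() would count ties separately.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical row layout: (name, country, salary)
Employee = Tuple[str, str, int]


def nth_max_salary_by_country(rows: List[Employee], n: int = 3) -> Dict[str, Optional[int]]:
    """Model DENSE_RANK() OVER (PARTITION BY country ORDER BY salary DESC) = n."""
    salaries: Dict[str, Set[int]] = defaultdict(set)
    for _name, country, salary in rows:
        salaries[country].add(salary)  # a set keeps distinct salaries, so ties share a rank
    result: Dict[str, Optional[int]] = {}
    for country, values in salaries.items():
        ranked = sorted(values, reverse=True)  # index 0 is rank 1 (highest salary)
        result[country] = ranked[n - 1] if len(ranked) >= n else None
    return result


if __name__ == "__main__":
    staff = [("a", "US", 100), ("b", "US", 90), ("c", "US", 90),
             ("d", "US", 80), ("e", "IN", 50), ("f", "IN", 40)]
    print(nth_max_salary_by_country(staff))  # {'US': 80, 'IN': None}
```

A country with fewer than three distinct salaries maps to None here. A SQL filter on the rank would simply return no row for it.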

Hive Tables are created from spark but are not visible in hive

落爺英雄遲暮 submitted on 2019-12-08 07:04:56
Question: From Spark, using
DataFrame.write().mode(SaveMode.Ignore).format("orc").saveAsTable("myTableName")
the table gets saved. I can see it using the command below, where test is my database name:
hadoop fs -ls /apps/hive/warehouse/test.db
drwxr-xr-x - psudhir hdfs 0 2016-01-04 05:02 /apps/hive/warehouse/test.db/myTableName
But when I try to check the tables in Hive, I cannot see them with the command SHOW TABLES from hiveContext.
Answer 1: sudo cp /etc/hive/conf.dist/hive-site.xml /etc/spark/conf/ This

Write a nested select statement with a where clause in Hive

耗尽温柔 submitted on 2019-12-08 06:33:28
Question: I need to do a nested select within a where clause in a Hive query. A sample code snippet would be as follows:
select * from TableA where TA_timestamp > (select timestmp from TableB where id="hourDim")
Is this possible, or am I doing something wrong here? I am getting an error while running the above script. To further elaborate on what I am trying to do, there is a Cassandra keyspace to which I publish statistics with a timestamp. Periodically (hourly for example) this
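Depending on the Hive version, a scalar subquery inside a WHERE comparison may be rejected. A common workaround is to join TableA against the one-row subquery and move the comparison into the filter. The Python model below is not HiveQL: the function name and dict-based rows are assumptions, and the query in the docstring is an untested sketch that reuses the question's table and column names.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def rows_after_threshold(table_a: List[Row], table_b: List[Row], dim_id: str = "hourDim") -> List[Row]:
    """Model of this (untested) query:
        SELECT a.* FROM TableA a
        CROSS JOIN (SELECT timestmp FROM TableB WHERE id = 'hourDim') b
        WHERE a.TA_timestamp > b.timestmp
    """
    # The subquery: collect timestmp from the matching TableB rows.
    thresholds = [row["timestmp"] for row in table_b if row["id"] == dim_id]
    # The cross join pairs every TableA row with every threshold row, then filters.
    # With exactly one threshold this equals the scalar comparison in the question;
    # with several, a TableA row can appear more than once.
    return [a for a in table_a for t in thresholds if a["TA_timestamp"] > t]
```

The sketch assumes `id = 'hourDim'` matches exactly one TableB row. If it can match more, aggregate first (for example MAX(timestmp)) so that the join stays one-to-one.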

Semantic exception error in Hive while using last_value window function

末鹿安然 submitted on 2019-12-08 06:17:10
Question: I have a table with the following data:
dt          device    id             count
2018-10-05  computer  7541185957382  6
2018-10-20  computer  7541185957382  3
2018-10-14  computer  7553187775734  6
2018-10-17  computer  7553187775734  10
2018-10-21  computer  7553187775734  2
2018-10-22  computer  7549187067178  5
2018-10-20  computer  7553187757256  3
2018-10-11  computer  7549187067178  10
I want to get the last and first dt for each id. Hence, I used the window functions first_value and last_value as follows: select id,last_value(dt)
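If the goal is only the earliest and latest dt per id, an ordinary aggregate (MIN(dt) and MAX(dt) with GROUP BY id) avoids window functions entirely. If last_value is kept, note that with an ORDER BY and no explicit frame, Hive's default frame ends at the current row. The frame then usually has to be widened to ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. The question's query is cut off, so the exact cause of the error isn't visible here. The Python model below is not HiveQL, and its function name is hypothetical. It applies the aggregate idea to the question's sample rows.

```python
from typing import Dict, Iterable, List, Tuple

# Rows from the question: (dt, device, id, count)
SAMPLE: List[Tuple[str, str, str, int]] = [
    ("2018-10-05", "computer", "7541185957382", 6),
    ("2018-10-20", "computer", "7541185957382", 3),
    ("2018-10-14", "computer", "7553187775734", 6),
    ("2018-10-17", "computer", "7553187775734", 10),
    ("2018-10-21", "computer", "7553187775734", 2),
    ("2018-10-22", "computer", "7549187067178", 5),
    ("2018-10-20", "computer", "7553187757256", 3),
    ("2018-10-11", "computer", "7549187067178", 10),
]


def first_last_dt(rows: Iterable[Tuple[str, str, str, int]]) -> Dict[str, Tuple[str, str]]:
    """Per id, return (first dt, last dt), like MIN(dt), MAX(dt) ... GROUP BY id."""
    result: Dict[str, Tuple[str, str]] = {}
    for dt, _device, row_id, _count in rows:
        if row_id in result:
            first, last = result[row_id]
            # Zero-padded yyyy-MM-dd strings compare in chronological order.
            result[row_id] = (min(first, dt), max(last, dt))
        else:
            result[row_id] = (dt, dt)
    return result


if __name__ == "__main__":
    for row_id, (first, last) in first_last_dt(SAMPLE).items():
        print(row_id, first, last)
```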

How to find the average of hh:mm:ss in Hive

不打扰是莪最后的温柔 submitted on 2019-12-08 05:03:02
Question: Suppose I have a Hive table with the columns script_name, start_time, end_time and duration. start_time, end_time and duration are in hh:mm:ss format. My requirement is to find the average time of these columns over the last 7 days and write it to a file.
Answer 1: Convert to unix_timestamp, sum, divide by 3, convert to bigint and convert back to HH:mm:ss:
with data as --Data example. Use your table instead
(select '12:10:30' start_time,'01:10:00' end_time, '02:10:00' duration)
select from_unixtime(cast(
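The arithmetic in the answer can be checked with a small Python model. This is not HiveQL, and the helper names are hypothetical. Each hh:mm:ss value is turned into seconds, the seconds are averaged and truncated to whole seconds (as the answer's cast to bigint does), and the result is formatted back. For the 7-day requirement the same idea would apply per column, averaging over the rows in the date range rather than dividing by a fixed 3.

```python
from typing import List


def to_seconds(hms: str) -> int:
    """Convert 'hh:mm:ss' to a number of seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def to_hms(total: int) -> str:
    """Convert seconds back to zero-padded 'hh:mm:ss'."""
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def average_hms(values: List[str]) -> str:
    # Integer division drops fractional seconds, mirroring a cast to bigint.
    return to_hms(sum(to_seconds(v) for v in values) // len(values))


if __name__ == "__main__":
    print(average_hms(["12:10:30", "01:10:00", "02:10:00"]))  # 05:10:10
```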

Hive: how to update existing data based on some condition if it exists, and insert new data if it does not

橙三吉。 submitted on 2019-12-08 04:21:43
Question: I want to update the existing data if it exists, based on some condition (data with higher priority should be updated), and insert new data if it does not exist. I have already written a query for this, but somehow it duplicates rows. Here is the full explanation of what I have and what I want to achieve:
What I have:
Table 1 columns: id, info, priority
hive> select * from sample1;
OK
1 123 1.01
2 234 1.02
3 213 1.03
5 213423 1.32
Time taken: 1.217 seconds, Fetched: 4 row(s)
Table 2:
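One common way to avoid duplicated rows in this kind of upsert is to UNION ALL the two tables, number the rows per id with ROW_NUMBER() OVER (PARTITION BY id ORDER BY priority DESC), keep row 1, and INSERT OVERWRITE the result. Whether that matches the asker's rule is an assumption, since Table 2 is cut off. The Python model below is not HiveQL; its function name, tuple layout, and the incoming rows in the usage example are hypothetical. It shows the intended result: exactly one row per id, namely the one with the highest priority.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str, float]  # (id, info, priority)


def upsert_by_priority(existing: List[Row], incoming: List[Row]) -> List[Row]:
    """Model of UNION ALL + ROW_NUMBER() OVER (PARTITION BY id ORDER BY priority DESC) = 1."""
    best: Dict[int, Row] = {}
    for row in existing + incoming:
        current = best.get(row[0])
        # Keep exactly one row per id: the one with the highest priority.
        # On a tie this sketch keeps the row seen first (the existing one).
        if current is None or row[2] > current[2]:
            best[row[0]] = row
    return sorted(best.values())


if __name__ == "__main__":
    sample1 = [(1, "123", 1.01), (2, "234", 1.02), (3, "213", 1.03), (5, "213423", 1.32)]
    incoming = [(2, "999", 1.50), (3, "888", 1.00), (6, "777", 1.10)]  # hypothetical Table 2
    for row in upsert_by_priority(sample1, incoming):
        print(row)
```

ROW_NUMBER makes no such tie guarantee unless the ORDER BY includes a tie-breaking column.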

Hive shell throws FileNotFound exception while executing queries, in spite of adding jar files using “ADD JAR”

你说的曾经没有我的故事 submitted on 2019-12-08 02:56:13
Question:
1) I have added the SerDe jar file using "ADD JAR /home/hduser/softwares/hive/hive-serdes-1.0-SNAPSHOT.jar;"
2) I created the table.
3) The table is created successfully.
4) But when I execute any select query, it throws a file not found exception:
hive> select count(*) from tab_tweets;
Query ID = hduser_20150604145353_51b4def4-11fb-4638-acac-77301c1c1806
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):

How to efficiently query a hive table in spark using hive context?

送分小仙女 submitted on 2019-12-07 23:50:55
Question: I have a 1.6 TB Hive table with time series data. I am using Hive 1.2.1 and Spark 1.6.1 in Scala. The following is the query I have in my code, but I always get a Java out-of-memory error:
val sid_data_df = hiveContext.sql(s"SELECT time, total_field, sid, year, date FROM tablename WHERE sid = '$stationId' ORDER BY time LIMIT 4320000 ")
By iteratively selecting a few records at a time from the Hive table, I am trying to compute a sliding window over the resulting DataFrame. I have a cluster of 4 nodes with

How to use (multiple) Hive query results in a variable for another query

无人久伴 submitted on 2019-12-07 23:15:49
Question: I have two tables: one is schools and the other is students. I want to find all the students of a particular school. The schema of schools is id, name, location, and the schema of students is id, name, schoolId. I wrote the following script:
schoolId=$(hive -e "set hive.cli.print.header=false;select id from school;")
hive -hiveconf "schoolId"="$schoolId"
hive> select id,name from student where schoolId like '${hiveconf:schoolId}%'
I don't get any result, because schoolId stores all the ids together. For example there
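With the header turned off, `hive -e` prints one id per line, so `$schoolId` holds a newline-separated list. A single `LIKE '...%'` pattern built from that whole string cannot match an individual id. One option is to turn the list into an IN (...) clause. The Python sketch below does that; the function names are hypothetical, and it assumes the ids are plain values without quote characters that can be written as string literals (drop the quotes for numeric ids).

```python
from typing import List


def parse_ids(cli_output: str) -> List[str]:
    """Split captured `hive -e` output into one id per non-empty line."""
    return [line.strip() for line in cli_output.splitlines() if line.strip()]


def in_clause(cli_output: str) -> str:
    """Build an IN (...) list, quoting each id as a string literal."""
    ids = parse_ids(cli_output)
    if not ids:
        raise ValueError("no ids found; an empty IN () list is not valid SQL")
    for i in ids:
        if "'" in i:
            # This sketch does not attempt to escape quotes inside ids.
            raise ValueError(f"unsupported id: {i!r}")
    return "(" + ", ".join(f"'{i}'" for i in ids) + ")"


if __name__ == "__main__":
    print(in_clause("1\n2\n\n3\n"))  # ('1', '2', '3')
```

A single query that joins student to school on schoolId avoids the round trip through the shell entirely.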