hiveql | 易学教程

hive: cast array<struct<key:string,value:array<string>>> into map<string,array<string>>

Question: I have a Hive table with the columns name string, address string, timezone string, one_key_value array<struct<key:string,value:array<string>>>, two_key_value array<struct<key:string,value:array<string>>> and want to convert it to name string, address string, timezone string, one_key_value map<string,array<string>>, two_key_value map<string,array<string>>. There is explode(array), but it doesn't return the entire table in the format I want. Answer 1: Use LATERAL VIEW with inline() and build a map from the resulting keys and values.
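The approach named in Answer 1 could look roughly like the following. This is a minimal sketch, not the answerer's actual query: the table name my_table and the aliases kv, k, and v are hypothetical, and only one_key_value is shown.

```sql
-- Hypothetical table name: my_table (the question does not name the table).
-- inline() turns each struct in the array into its own row with columns k and v;
-- map(k, v) then wraps that key/value pair in a map<string,array<string>>.
SELECT t.name,
       t.address,
       t.timezone,
       map(kv.k, kv.v) AS one_key_value
FROM my_table t
LATERAL VIEW inline(t.one_key_value) kv AS k, v;
```

Note that this yields one row per array element, each carrying a single-entry map, and rows whose array is empty drop out unless LATERAL VIEW OUTER is used. Handling two_key_value the same way would take a second lateral view, which multiplies the rows.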

Retrieve 3rd MAX salary in Hive

Question: I'm a novice. I have the following Employee table: ID, Name, Country, Salary, ManagerID. I retrieved the 3rd max salary using the following: select name, salary from (select name, salary from employee sort by salary desc limit 3) result sort by salary limit 1; How can I do the same to display the 3rd max salary for each country? Can we use OVER (PARTITION BY country)? I tried looking at the LanguageManual Windowing and Analytics page, but I'm finding it difficult to understand. Please help! Answer 1: You're
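For the per-country variant the question asks about, one possible sketch with a window function follows. It assumes the employee table and column names from the question; the aliases ranked and rnk are hypothetical, and this is not necessarily the query the original answer proposed.

```sql
-- Rank salaries within each country, highest first, then keep rank 3.
-- dense_rank() gives tied salaries the same rank without leaving gaps,
-- so rank 3 is the third-highest distinct salary in that country.
SELECT name, country, salary
FROM (
  SELECT name,
         country,
         salary,
         dense_rank() OVER (PARTITION BY country ORDER BY salary DESC) AS rnk
  FROM employee
) ranked
WHERE rnk = 3;
```

If exactly one row per country is wanted even when salaries tie, row_number() could be swapped in for dense_rank(), at the cost of picking an arbitrary row among the ties.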

Hive Tables are created from spark but are not visible in hive

Question: From Spark, using DataFrame.write().mode(SaveMode.Ignore).format("orc").saveAsTable("myTableName"), the table is getting saved. I can see it with the command hadoop fs -ls /apps/hive/warehouse/test.db, where test is my database name: drwxr-xr-x - psudhir hdfs 0 2016-01-04 05:02 /apps/hive/warehouse/test.db/myTableName. But when I try to check the tables in Hive, I cannot view them, not even with the SHOW TABLES command from hiveContext. Answer 1: sudo cp /etc/hive/conf.dist/hive-site.xml /etc/spark/conf/ This

Write a nested select statement with a where clause in Hive

Question: I have a requirement to do a nested select within a WHERE clause in a Hive query. A sample code snippet would be as follows: select * from TableA where TA_timestamp > (select timestmp from TableB where id="hourDim") Is this possible, or am I doing something wrong here? I am getting an error while running the above script. To further elaborate on what I am trying to do: there is a Cassandra keyspace to which I publish statistics with a timestamp. Periodically (hourly for example) this
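Assuming the error comes from the Hive version in use not accepting a scalar subquery in the WHERE clause (the excerpt does not show the error message), one common workaround is to move the subquery into the FROM clause and join against it. The sketch below uses the table and column names from the question; the aliases a and b are hypothetical.

```sql
-- The subquery is expected to return a single row (the 'hourDim' timestamp),
-- so the cross join pairs every TableA row with that one value.
SELECT a.*
FROM TableA a
CROSS JOIN (
  SELECT timestmp
  FROM TableB
  WHERE id = 'hourDim'
) b
WHERE a.TA_timestamp > b.timestmp;
```

If the subquery returned several rows, each TableA row would be repeated once per match, and clusters that run Hive in strict mode may reject the Cartesian product.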

Semantic exception error in HIVE while using last_value window function

Question: I have a table with the following data:

dt          device    id             count
2018-10-05  computer  7541185957382  6
2018-10-20  computer  7541185957382  3
2018-10-14  computer  7553187775734  6
2018-10-17  computer  7553187775734  10
2018-10-21  computer  7553187775734  2
2018-10-22  computer  7549187067178  5
2018-10-20  computer  7553187757256  3
2018-10-11  computer  7549187067178  10

I want to get the first and last dt for each id. Hence, I used the window functions first_value and last_value as follows: select id,last_value(dt)
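Since the goal stated above is only the first and last dt per id, a plain aggregation may be enough. The sketch below is a swapped-in alternative to the first_value/last_value query, not a fix for the semantic exception itself; the table name device_counts is hypothetical, and it assumes dt is stored in yyyy-MM-dd form so that string ordering matches date ordering.

```sql
-- Hypothetical table name: device_counts.
-- MIN/MAX over yyyy-MM-dd values return the earliest and latest date per id.
SELECT id,
       MIN(dt) AS first_dt,
       MAX(dt) AS last_dt
FROM device_counts
GROUP BY id;
```

This returns one row per id rather than repeating the values on every input row, which is the main behavioral difference from the window-function form.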

How to Find the average of hh:mm:ss in hive

Question: Consider a Hive table with the columns script_name, start_time, end_time, and duration. Start time, end time, and duration are in hh:mm:ss format. My requirement is to find the average time of these columns for the last 7 days and write it to a file. Answer 1: Convert to unix_timestamp, sum, divide by 3, cast to bigint, and convert back to HH:mm:ss: with data as --Data example. Use your table instead (select '12:10:30' start_time,'01:10:00' end_time, '02:10:00' duration) select from_unixtime(cast(
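The same convert, average, and convert-back idea can also be applied across rows with AVG. The following is a rough sketch under stated assumptions, not a completion of the truncated answer: the table name script_runs is hypothetical, only the duration column is shown, and the 7-day filter is left out because the excerpt does not name a date column.

```sql
-- Hypothetical table name: script_runs.
-- unix_timestamp() parses each HH:mm:ss string into seconds,
-- AVG() averages those seconds across rows,
-- and from_unixtime() formats the truncated average back as HH:mm:ss.
SELECT from_unixtime(
         CAST(AVG(unix_timestamp(duration, 'HH:mm:ss')) AS BIGINT),
         'HH:mm:ss'
       ) AS avg_duration
FROM script_runs;
```

The round trip uses the same time zone for parsing and formatting, so the offset cancels out. The sketch only makes sense while the values stay within a single 24-hour range.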

HIVE how to update the existing data if it exists based on some condition and insert new data if not exists

Question: I want to update existing data if it exists, based on some condition (data with higher priority should be updated), and insert new data if it does not exist. I have already written a query for this, but somehow it is duplicating rows. Here is the full explanation of what I have and what I want to achieve. What I have: Table 1, with columns id, info, priority: hive> select * from sample1; OK 1 123 1.01 2 234 1.02 3 213 1.03 5 213423 1.32 Time taken: 1.217 seconds, Fetched: 4 row(s) Table 2:

Hive shell throws FileNotFound exception while executing queries, in spite of adding jar files using “ADD JAR”

Question: 1) I added the SerDe jar file using "ADD JAR /home/hduser/softwares/hive/hive-serdes-1.0-SNAPSHOT.jar;" 2) I created the table. 3) The table is created successfully. 4) But when I execute any select query, it throws a file-not-found exception: hive> select count(*) from tab_tweets; Query ID = hduser_20150604145353_51b4def4-11fb-4638-acac-77301c1c1806 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes):

How to efficiently query a hive table in spark using hive context?

Question: I have a 1.6 TB Hive table with time series data. I am using Hive 1.2.1 and Spark 1.6.1 in Scala. The following is the query in my code, but I always get a Java out-of-memory error: val sid_data_df = hiveContext.sql(s"SELECT time, total_field, sid, year, date FROM tablename WHERE sid = '$stationId' ORDER BY time LIMIT 4320000 ") By iteratively selecting a few records at a time from the Hive table, I am trying to do a sliding window on the resulting DataFrame. I have a cluster of 4 nodes with

How to use Hive Query results(multiple) in a variable for other query

Question: I have two tables, schools and students. I want to find all the students of a particular school. The schema of schools is id, name, location, and that of students is id, name, schoolId. I wrote the following script: schoolId=$(hive -e "set hive.cli.print.header=false;select id from school;") hive -hiveconf "schoolId"="$schoolId" hive>select id,name from student where schoolId like '${hiveconf:schoolId}%' I don't get any result because schoolId stores all the ids together. For example there