hiveql | 易学教程

Difference in statistics between Google Analytics reports and BigQuery data in a Hive table

Question: I have a Google Analytics Premium account set up to monitor the user activity of a website and a mobile application. Raw data from GA is being stored in BigQuery tables. However, I noticed that the statistics I see in a GA report are quite different from the statistics I see when querying the BigQuery tables. I understand that GA reports show aggregated, and possibly sampled, data, and that the raw data in the BigQuery tables is session/hit-level data. But I am still not sure if I

Passing multiple dates as parameters to a Hive query

Question: I am trying to pass a list of dates as a parameter to my Hive query:

    #!/bin/bash
    echo "Executing the hive query - Get distinct dates"
    var=`hive -S -e "select distinct substr(Transaction_date,0,10) from test_dev_db.TransactionUpdateTable;"`
    echo $var
    echo "Executing the hive query - Get the parition data"
    hive -hiveconf paritionvalue=$var -e 'SELECT Product FROM test_dev_db.TransactionMainHistoryTable where tran_date in("${hiveconf:paritionvalue}");'
    echo "Hive query - ends"

The output is: Executing
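
A minimal sketch of one fix, assuming bash and the Hive CLI's --hivevar option. Each date gets its own single quotes and the dates are joined with commas, so IN (...) receives a list of literals. The original instead wraps the whole list in one pair of double quotes, which yields a single string literal. The original also passes $var unquoted, so the shell splits it at every space or newline. The names to_quoted_list, query_products_for_new_dates and date_list are hypothetical. The Hive call is only defined here, not run.

```shell
#!/usr/bin/env bash
# Turn newline-separated values (one per line, as `hive -S -e` prints them)
# into a quoted, comma-separated list such as '2019-01-01','2019-01-02'.
to_quoted_list() {
  # 1) drop blank lines, 2) trim trailing whitespace,
  # 3) wrap each value in single quotes, 4) join the lines with commas
  sed -e '/^[[:space:]]*$/d' -e 's/[[:space:]]*$//' -e "s/.*/'&'/" | paste -sd, -
}

# Requires the hive CLI; defined here but not called.
query_products_for_new_dates() {
  local dates
  dates=$(hive -S -e "select distinct substr(Transaction_date,0,10) from test_dev_db.TransactionUpdateTable;" | to_quoted_list)
  # Single quotes stop the shell from touching ${hivevar:...}; Hive substitutes it.
  # "$dates" is quoted so the shell passes the whole list as one argument.
  hive --hivevar date_list="$dates" \
       -e 'SELECT Product FROM test_dev_db.TransactionMainHistoryTable WHERE tran_date IN (${hivevar:date_list});'
}
```

If the first query returns no dates, the list is empty and IN () is a syntax error, so that case needs a guard before calling Hive.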

How to use multiple Hive query results in a variable for another query

I have two tables, schools and students, and I want to find all the students of a particular school. The schema of schools is id, name, location, and the schema of students is id, name, schoolId. I wrote the following script:

    schoolId=$(hive -e "set hive.cli.print.header=false;select id from school;")
    hive -hiveconf "schoolId"="$schoolId"
    hive>select id,name from student where schoolId like '${hiveconf:schoolId}%'

I don't get any results because schoolId stores all the ids together. For example, there are 3 schools with ids 123, 256 and 346; the schoolId variable stores them as 123 256 346, and the result is null. Use
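
A sketch of turning the captured ids into something IN (...) can use, assuming bash, the Hive CLI's --hivevar option, and numeric ids (so no quoting). The names to_csv_list, students_in_listed_schools and school_ids are hypothetical. LIKE '123 256 346%' can only match a single string that starts with the whole list, which is why the original returns nothing.

```shell
#!/usr/bin/env bash
# Join whitespace-separated ids ("123 256 346" or one id per line)
# into "123,256,346". The ids are assumed to be numeric, so they are not quoted.
to_csv_list() {
  # Turn every run of whitespace into one newline, drop empty lines, join with commas.
  tr -s '[:space:]' '\n' | sed '/^$/d' | paste -sd, -
}

# Requires the hive CLI; defined here but not called.
students_in_listed_schools() {
  local ids
  ids=$(hive -S -e "set hive.cli.print.header=false; select id from school;" | to_csv_list)
  hive --hivevar school_ids="$ids" \
       -e 'SELECT id, name FROM student WHERE schoolId IN (${hivevar:school_ids});'
}
```

If the real goal is just the students whose schoolId appears in school, a single query avoids the shell round trip and the empty-list case entirely: SELECT s.id, s.name FROM student s LEFT SEMI JOIN school sc ON s.schoolId = sc.id;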

Get the current date and set it to a variable in order to use it as a table name in Hive

I want to get the current date as YYMMDD and then set it to a variable in order to use it as a table name. Here is my code:

    set dates= date +%Y-%m-%d;
    CREATE EXTERNAL TABLE IF NOT EXISTS dates(
      id STRING,
      region STRING,
      city STRING)

But this method doesn't work; it seems the assignment is wrong. Any ideas?

Hive does not evaluate variables; it substitutes them as-is. In your case the value will be exactly the string ' date +%Y-%m-%d '. It is also not possible to use a UDF like current_date() in place of a table name in DDL. The solution is to calculate the variable in the shell and pass it to Hive: In
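
A minimal sketch of "calculate it in the shell, pass it to Hive", assuming bash and the Hive CLI's --hivevar option. date +%y%m%d produces the YYMMDD form asked for. The t_ prefix is a hypothetical naming choice that keeps the table name from being all digits. The function names and the tbl variable are also hypothetical.

```shell
#!/usr/bin/env bash
# Hive substitutes ${hivevar:...} text verbatim and never runs shell commands,
# so the date must be computed here, in the shell.

# Hypothetical naming scheme: "t_" + YYMMDD, e.g. t_190115.
daily_table_name() {
  printf 't_%s\n' "$(date +%y%m%d)"
}

# The quoted heredoc ('EOF') leaves ${hivevar:tbl} for Hive to fill in.
daily_table_ddl() {
  cat <<'EOF'
CREATE EXTERNAL TABLE IF NOT EXISTS ${hivevar:tbl} (
  id STRING,
  region STRING,
  city STRING
);
EOF
}

# Requires the hive CLI; defined here but not called.
create_daily_table() {
  hive --hivevar tbl="$(daily_table_name)" -e "$(daily_table_ddl)"
}
```

The same DDL could live in a .hql file run with hive --hivevar tbl=... -f, which keeps the SQL out of the shell script.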

Map-Reduce Logs on Hive-Tez

Question: How should I interpret the Map-Reduce logs after running a query on Hive-Tez? What do the lines after INFO: convey? Here is a sample:

    INFO : Session is already open
    INFO : Dag name: SELECT a.Model...)
    INFO : Tez session was closed. Reopening...
    INFO : Session re-established.
    INFO :
    INFO : Status: Running (Executing on YARN cluster with App id application_14708112341234_1234)
    INFO : Map 1: -/- Map 3: -/- Map 4: -/- Map 7: -/- Reducer 2: 0/15 Reducer 5: 0/26 Reducer 6: 0/13
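
Each "Map N" / "Reducer N" entry in the progress line is a vertex of the Tez DAG. As far as I know, Hive prints each vertex as completed(+running)/total tasks, sometimes with failed attempts as (+running,-failed). A vertex whose task count is not known yet is shown as -/-. The sketch below decodes such a line under that assumption. It is bash-only, and summarize_tez_progress is a hypothetical name.

```shell
#!/usr/bin/env bash
# Summarise a Hive-on-Tez progress line, e.g.
#   "INFO : Map 1: 3(+2)/10 Reducer 2: 0/15 Map 4: -/-"
# Assumed per-vertex format: completed[(+running[,-failed])]/total, or -/-.
summarize_tez_progress() {
  local entry name counts completed total
  printf '%s\n' "$1" |
    grep -oE '(Map|Reducer) [0-9]+: [^[:space:]]+' |
    while IFS= read -r entry; do
      name=${entry%%:*}          # "Map 1"
      counts=${entry#*: }        # "3(+2)/10"
      if [ "$counts" = "-/-" ]; then
        echo "$name: not started"
      else
        completed=${counts%%[(/]*}   # text before "(" or "/"
        total=${counts##*/}          # text after the last "/"
        echo "$name: $completed of $total tasks completed"
      fi
    done
}
```

Applied to the sample above, the "-/-" vertices have not been initialised yet, while Reducer 2 has 15 tasks, none of them finished.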

How to insert/copy one partition's data to multiple partitions in hive?

I have data for day='2019-01-01' in my Hive table, and I want to copy the same data to the whole month of January 2019 (i.e. to '2019-01-02', '2019-01-03' ... '2019-01-31'). I'm trying the following, but data is only inserted into '2019-01-02' and not into '2019-01-03':

    INSERT OVERWRITE TABLE db_t.students PARTITION(dt='2019-01-02', dt='2019-01-03')
    SELECT id, name, marks FROM db_t.students WHERE dt='2019-01-01';

Cross join all your data with the calendar dates for the required date range, and use dynamic partitioning:

    set hivevar:start_date=2019-01-02;
    set hivevar:end_date=2019-01-31;
    set hive.exec.dynamic.partition=true;
    set
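
One way to build the calendar inline is a sketch along these lines, assuming bash and the Hive CLI. It wraps standard HiveQL built-ins (space, split, posexplode, datediff, date_add) in a shell function that prints the statement. The lateral view repeats every source row once per generated date, which is the cross join the answer describes. start_date and end_date follow the answer; source_dt and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Print HiveQL that copies one source partition into every day of a date range.
# The quoted heredoc ('EOF') keeps the shell from expanding ${hivevar:...}.
#  - space(n) yields n spaces; split(..., ' ') turns that into n+1 empty strings,
#    so posexplode() emits offsets 0..n, one row per calendar day.
#  - With dynamic partitioning, the partition column (dt) must be selected last.
copy_partition_hql() {
  cat <<'EOF'
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE db_t.students PARTITION (dt)
SELECT s.id, s.name, s.marks,
       date_add('${hivevar:start_date}', cal.i) AS dt
FROM db_t.students s
LATERAL VIEW posexplode(split(space(datediff('${hivevar:end_date}', '${hivevar:start_date}')), ' ')) cal AS i, x
WHERE s.dt = '${hivevar:source_dt}';
EOF
}

# Requires the hive CLI; defined here but not called.
copy_partition() {
  hive --hivevar source_dt=2019-01-01 \
       --hivevar start_date=2019-01-02 \
       --hivevar end_date=2019-01-31 \
       -e "$(copy_partition_hql)"
}
```

For 2019-01-02 to 2019-01-31, datediff is 29, so posexplode yields offsets 0 to 29: 30 target partitions, one per remaining day of January.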

Looping using HiveQL

Question: I'm trying to merge 2 datasets, say A and B. Dataset A has a variable "Flag" which takes 2 values. Rather than just merging both datasets together, I am trying to merge them based on the "Flag" variable. The merging code is the following:

    create table new_data as select a.*,b.y from A as a left join B as b on a.x=b.x

Since I'm running Hive code through the CLI, I call this with the following command:

    hive -f new_data.hql

The looping part of the code I'm calling to merge data based on
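
HiveQL itself has no loop construct, so a common pattern is to loop in the shell and run one statement per value. This is a sketch assuming bash and a separate output table per Flag value. The new_data_<flag> naming, the function names and the placeholder Flag values are hypothetical. A fixed name like new_data would fail on the second iteration, because CREATE TABLE errors out when the table already exists.

```shell
#!/usr/bin/env bash
# Print the CREATE TABLE ... AS SELECT for one Flag value.
# Hypothetical: the new_data_<flag> naming scheme; flag values must be safe
# to use in a table name (letters, digits, underscores).
merge_hql() {
  local flag=$1
  cat <<EOF
CREATE TABLE new_data_${flag} AS
SELECT a.*, b.y
FROM A AS a
LEFT JOIN B AS b ON a.x = b.x
WHERE a.Flag = '${flag}';
EOF
}

# Requires the hive CLI; defined here but not called.
# Usage: run_merges 0 1   (the two Flag values are placeholders)
run_merges() {
  local flag
  for flag in "$@"; do
    # Stop at the first failing Hive run.
    hive -e "$(merge_hql "$flag")" || return 1
  done
}
```

To keep the SQL in new_data.hql as in the question, the file can reference ${hivevar:flag} and the loop body becomes hive --hivevar flag="$flag" -f new_data.hql.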

The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx-------- (on Linux)

Question:

    The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx--------

Hi, I was executing the following Spark code in Eclipse on CDH 5.8 and getting the above RuntimeException:

    public static void main(String[] args) {
        final SparkConf sparkConf = new SparkConf().setMaster("local").setAppName("HiveConnector");
        final JavaSparkContext sparkContext = new JavaSparkContext(sparkConf);
        SQLContext sqlContext = new HiveContext(sparkContext);
        DataFrame df = sqlContext.sql("SELECT *
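
As far as I know, Hive's session start-up accepts the root scratch dir only if it grants at least rwx-wx-wx (octal 733); otherwise it throws this exception. A common fix is therefore to add the missing group/other write and execute bits. The sketch below assumes bash and that /tmp/hive is on the local filesystem. That is typically the case for a setMaster("local") job with no Hadoop configuration (core-site.xml) on the classpath, even though the message says "on HDFS". For a real HDFS path, the equivalent is hdfs dfs -chmod 733 /tmp/hive. ensure_scratch_writable is a hypothetical name.

```shell
#!/usr/bin/env bash
# Make a local scratch directory satisfy the rwx-wx-wx (733) minimum.
ensure_scratch_writable() {
  local dir=${1:-/tmp/hive}
  mkdir -p "$dir" || return 1
  # Add the missing bits without removing any that are already set.
  chmod u+rwx,go+wx "$dir"
}
```

Many answers simply use 777; adding only the required bits is the narrower change.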

Error in Hive: Underlying error: org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: One or more arguments are expected

I am trying to translate a PL/SQL script to Hive, and I ran into an error with one HiveQL script. The error is:

    FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies.
    Underlying error: org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: One or more arguments are expected.

I think the error is coming from this part of the script:

    SELECT mag.co_magasin, dem.id_produit as id_produit_orig, pnvente.dt_debut_commercial as dt_debut_commercial, COALESCE(pnvente.id_produit

Hive function to replace comma in column value

I have a Hive table with a STRING column holding values such as 12,345. Is there a Hive function that can remove the comma during insertion into this table?

You can use regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT), which is a built-in Hive function. If you are moving the data from a table that contains the commas to a new table, use:

    insert into table NEW select regexp_replace(commaColumn,',','') from OLD;

Harikrishnan Ck: Hive also has a split function, which can be used here; use split and concat to achieve the desired result. You may refer to this question. Does
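
If the cleaned value should land in a numeric column, the answer's insert can be combined with a CAST. This is a sketch assuming bash, the Hive CLI, and a BIGINT target column in NEW (an assumption; keep the plain regexp_replace for a STRING column). NEW, OLD and commaColumn are the answer's placeholder names; strip_commas_hql and strip_commas are hypothetical.

```shell
#!/usr/bin/env bash
# Print the answer's INSERT, with the comma-free value cast to BIGINT
# so that "12,345" is stored as the number 12345.
strip_commas_hql() {
  cat <<'EOF'
INSERT INTO TABLE NEW
SELECT CAST(regexp_replace(commaColumn, ',', '') AS BIGINT)
FROM OLD;
EOF
}

# Requires the hive CLI; defined here but not called.
strip_commas() {
  hive -e "$(strip_commas_hql)"
}
```

Hive's CAST returns NULL rather than failing when a value is still not numeric after the commas are removed, so it is worth checking the target column for unexpected NULLs.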