问题
I'm looking to set a variable like below, called today_date
, and then be able to reuse it as a variable throughout the query. The below throws an error.
set today_date = date_format(date_sub(current_date, 1), 'YYYYMMdd')
select account
from table
where data_date = today_date
回答1:
First command should end with semicolon:
set today_date=date_format(date_sub(current_date, 1), 'YYYYMMdd');
And variable should be used like this:
select account
from table
where data_date=${hivevar:today_date};
set command will not calculate expression and it will be substituted as is. The resulted query will be
select account
from table
where data_date = date_format(date_sub(current_date, 1), 'YYYYMMdd');
If you want variable to be already calculated, then calculate it in a shell and pass to your Hive script like in this answer: https://stackoverflow.com/a/37821218/2700344
回答2:
You still need to put a semicolon at the end of the set
line, surround your variable with ${}
and use the proper namespace.
Note that this will not execute the date_format()
function when the variable is defined. When you use the variable the SQL code will just be copied as-is. Think of it as more as a macro than as a variable.
Furthermore, Hive has multiple variable namespaces. The 2 easiest options are either to be less verbose when you define your variable but to be more verbose when you use it (hiveconf namespace):
set today_date = date_format(date_sub(current_date, 1), 'YYYYMMdd');
select account from table where data_date = ${hiveconf:today_date};
or the other way round (hivevar namespace)
set hivevar:today_date = date_format(date_sub(current_date, 1), 'YYYYMMdd');
select account from table where data_date = ${today_date};
来源:https://stackoverflow.com/questions/51486896/looking-to-set-a-reusable-variable-in-hive