问题
Based on
Amazon Redshift announces support for lateral column alias reference:
The support for lateral column alias reference enables you to write queries without repeating the same expressions in the SELECT list. For example, you can define the alias 'probability' and use it within the same select statement:
select clicks / impressions as probability, round(100 * probability, 1) as percentage from raw_data;
Which is basically the same as:
select 1 AS col
,col + 1 AS col2;
db<>fiddle demo
Most SQL RDBMSes will return an error: Unknown column 'col' in 'field list'
It looks like as interesting language extension but there is a caveat. What if I have an undeterministic function:
select RAND() AS col
,col + 1 AS col2
-- if RAND() returns 0.5 then I would expect
-- 0.5 and 1.5
-- I get: 0.3 and 1.7
-- it means that the query was evaluated as:
select RAND() AS col,
RAND() + 1 AS col2
Comparing with LATERAL JOIN
from PostgreSQL(yes, I am aware this is different feature, I would expect "lateral coulmn alias" to behave the same way):
SELECT s.col, s.col+1 AS col2
FROM t ,LATERAL (SELECT RANDOM()) AS s(col)
-- 0.19089933477628307 1.190899334776283
db<>fiddle demo
But it is not a case. I am getting two independent runs which seems to be valid if it is simple "inlining":
SELECT List
The alias is recognized right after it is defined in the target list. You can use an alias in other expressions defined after it in the same target list. The following example illustrates this.
The benefit of the lateral alias reference is you don't need to repeat the aliased expression when building more complex expressions in the same target list. When Amazon Redshift parses this type of reference, it just inlines the previously defined aliases. If there is a column with the same name defined in the FROM clause as the previously aliased expression, the column in the FROM clause takes priority.
Is my understanding correct and this functionality is not "safe" when we are using undeterministic or time-sensitive function/references/subqueries?
回答1:
This syntax is not safe. In fact, merely inlining the code means that it does not even provide a performance advantage. It is only syntactic sugar.
Given that there are easy alternatives -- CTEs and subqueries -- I would just avoid this new "feature".
If there were a setting to turn this off, I would recommend using it.
Incidentally, many newcomers to SQL find this quite disconcerting. This purpose is to avoid ambiguity. What should the following query return?
select (a + 1) as b, b
from (select 1 as a, 0 as b) x;
The designers of SQL probably felt that the rules around resolving such situations are more complex than merely rewriting a subquery.
The one "database" that I know of that resolves this well is actually SAS proc SQL
. It introduced the calculated
keyword, so you can write:
select (a + 1) as b, calculated b, b
from (select 1 as a, 0 as b) x;
And this would return 2, 2, 0
.
In other words, I don't think Amazon put much thought into the implementation of this "feature".
回答2:
Why not make use of With
Clause to better understand reusablity as this is the way you should perform your reusability of alias at runtime.
With Data
As (Select count( *) from table)
Select Data+1, Data from table
Regarding usage of above lateral reference if u mean as safe. Theres nothing safe as you would still define some expression in that same select to be able to use it in other column. Its just that its an alternative too. No such declaration is safe as it is explicit and vulnerable at runtime. For safety purposes you should store the data expression in sort of some function or a procedure and have access rights on it like we have implicit functions/procedures like Max() etc. But, for other purposes i guess with would be less time consuming than above amazon redshift functionality as it runs inline
so it will run each time row by row
来源:https://stackoverflow.com/questions/59334543/amazon-redshift-lateral-column-alias-reference