Question
I have come across this piece of 'voodoo' SQL, which is used to perform custom grouping of data from a table. I would like to understand how it does its magic, but I am unable to grok it. Can a SQL expert explain in simple English, to someone who is not very SQL literate, the various parts of this snippet that allow it to do its magic?
select ceil(rnk/10.0) as grp,
       col1, col2, col3, col4, col5, col6, col7
from (select e.col1, e.col2, e.col3, e.col4, e.col5, e.col6, e.col7,
             (select count(*)
              from mytable d
              where e.col1 > d.col1)+1 as rnk
      from mytable e) x
order by grp;
The part of the SQL above that I can't seem to get my head around is the inner SQL that returns the column 'x':
(select count(*) from mytable d
where e.col1 > d.col1)+1 as rnk
from mytable e
) x
I would expect to be able to run that query by itself:
select count(*) from mytable d
where e.col1 > d.col1)+1 as rnk
from mytable e
However, when I do that, I get the error:
ERROR: syntax error at or near "+"
LINE 2: where e.col1 > d.col1)+1 as rnk
So, what's going on there?!
Also, the current SQL is hard coded with the number 10. I would like to wrap a function around it, in order to be able to call the function with numbers other than 10.
The backend database is PostgreSQL, so the function will be in PL/pgSQL. Here is my first attempt at writing such a function - however, this is not quite correct, as I want to return multiple rows of the specified columns - so the function below needs to be modified somewhat, not entirely sure how:
CREATE OR REPLACE FUNCTION my_custom_grouping(in integer,
out grp integer,
out col1 double,
out col2 double,
out col3 double,
out col4 double,
out col5 double,
out col6 double,
out col7 double
)
AS $$ SELECT
ceil(rnk/$1) as grp,
col1, col2, col3, col4, col5, col6, col7
from (
select e.col1, e.col2, e.col3, e.col4, e.col5, e.col6, e.col7,
(select count(*) from mytable d
where e.col1 > d.col1)+1 as rnk
from mytable e
) x
order by grp;
$$
LANGUAGE SQL;
Apart from the function not returning more than one row, I'm not sure if this is the best way to parametrize the query - am I on the correct path? - if yes, how would I modify the function above to return multiple rows instead of the current single "row" (i.e. "multiple column" output)?
Is that the correct way to run aggregate functions (grouped by 'grp') on the data returned from the function?
Answer 1:
Your (simplified!) function could look like this:
CREATE OR REPLACE FUNCTION my_custom_grouping(integer)
  RETURNS TABLE (
    grp  integer,
    col1 double precision,
    col2 double precision,
    col3 double precision,
    col4 double precision,
    col5 double precision,
    col6 double precision,
    col7 double precision) AS
$BODY$
    -- Cast $1 to numeric: rank() is bigint, and bigint / integer would be
    -- integer division, which truncates before ceil() can round up.
    SELECT ceil(rank() OVER (ORDER BY col1) / $1::numeric)::int AS grp
         , col1, col2, col3, col4, col5, col6, col7
    FROM   mytable
    ORDER  BY 1;
$BODY$ LANGUAGE SQL;
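As a usage sketch, assuming the function above has been created and mytable exists with the columns shown: a set-returning function can be queried like a table in the FROM clause, and any aggregation per grp goes in an outer query. The aggregates and the output names n_rows and avg_col2 are illustrative choices, not something from the question.

```sql
-- Return every row together with its group number, 10 rows per group.
SELECT * FROM my_custom_grouping(10);

-- Aggregate per group on top of the function result.
-- count(*) and avg(col2) are only examples; any aggregate works here.
SELECT grp,
       count(*)  AS n_rows,
       avg(col2) AS avg_col2
FROM   my_custom_grouping(10)
GROUP  BY grp
ORDER  BY grp;
```

Passing a different integer, for example my_custom_grouping(25), changes the group size without touching the query text.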
Major points:
Note that this is language SQL, so not a PL/pgSQL function. You could use language plpgsql, too, but that's not necessary here.
I replaced the core of your voodoo with the window function rank(), which should do exactly the same, just simpler.
I also removed the subquery altogether. It is not necessary.
The type double is called double precision in PostgreSQL.
To return multiple rows, define the function as RETURNS SETOF record or RETURNS TABLE, as I did.
ORDER BY can use positional references to output columns, so you do not have to spell out the calculation of the first column again: ORDER BY 1. However, multiple rows in the same grp are then returned in arbitrary order. Add more columns or expressions to the ORDER BY clause to arrive at a stable sort order.
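To see why the inner query from the question cannot run by itself, and that rank() matches it, here is a small self-contained sketch. The CTE t and its five values are hypothetical sample data standing in for mytable. The parenthesized subquery is a correlated scalar subquery: it is one expression in the outer SELECT list and refers to e.col1 from the outer query, so cutting it out leaves both a dangling ")+1" and an unknown alias e.

```sql
-- Hypothetical sample data standing in for mytable (only col1 matters here).
WITH t(col1) AS (
    VALUES (10.0), (20.0), (20.0), (30.0), (40.0)
)
SELECT e.col1,
       -- Correlated scalar subquery: for each outer row e, count the rows
       -- with a smaller col1, then add 1 to get a 1-based rank.
       (SELECT count(*) FROM t d WHERE e.col1 > d.col1) + 1 AS rnk_subquery,
       -- Window function that computes the same rank; ties share a rank.
       rank() OVER (ORDER BY e.col1)                         AS rnk_window
FROM   t e
ORDER  BY e.col1;
```

With this sample data both rank columns should read 1, 2, 2, 4, 5: the two rows with col1 = 20 tie, and the next rank skips to 4.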
Source: https://stackoverflow.com/questions/8725900/can-someone-explain-this-sql-and-how-may-i-parametrize-it-and-invoke-as-a-fu