Compare result of two table functions using one column from each

Submitted by 僤鯓⒐⒋嵵緔 on 2019-12-10 11:21:31

Question


Following the instructions here, I have created two functions that use EXECUTE with format() and return the same table type (integer, smallint).

Sample definitions:

CREATE OR REPLACE FUNCTION function1(IN _tbl regclass, IN _tbl2 regclass, 
IN field1 integer) 
RETURNS TABLE(id integer, dist smallint)

CREATE OR REPLACE FUNCTION function2(IN _tbl regclass, IN _tbl2 regclass, 
IN field1 integer) 
RETURNS TABLE(id integer, dist smallint)

Both functions return exactly the same number of rows. Sample result (always ordered by dist):

(49,0)
(206022,3)
(206041,3)
(92233,4)

Is there a way to compare the values of the second field between the two functions, row by row, to check that both results agree?

For example:

SELECT
function1('tblp1','tblp2',49),function2('tblp1_v2','tblp2_v2',49)

Returns something like:

(49,0)      (49,0)
(206022,3)  (206022,3)
(206041,3)  (206041,3)
(92233,4)   (133,4)

I am not expecting identical results: each function is a top-K query, and ties are broken arbitrarily (the second function also applies some optimizations for faster performance). However, I can confirm that both functions return correct results if the second number matches in every row. In the example above the results are correct, because:

1st row 0 = 0,
2nd row 3 = 3,
3rd row 3 = 3,
4th row 4 = 4

even though 92233 != 133 in the 4th row.

Is there a way to get only the 2nd field of each function result, to batch compare them e.g. with something like:

SELECT COUNT(*)
FROM 
(SELECT
function1('tblp1','tblp2',49).field2,
function2('tblp1_v2','tblp2_v2',49).field2 ) n2
WHERE  function1('tblp1','tblp2',49).field2 != function2('tblp1_v2','tblp2_v2',49).field2;

I am using PostgreSQL 9.3.


Answer 1:


Is there a way to get only the 2nd field of each function result, to batch compare them?

All of the following answers assume that rows are returned in matching order.

Postgres 9.3

Using the quirky behavior that set-returning functions (SRFs) in the SELECT list are expanded in parallel when they return the same number of rows:

SELECT count(*) AS mismatches
FROM  (
   SELECT function1('tblp1','tblp2',49) AS f1
        , function2('tblp1_v2','tblp2_v2',49) AS f2
   ) sub
WHERE  (f1).dist <> (f2).dist;  -- note the parentheses!

The parentheses around the row type are necessary to disambiguate from a possible table reference. Details in the manual here.

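If the number of returned rows is not the same, Postgres 9.3 instead produces a number of rows equal to the least common multiple of the two counts (a full Cartesian product when the counts share no common factor), which would break the comparison completely for you.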
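To try the parallel expansion without the real functions, here is a minimal sketch. demo_f1 and demo_f2 are hypothetical stand-ins that simply return the sample rows from the question; they are not the asker's functions.

```sql
-- Hypothetical stand-ins for function1 / function2 (assumption: same return type).
CREATE OR REPLACE FUNCTION demo_f1()
  RETURNS TABLE(id integer, dist smallint) AS
$$
   SELECT v.id, v.dist::smallint
   FROM  (VALUES (49, 0), (206022, 3), (206041, 3), (92233, 4)) AS v(id, dist)
   ORDER  BY v.dist, v.id
$$ LANGUAGE sql;

CREATE OR REPLACE FUNCTION demo_f2()
  RETURNS TABLE(id integer, dist smallint) AS
$$
   SELECT v.id, v.dist::smallint
   FROM  (VALUES (49, 0), (206022, 3), (206041, 3), (133, 4)) AS v(id, dist)
   ORDER  BY v.dist, v.id
$$ LANGUAGE sql;

-- Inspect the paired rows side by side; each function call yields one composite value per row.
SELECT (f1).id AS id1, (f1).dist AS dist1, (f2).id AS id2, (f2).dist AS dist2
FROM  (SELECT demo_f1() AS f1, demo_f2() AS f2) sub;

-- Count rows whose dist differs; the ids in the last row differ, but every dist matches.
SELECT count(*) AS mismatches
FROM  (SELECT demo_f1() AS f1, demo_f2() AS f2) sub
WHERE  (f1).dist <> (f2).dist;
```

Because both stand-ins return four rows, the rows pair up one to one and the count should come out as 0.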

Postgres 9.4

WITH ORDINALITY to generate row numbers on the fly

You can use WITH ORDINALITY to generate a row number on the fly, so you don't need to depend on the pairing of set-returning functions in the SELECT list:

SELECT count(*) AS mismatches
FROM      function1('tblp1','tblp2',49)       WITH ORDINALITY AS f1(id,dist,rn)
FULL JOIN function2('tblp1_v2','tblp2_v2',49) WITH ORDINALITY AS f2(id,dist,rn) USING (rn)
WHERE  f1.dist IS DISTINCT FROM f2.dist;

This works whether the functions return the same number of rows or differing numbers (unpaired rows are counted as mismatches).
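A small sketch of how the FULL JOIN treats unequal row counts, using the built-in unnest() on literal arrays as a stand-in for the two functions (the arrays are made-up sample data, and this needs Postgres 9.4 or later):

```sql
-- unnest() stands in for the set-returning functions; rn is the ordinality column.
SELECT rn, f1.dist AS dist1, f2.dist AS dist2
FROM      unnest(ARRAY[0, 3, 3, 4]) WITH ORDINALITY AS f1(dist, rn)
FULL JOIN unnest(ARRAY[0, 3, 4])    WITH ORDINALITY AS f2(dist, rn) USING (rn)
WHERE  f1.dist IS DISTINCT FROM f2.dist
ORDER  BY rn;
-- rn = 3 is reported because 3 differs from 4;
-- rn = 4 is reported because the second set has no 4th row, so dist2 is NULL.
```

IS DISTINCT FROM treats NULL as a comparable value, which is why the unpaired row shows up; a plain <> would silently drop it.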

Related:

  • PostgreSQL unnest() with element number

ROWS FROM to join sets row-by-row

SELECT count(*) AS mismatches
FROM   ROWS FROM (function1('tblp1','tblp2',49)
                , function2('tblp1_v2','tblp2_v2',49)) t(id1, dist1, id2, dist2)
WHERE  t.dist1 IS DISTINCT FROM t.dist2;
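ROWS FROM pairs the outputs of its functions row by row and pads the shorter set with NULL values. A minimal sketch with unnest() on made-up arrays standing in for the two functions (Postgres 9.4 or later):

```sql
-- Show the row-by-row pairing; the shorter set is padded with NULL.
SELECT *
FROM   ROWS FROM (unnest(ARRAY[0, 3, 3, 4])
                , unnest(ARRAY[0, 3, 4])) AS t(dist1, dist2);

-- Count the pairs that differ: (3, 4) and (4, NULL) give 2 mismatches.
SELECT count(*) AS mismatches
FROM   ROWS FROM (unnest(ARRAY[0, 3, 3, 4])
                , unnest(ARRAY[0, 3, 4])) AS t(dist1, dist2)
WHERE  t.dist1 IS DISTINCT FROM t.dist2;
```

Compared with the WITH ORDINALITY variant, this form needs no explicit row number, since the pairing is positional.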

Related answer:

  • Is it possible to answer queries on a view before fully materializing the view?

Aside:
EXECUTE FORMAT is not a plpgsql feature in its own right; RETURN QUERY is. EXECUTE runs a query string, and format() is just a convenient function for building one that can be used anywhere in SQL or plpgsql.
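As an illustration of how these pieces fit together, here is a rough sketch of a function that returns a set through RETURN QUERY EXECUTE. The function demo_topk, the table demo_points, and its columns are hypothetical; the asker's real function bodies are not shown in the question.

```sql
-- Hypothetical example table (assumption: columns id integer, dist smallint).
CREATE TEMP TABLE demo_points(id integer, dist smallint);
INSERT INTO demo_points VALUES (49, 0), (206022, 3), (206041, 3), (92233, 4);

-- Hypothetical function: return the _k rows with the smallest dist from any such table.
CREATE OR REPLACE FUNCTION demo_topk(_tbl regclass, _k integer)
  RETURNS TABLE(id integer, dist smallint) AS
$func$
BEGIN
   -- format() only builds the query string; a regclass value is rendered
   -- as a properly quoted table name when interpolated with %s.
   -- EXECUTE runs the string, and RETURN QUERY appends its rows to the result set.
   RETURN QUERY EXECUTE format(
      'SELECT t.id, t.dist FROM %s AS t ORDER BY t.dist LIMIT $1', _tbl)
   USING _k;   -- values are passed as parameters, not concatenated
END
$func$ LANGUAGE plpgsql;

SELECT * FROM demo_topk('demo_points', 2);

-- format() also works in plain SQL, outside of any function:
SELECT format('SELECT count(*) FROM %I', 'demo_points') AS query_text;
```

With _k = 2 the second returned row is one of the two dist = 3 rows; which one is arbitrary, which is exactly the kind of tie the question describes.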




Answer 2:


The order in which rows are returned from the functions is not guaranteed. If you can return row_number() from the functions (rn in the example below), then:

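select
    -- "x or null" is NULL unless x is true, so count() tallies only the mismatches
    count((f1.dist is distinct from f2.dist) or null) as diff_count
from
    function1('tblp1','tblp2',49) f1
    full join  -- keeps unpaired rows, which then count as mismatches
    function2('tblp1_v2','tblp2_v2',49) f2 using(rn)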



Answer 3:


For future reference:

Checking difference in number of rows:

SELECT abs(count(f1a.*) - count(f2a.*))
FROM  (
   SELECT f1.dist, row_number() OVER (ORDER BY f1.dist) AS rn
   FROM   function1('tblp1','tblp2',49) f1
   ) f1a
FULL JOIN (
   SELECT f2.dist, row_number() OVER (ORDER BY f2.dist) AS rn
   FROM   function2('tblp1_v2','tblp2_v2',49) f2
   ) f2a USING (rn);

Checking difference in dist for same ordered rows:

SELECT count(*)
FROM  (
   SELECT f1.dist, row_number() OVER (ORDER BY f1.dist) AS rn
   FROM   function1('tblp1','tblp2',49) f1
   ) f1a
JOIN  (
   SELECT f2.dist, row_number() OVER (ORDER BY f2.dist) AS rn
   FROM   function2('tblp1_v2','tblp2_v2',49) f2
   ) f2a USING (rn)            -- pair rows by their position in dist order
WHERE  f1a.dist <> f2a.dist;   -- count the pairs whose dist differs

A plain OVER () might also work, since the results of the functions are already ordered, but the ORDER BY is added as an extra safeguard.



Source: https://stackoverflow.com/questions/28808819/compare-result-of-two-table-functions-using-one-column-from-each
