PostgreSQL pass data from recursive CTE onto function

孤街浪徒 提交于 2019-11-28 00:29:22
Michael Buen

You can make the searching of the path more efficient if you start at the bottom. Start at the children. If you start at the parent, it entails traversing all the children; whereas if you searched from the child, it has only one parent, hence won't waste time finding the path between source and target.

with recursive find_parent(source, target, recentness) as
(
    select source, target, 0 
    from tbl
    where target = 9

    union all

    select i.source, i.target, fp.recentness + 1
    from tbl i
    join find_parent fp on i.target = fp.source
),
construct_path(source, target, recentness, path) as
(
  select source, target, recentness, source || '.' || target
  from find_parent 
  where recentness = (select max(recentness) from find_parent)

  union

  select dd.source, dd.target, dd.recentness, cp.path || '.' || dd.target
  from find_parent dd
  join construct_path cp on dd.recentness = cp.recentness - 1  
)
select source, target, path 
from construct_path
order by recentness desc

Output:

SOURCE   TARGET   PATH
1        2        1.2
2        4        1.2.4
4        9        1.2.4.9

Live test: http://www.sqlfiddle.com/#!1/13e6b/1

Similar to this: How to get the parent given a child in SQL SERVER 2005


This is optimized, cut recursion to parent if it already find the specific one(source).

Source = 2

Target = 9

with recursive find_parent(source, target, recentness) as
(
    select source, target, 0 
    from tbl
    where target = 9

    union all

    select i.source, i.target, fp.recentness + 1
    from tbl i
    join find_parent fp on i.target = fp.source 
         -- despite the name, this target is another one's source
         and i.target <> 2
)
,construct_path(source, target, recentness, path) as
(
    select source, target, recentness, source || '.' || target
    from find_parent 
    where recentness = (select max(recentness) from find_parent)

    union

    select dd.source, dd.target, dd.recentness, cp.path || '.' || dd.target
    from find_parent dd
    join construct_path cp on dd.recentness = cp.recentness - 1  

)
select source, target, path
from construct_path
order by recentness desc

Output:

SOURCE   TARGET  PATH
2        4       2.4
4        9       2.4.9

Live test: http://www.sqlfiddle.com/#!1/13e6b/16

Upon re-reading the OP's question, I come up with this solution:

source

with recursive -- this denotes that at least one CTE is using recursion

inputs as ( select 1 as d_source, 4 as d_target )
,traverse_path(filter, source, target, path, bingo) as
(
  select bp.node_x, bp.node_x,  bp.node_y, bp.node_x || '.' || bp.node_y,

    max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool  

  from best_path bp cross join inputs i
  where bp.node_x = i.d_source -- filter

  union

  select tp.filter, bp.node_x, bp.node_y, path || '.' || node_y,   

      max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool   

  from best_path bp cross join inputs i
  join traverse_path tp on bp.node_x = tp.target
      and not tp.bingo
)    
select tp.*
from traverse_path tp cross join inputs i    
-- if Postgresql has Oracle KEEP windowing, 
-- we don't need to use WHERE clause here     
where 
  not tp.bingo   
  or
  (
    tp.bingo and tp.target = d_target
  )

The above WHERE clause could be shortened to:

  WHERE not bingo or target = 4

Output:

FILTER  SOURCE  TARGET  PATH    BINGO
1       1       2       1.2     0
1       1       3       1.3     0
1       2       4       1.2.4   1
1       3       6       1.3.6   0
1       3       7       1.3.7   0

Live test: http://www.sqlfiddle.com/#!1/cdde6/55


Here's the result for Source = 2, Target = 5 :

FILTER  SOURCE  TARGET  PATH    BINGO
2       2       5       2.5     1

Data:

CREATE TABLE best_path
    (node_x int, node_y int, strength int);

INSERT INTO best_path
    (node_x, node_y, strength)
VALUES
    (1, 2, 1),
    (1, 3, 1),
    (2, 4, 1),
    (2, 5, 1),
    (3, 6, 1),
    (3, 7, 1),
    (4, 8, 1),
    (4, 9, 1),
    (5, 10, 1),
    (5, 11, 1);
Erwin Brandstetter

Temporary table for testing:

CREATE TEMP TABLE best_path (node_x int, node_y int, strength int);
INSERT INTO best_path  VALUES
 (1, 2,  1)
,(1, 3,  1)
,(2, 4,  1)
,(2, 5,  1)
,(3, 6,  1)
,(3, 7,  1)
,(4, 8,  1)
,(4, 9,  1)
,(5, 10, 1)
,(5, 11, 1);

Query:
modified to accomodate the comment about 2 - 5

WITH RECURSIVE t AS (    -- for readability and convenience:
    SELECT 1 AS node_s   -- source_node
         , 4 AS node_t   -- target_node
    )
    , x AS (
    SELECT node_x
    FROM   t, best_path
    WHERE  node_y = node_t
    )
    , a AS  ( 
    SELECT b.node_x
         , b.node_y
         , b.strength AS weight
         , b.node_x || '.' || b.node_y || '.' AS path
    FROM   t, best_path b
    LEFT   JOIN x ON x.node_x = b.node_x
    WHERE  b.node_x = node_s
    AND    (x.node_x IS NULL                    -- no point with edge to target
            OR b.node_y = node_t)               -- except with target_node

    UNION ALL
    SELECT a.node_x
         , b.node_y
         , least(a.weight, b.strength)
         , a.path || b.node_y || '.' AS path
    FROM   t, a
    JOIN   best_path AS b ON b.node_x = a.node_y
    LEFT   JOIN x ON x.node_x = b.node_x
    WHERE  a.node_y <> node_t                   -- arrived at target
    AND    a.path !~~ ('%' || b.node_y || '.%') -- not visited yet
    AND    (x.node_x IS NULL                    -- no point with edge to target
            OR b.node_y = node_t)               -- except with target_node
    )
TABLE a;

Produces the requested result exactly.

I added the break condition to the initial query, too, because we can reach the target in only one step.

This happens to be much like my answer to a previous, similar question. All the explanation and links apply. The major additional trick here is to include an additional CTE x to collect the nodes that ...

have (a) direct edge to target.

For repeated use in the break condition of the recursive CTE. It is not widely known that you can add additional (non-recursive) CTEs on top of a recursive CTE regardless of the keyword RECURSIVE.

Optimized solution, no more WHERE clause on final result; albeit Postgresql-specific solution, i.e. we uses DISTINCT ON in order to pick specific row :

Given this data:

CREATE TABLE best_path
    (node_x int, node_y int, strength int);

INSERT INTO best_path
    (node_x, node_y, strength)
VALUES
    (1, 2, 1),
    (1, 3, 1),
    (2, 4, 1),
    (2, 5, 1),
    (3, 6, 1),
    (3, 7, 1),
    (4, 8, 1),
    (4, 9, 1),
    (5, 10, 1),
    (5, 11, 1),
    (5, 12, 1);

Query, first stage, shows behind the scenes (source = 1, target = 11) :

with recursive -- this denotes that at least one CTE is using recursion

inputs as ( select 1 as d_source, 11 as d_target )
,traverse_path(filter, source, target, path, bingo, distincter) as
(
  select                      
      bp.node_x, bp.node_x,  bp.node_y, bp.node_x || '.' || bp.node_y      
      ,max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool            
      ,coalesce(
               nullif(
                   max((bp.node_y = i.d_target)::int) 
                   over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                   , 0)          
               ,bp.node_y)                           
  from best_path bp cross join inputs i
  where bp.node_x = i.d_source -- filter      
  union      
  select 
      tp.filter, bp.node_x, bp.node_y, path || '.' || node_y       
      ,max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool        
      ,coalesce(
               nullif(
                   max((bp.node_y = i.d_target)::int) 
                   over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                   , 0)          
               ,bp.node_y)                       
  from best_path bp cross join inputs i
  join traverse_path tp on bp.node_x = tp.target
      and not tp.bingo
)    
select tp.*
from traverse_path tp 

Output for source = 1, target = 11 : http://www.sqlfiddle.com/#!1/db290/56

FILTER   SOURCE   TARGET   PATH     BINGO    DISTINCTER
1        1        2        1.2      0        2
1        1        3        1.3      0        3
1        2        4        1.2.4    0        4
1        2        5        1.2.5    0        5
1        3        6        1.3.6    0        6
1        3        7        1.3.7    0        7
1        4        8        1.2.4.8  0        8
1        4        9        1.2.4.9  0        9
1        5        11       1.2.5.11 1        1
1        5        10       1.2.5.10 1        1
1        5        12       1.2.5.12 1        1

As we can see, the target 11, is the first row among the source of 5. How this did happen? On our DISTINCTER column, we uses ORDER BY on MAX, this is one of the rare occasions that an ORDER on MAX windowing makes sense. We just use it in order to sort our result. I tried placing the ORDER BY at the end of the query, but the database complain that it cannot use ORDER on CTE. CTE forbids placing an ORDER BY clause. But as we all know, we can influence the sorting on OVER() clause, so that's how our results get sorted. In the result above, among the source of 5, the number 11 sorts before 10 and 12, as 11 is our target. So when we do a DISTINCT ON (Postgresql-specific feature), Postgres will pickup that first line, hence it will pickup the target 11.


This is our final query, optimized, albeit Postgresql-specific:

with recursive -- this denotes that at least one CTE is using recursion    
inputs as ( select 1 as d_source, 11 as d_target )
,traverse_path(filter, source, target, path, bingo) as (
  select 
      distinct on (
        bp.node_x,            
        coalesce(
               nullif(
                   max((bp.node_y = i.d_target)::int) 
                   over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                   , 0)          
               ,bp.node_y)
      )          
      bp.node_x, bp.node_x,  bp.node_y, bp.node_x || '.' || bp.node_y      
      ,max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool  
  from best_path bp cross join inputs i
  where bp.node_x = i.d_source -- filter
  union
  select       
      distinct on (
        bp.node_x,            
        coalesce(
               nullif(
                   max((bp.node_y = i.d_target)::int) 
                   over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                   , 0)          
               ,bp.node_y)
      )    
      tp.filter, bp.node_x, bp.node_y, path || '.' || node_y       
      ,max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool  
  from best_path bp cross join inputs i
  join traverse_path tp on bp.node_x = tp.target
      and not tp.bingo
) select tp.* from traverse_path tp

Output for source = 1, target = 11 http://www.sqlfiddle.com/#!1/db290/55

FILTER   SOURCE   TARGET   PATH     BINGO
1        1        2        1.2      0
1        1        3        1.3      0
1        2        4        1.2.4    0
1        2        5        1.2.5    0
1        3        6        1.3.6    0
1        3        7        1.3.7    0
1        4        8        1.2.4.8  0
1        4        9        1.2.4.9  0
1        5        11       1.2.5.11 1

Output for source = 1, target = 4 : http://www.sqlfiddle.com/#!1/db290/53

FILTER  SOURCE  TARGET  PATH    BINGO
1       1       2       1.2     0
1       1       3       1.3     0
1       2       4       1.2.4   1
1       3       6       1.3.6   0
1       3       7       1.3.7   0

Output for source = 2, target = 5 : http://www.sqlfiddle.com/#!1/db290/54

FILTER  SOURCE  TARGET  PATH    BINGO
2       2       5       2.5     1


Another approach, instead of BINGO approach. Use continue_search as the logic for continuing the travesal. And use EVERY aggregate function to determine if we need to continue traversing the graph.

Here's how it works: http://www.sqlfiddle.com/#!1/db290/84

Final query: http://www.sqlfiddle.com/#!1/db290/85

It's interesting to note that EVERY is very English-like as it gets:

Contrast this:

,max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool  

With using EVERY :

,every(bp.node_y <> i.d_target) over(partition by bp.node_x)

Which one is easier to read?

Here's the output that illustrates the principle of using EVERY to facilitate DISTINCT ON:

Source = 1, Target = 5. Note that 5 sorts first before other numbers that belongs to the same source, this will later be picked on by DISTINCT ON.

FILTER    SOURCE    TARGET    PATH      CONTINUE_SEARCH     DISTINCTER
1         1         2         1.2       1                   2
1         1         3         1.3       1                   3
1         2         5         1.2.5     0                   0
1         2         4         1.2.4     0                   0
1         3         6         1.3.6     1                   6
1         3         7         1.3.7     1                   7

Here's the query that does that principle:

with recursive -- this denotes that at least one CTE is using recursion

inputs as ( select 1 as d_source, 5 as d_target )
,traverse_path(filter, source, target, path, continue_search, distincter) as
(
  select                      
      bp.node_x, bp.node_x,  bp.node_y, concat(bp.node_x , '.' , bp.node_y )
      ,every(bp.node_y <> i.d_target) over(partition by bp.node_x)          
      ,coalesce(        
              cast(nullif(
                   every(bp.node_y <> i.d_target) 
                   over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                   , true) as int)
               ,bp.node_y)                           
  from best_path bp cross join inputs i
  where bp.node_x = i.d_source -- filter      
  union      
  select 
      tp.filter, bp.node_x, bp.node_y, concat(path , '.' , node_y)
      ,every(bp.node_y <> i.d_target) over(partition by bp.node_x)    
      ,coalesce(
              cast(nullif(
                   every(bp.node_y <> i.d_target) 
                   over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                   , true) as int)
               ,bp.node_y)                       
  from best_path bp cross join inputs i
  join traverse_path tp on bp.node_x = tp.target
     and tp.continue_search
)    
select tp.*
from traverse_path tp 

Final query:

with recursive -- this denotes that at least one CTE is using recursion

inputs as ( select 1 as d_source, 5 as d_target )
,traverse_path(filter, source, target, path, continue_search) as
(
  select                      
      distinct on
      (
        bp.node_x
        ,coalesce(
              cast(nullif(
                   every(bp.node_y <> i.d_target) 
                   over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                   , true) as int)
               ,bp.node_y)                       
      )    
      bp.node_x, bp.node_x,  bp.node_y, concat(bp.node_x , '.' , bp.node_y )
      ,every(bp.node_y <> i.d_target) over(partition by bp.node_x)          
  from best_path bp cross join inputs i
  where bp.node_x = i.d_source -- filter      
  union      
  select   
      distinct on
      (
        bp.node_x
        ,coalesce(
              cast(nullif(
                   every(bp.node_y <> i.d_target) 
                   over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                   , true) as int)
               ,bp.node_y)                       
      )  
      tp.filter, bp.node_x, bp.node_y, concat(path , '.' , node_y)
      ,every(bp.node_y <> i.d_target) over(partition by bp.node_x)          
  from best_path bp cross join inputs i
  join traverse_path tp on bp.node_x = tp.target
     and tp.continue_search
)    
select tp.*
from traverse_path tp 

Output:

FILTER    SOURCE    TARGET    PATH      CONTINUE_SEARCH
1         1         2         1.2       1
1         1         3         1.3       1
1         2         5         1.2.5     0
1         3         6         1.3.6     1
1         3         7         1.3.7     1
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!