Directed graph SQL

徘徊边缘 submitted on 2019-12-06 13:31:04

So your graph has these edges (PARENT -> CHILD): AT->TG, CG->GT, GA->AT, GC->CA, GC->CG, GG->GC, GT->TG, TG->GA, TG->GC, TG->GG.

You can use Oracle's START WITH/CONNECT BY feature to do what you want. If we start at node GA, we can reach all nodes in the graph, as shown below.
CREATE TABLE edges (PARENT VARCHAR(100), CHILD VARCHAR(100));

insert into edges values ('AT', 'TG');
insert into edges values ('CG', 'GT');
insert into edges values ('GA', 'AT');
insert into edges values ('GC', 'CA');
insert into edges values ('GC', 'CG');
insert into edges values ('GG', 'GC');
insert into edges values ('GT', 'TG');
insert into edges values ('TG', 'GA');
insert into edges values ('TG', 'GC');
insert into edges values ('TG', 'GG');
COMMIT;

SELECT *
  FROM edges
START WITH CHILD = 'GA'
CONNECT BY NOCYCLE PRIOR CHILD = PARENT;

Output:

    PARENT  CHILD
1   TG      GA
2   GA      AT
3   AT      TG
4   TG      GC
5   GC      CA
6   GC      CG
7   CG      GT
8   CG      GT
9   GC      CA

NOTE: Since your graph has cycles, you must add NOCYCLE to the CONNECT BY clause. Without it, Oracle stops with "ORA-01436: CONNECT BY loop in user data".
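If you don't have Oracle available, here is a minimal sketch of the same reachability check using Python's built-in sqlite3 module and a standard recursive CTE (WITH RECURSIVE) in place of CONNECT BY. This swaps in a different technique, not Oracle syntax. UNION (rather than UNION ALL) discards nodes that were already found, and that is what stops the recursion on this cyclic graph, roughly the job NOCYCLE does above. The function name reachable_from is hypothetical.

```python
import sqlite3

# Edge list copied from the INSERT statements above (PARENT -> CHILD).
EDGES = [
    ("AT", "TG"), ("CG", "GT"), ("GA", "AT"), ("GC", "CA"), ("GC", "CG"),
    ("GG", "GC"), ("GT", "TG"), ("TG", "GA"), ("TG", "GC"), ("TG", "GG"),
]


def reachable_from(start):
    """Return the set of nodes reachable from `start` by following edges forward."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE edges (parent TEXT, child TEXT)")
    con.executemany("INSERT INTO edges VALUES (?, ?)", EDGES)
    rows = con.execute(
        """
        WITH RECURSIVE reach(node) AS (
            SELECT child FROM edges WHERE parent = ?
            UNION   -- UNION drops rows already seen, so cycles terminate
            SELECT e.child
              FROM edges e JOIN reach r ON e.parent = r.node
        )
        SELECT node FROM reach
        """,
        (start,),
    ).fetchall()
    con.close()
    return {node for (node,) in rows}


print(sorted(reachable_from("GA")))
# ['AT', 'CA', 'CG', 'GA', 'GC', 'GG', 'GT', 'TG']
```

All eight nodes come back, and GA itself is included because the cycle GA->AT->TG->GA leads back to it.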

EDITED ANSWER BASED ON LATEST EDITS BY OP

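First of all, I assume that by "2 hops" you mean "at most 2 hops", because your current query uses LEVEL <= 2. If you want exactly 2 hops, filter on LEVEL = 2 instead. For example, keep LEVEL <= 2 in the CONNECT BY clause to limit the traversal and add WHERE LEVEL = 2 to keep only the second-level rows.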
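To make the "at most" versus "exactly" distinction concrete, here is a small sketch, again using SQLite through Python's sqlite3 rather than Oracle. A hop counter lvl stands in for Oracle's LEVEL pseudocolumn: the recursion is capped at max_hops (like LEVEL <= 2 in CONNECT BY), and a final WHERE filter keeps either every level up to the cap or only the last one. As an assumption, it runs on the original edges table from the first part of this answer, not on your updated graph. The function name hop_rows is hypothetical.

```python
import sqlite3

# Original edge list from the first part of the answer (PARENT -> CHILD).
EDGES = [
    ("AT", "TG"), ("CG", "GT"), ("GA", "AT"), ("GC", "CA"), ("GC", "CG"),
    ("GG", "GC"), ("GT", "TG"), ("TG", "GA"), ("TG", "GC"), ("TG", "GG"),
]


def hop_rows(start, max_hops, exact):
    """Rows (lvl, parent, child) on paths leaving `start`; lvl mimics LEVEL."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE edges (parent TEXT, child TEXT)")
    con.executemany("INSERT INTO edges VALUES (?, ?)", EDGES)
    # Either keep every level up to the cap, or only the cap itself.
    level_filter = "lvl = ?" if exact else "lvl <= ?"
    rows = con.execute(
        f"""
        WITH RECURSIVE walk(lvl, parent, child) AS (
            SELECT 1, parent, child FROM edges WHERE parent = ?
            UNION ALL
            SELECT w.lvl + 1, e.parent, e.child
              FROM edges e JOIN walk w ON e.parent = w.child
             WHERE w.lvl < ?   -- caps the traversal, like LEVEL <= 2 in CONNECT BY
        )
        SELECT lvl, parent, child FROM walk
         WHERE {level_filter}  -- like a WHERE LEVEL filter in Oracle
         ORDER BY lvl, parent, child
        """,
        (start, max_hops, max_hops),
    ).fetchall()
    con.close()
    return rows


print(hop_rows("AT", 2, exact=False))
# [(1, 'AT', 'TG'), (2, 'TG', 'GA'), (2, 'TG', 'GC'), (2, 'TG', 'GG')]
print(hop_rows("AT", 2, exact=True))
# [(2, 'TG', 'GA'), (2, 'TG', 'GC'), (2, 'TG', 'GG')]
```

The cap belongs in the recursion so the walk stops early. The level filter belongs in the outer query because it decides which of the rows that were produced you actually see.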

In your updated graph (image2.JPG), there is no 2-hop path from AT to GT, so the query returns what I would expect. From AT to GT we can go AT->TG->GC->CG->GT, but that is 4 hops, more than your limit of 2, which is why that result is not returned.
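A quick way to check a hop count like this is a breadth-first search, which always finds a fewest-hops path. The sketch below is plain Python with no database. As an assumption, it uses the original edge list from the first part of this answer, because the updated graph in image2.JPG is not reproduced here. In that original graph the shortest AT to GT route is the same 4-hop path. shortest_path is a hypothetical helper name.

```python
from collections import deque

# Original edge list from the first part of the answer (PARENT -> CHILD).
EDGES = [
    ("AT", "TG"), ("CG", "GT"), ("GA", "AT"), ("GC", "CA"), ("GC", "CG"),
    ("GG", "GC"), ("GT", "TG"), ("TG", "GA"), ("TG", "GC"), ("TG", "GG"),
]


def shortest_path(edges, start, goal):
    """Breadth-first search over directed edges; returns a node list or None."""
    adjacency = {}
    for parent, child in edges:
        adjacency.setdefault(parent, []).append(child)
    previous = {start: None}  # predecessor map, doubles as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk the predecessors back to start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in adjacency.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None  # goal is unreachable


path = shortest_path(EDGES, "AT", "GT")
print(path, len(path) - 1, "hops")
# ['AT', 'TG', 'GC', 'CG', 'GT'] 4 hops
```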

If you expect to reach GT from AT in 2 hops, then you need to add an edge from TG to GT, like this:

INSERT INTO nodes VALUES('TG','GT');

Now when you run your query, you'll get this data back:

    NODE_FROM  NODE_TO
    AT         TG
    TG         GC
    TG         GG
    TG         GT
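Here is a hedged, runnable approximation of that result in SQLite via Python's sqlite3. Your updated table isn't shown in full, so the edge set below is an assumption. It is assembled from the edges mentioned in this part of the answer (AT->TG, TG->GC, TG->GG, GC->CG, CG->GT) plus the new TG->GT edge. The table and column names nodes, node_from and node_to follow the answer. The recursive CTE stands in for what is presumably START WITH node_from = 'AT' CONNECT BY PRIOR node_to = node_from AND LEVEL <= 2. The function name within_two_hops is hypothetical.

```python
import sqlite3

# Hypothetical edge set (node_from -> node_to); see the assumptions above.
NODES = [
    ("AT", "TG"), ("TG", "GC"), ("TG", "GG"), ("GC", "CG"), ("CG", "GT"),
    ("TG", "GT"),  # the newly added edge
]


def within_two_hops(start):
    """Return (node_from, node_to) rows reachable from `start` in at most 2 hops."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE nodes (node_from TEXT, node_to TEXT)")
    con.executemany("INSERT INTO nodes VALUES (?, ?)", NODES)
    rows = con.execute(
        """
        WITH RECURSIVE walk(lvl, node_from, node_to) AS (
            SELECT 1, node_from, node_to FROM nodes
             WHERE node_from = ?                -- START WITH
            UNION ALL
            SELECT w.lvl + 1, n.node_from, n.node_to
              FROM nodes n JOIN walk w
                ON n.node_from = w.node_to      -- PRIOR node_to = node_from
             WHERE w.lvl < 2                    -- LEVEL <= 2
        )
        SELECT node_from, node_to FROM walk ORDER BY lvl, node_from, node_to
        """,
        (start,),
    ).fetchall()
    con.close()
    return rows


for node_from, node_to in within_two_hops("AT"):
    print(node_from, node_to)
```

Drop the TG->GT tuple from NODES and the TG GT row disappears, because the only remaining AT to GT route is longer than 2 hops.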

Remember that START WITH/CONNECT BY only returns rows along paths that actually exist between the nodes. In your graph (before I added the new edge above), there is no path AT->TG->GT, which is why you're not getting that result back.

Now, if you added the edge TG->AT, then we would have the path GT->TG->AT. So in that case AT is 2 hops away from GT (i.e. we're going the reverse way now, starting from GT and ending at AT). But to find those paths, you would need to set START WITH node_from = 'GT'.

If your goal is to find all paths from a start node to any target node that is at most 2 hops away, then the above should work.

However, if you want to find all paths from some target node back to a source node (i.e., the reverse example I gave, GT->TG->AT), a single query won't do it. You'd have to run the query once for every node in the graph.
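The "run it once per node" approach can be sketched in plain Python. For every node, the sketch does a bounded level-by-level expansion (the role one START WITH query plays) and records each target reached within 2 hops, together with the fewest hops needed. The edge set is hypothetical. It contains the assumed edges of the updated graph, the added TG->GT edge, a GT->TG edge (implied by the GT->TG->AT example), and the TG->AT edge from the reverse example, so that the pair GT to AT shows up at 2 hops. The function name pairs_within is hypothetical.

```python
# Hypothetical edge set (node_from -> node_to); see the assumptions above.
EDGES = [
    ("AT", "TG"), ("TG", "GC"), ("TG", "GG"), ("GC", "CG"), ("CG", "GT"),
    ("TG", "GT"),  # edge added earlier in the answer
    ("GT", "TG"),  # assumed present in the updated graph
    ("TG", "AT"),  # edge from the reverse example
]


def pairs_within(edges, max_hops):
    """Map (source, target) -> fewest hops, for every target within max_hops."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    nodes = {n for edge in edges for n in edge}
    best = {}
    for source in sorted(nodes):  # one traversal per start node
        frontier = {source}
        for hops in range(1, max_hops + 1):
            # Advance every node in the frontier by one edge.
            frontier = {dst for node in frontier for dst in adjacency.get(node, [])}
            for target in frontier:
                # Hops only grow, so the first value stored is the smallest.
                best.setdefault((source, target), hops)
    return best


pairs = pairs_within(EDGES, 2)
print(pairs[("GT", "AT")])
# 2
```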

Think of START WITH/CONNECT BY as a depth-first search: it goes everywhere it can reach from the starting node, following edges only in the direction given by the CONNECT BY condition, but it does nothing more than that.
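To illustrate the depth-first analogy, here is a plain-Python sketch of a preorder depth-first walk. Like CONNECT BY PRIOR CHILD = PARENT, it follows edges only in their stated direction. It also skips any edge whose target is already on the current path, loosely in the spirit of NOCYCLE. This is an approximation for intuition only, not a reimplementation of Oracle's exact cycle-detection or row-ordering rules. It starts from a node rather than from a row, and it uses the original edge list. The function name walk is hypothetical.

```python
# Original edge list from the first part of the answer (PARENT -> CHILD).
EDGES = [
    ("AT", "TG"), ("CG", "GT"), ("GA", "AT"), ("GC", "CA"), ("GC", "CG"),
    ("GG", "GC"), ("GT", "TG"), ("TG", "GA"), ("TG", "GC"), ("TG", "GG"),
]


def walk(edges, start):
    """Yield (level, parent, child) in depth-first preorder from `start`."""
    adjacency = {}
    for parent, child in edges:
        adjacency.setdefault(parent, []).append(child)

    def visit(node, path, level):
        for child in adjacency.get(node, []):
            if child in path:  # this edge would close a cycle: skip it
                continue
            yield level, node, child
            # Go as deep as possible before trying the next sibling.
            yield from visit(child, path | {child}, level + 1)

    yield from visit(start, {start}, 1)


rows = list(walk(EDGES, "GA"))
for level, parent, child in rows:
    print("  " * (level - 1) + f"{parent} -> {child}")
```

Because the cycle check is per path rather than global, the GC subtree is visited twice: once under TG->GC and again under TG->GG->GC. That matches the way a hierarchical query can return the same edge under different ancestors.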

Summary:

I think the query works fine, given the constraints above. I've explained why the AT->TG->GT path is not returned, so I hope that makes sense.

Keep in mind, however, that if you are trying to traverse reverse paths as well, you'll have to loop over every node and run the query, changing the START WITH node each time.

It sounds like you need a copy of Joe Celko's Trees and Hierarchies in SQL for Smarties.
