Neo4j shortest path query gets stuck (does not work) when nodes have two-way relationships and are interrelated

面向向阳花 2020-12-19 19:41

I made a relationship graph with two-way relationships: if A knows B, then B knows A. Every node has a unique Id and Name along with other properties. So my graph looks like

3 Answers

一个人的身影 (OP)

2020-12-19 19:42
I'm afraid you won't be able to do much here. Your graph is very specific: each node has relations only to its closest nodes. That's too bad, because Neo4j is fine for exploring a few relations around a starting point, not for searching the whole graph with each query.

It means that once you are 2 nodes away, the computational complexity rises to:
```
8 relationships per node
distance 2
8 + 8^2
```
In general, the upper bound on the complexity for a distance n is
```
8 + 8^2 + ... + 8^n = O(8^n) // in case all affected nodes have 8 connections
```
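
To sanity-check that growth, here is a minimal Python sketch that counts the paths of length 1..n a naive expansion walks when every node has exactly b relationships. The uniform branching factor is an assumption (real graphs vary), and `paths_up_to` is a hypothetical helper, not part of any Neo4j API.

```python
def paths_up_to(b: int, n: int) -> int:
    """Number of paths of length 1..n from one start node when every node has b relationships.

    This counts paths, not distinct nodes: an expansion without a visited set
    revisits nodes, which is what makes the growth exponential.
    """
    return sum(b ** k for k in range(1, n + 1))


if __name__ == "__main__":
    print(paths_up_to(8, 2))  # 72, i.e. 8 + 8^2
    # The last term dominates: the whole sum stays below b^n * b / (b - 1).
    n = 10
    print(paths_up_to(8, n) / 8 ** n)  # about 1.14, so the total is O(8^n)
```
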
You say you have about 80 000 nodes. Assuming they form a roughly square grid, this means (correct me if I'm wrong) a longest distance of ~280 hops (from √80000 ≈ 283 nodes per side). Let's suppose your nodes
```
(p1:SearchableNode {name: "Ishaan"}),
(p2:SearchableNode {name: "Garima"})
```
are only 140 hops away. This gives a complexity of 8^140 ≈ 10^126; I'm not sure any computer in the world can handle that.
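
The grid diameter and the order of magnitude can be checked with exact integer arithmetic. A minimal Python sketch, assuming (as the estimate does) a square grid where interior nodes link to their 8 neighbours, so diagonal moves are allowed:

```python
import math

# Assumption: the ~80 000 nodes form a roughly square grid and interior
# nodes link to their 8 neighbours (diagonal moves allowed).
nodes = 80_000
side = math.isqrt(nodes)         # nodes per side, √80000 ≈ 282.8
diameter = side - 1              # corner-to-corner hops on an 8-connected grid

hops = 140                       # assumed distance between p1 and p2
branching = 8                    # relationships per node "in the middle"
paths = branching ** hops        # exact integer (Python ints are unbounded)
magnitude = len(str(paths)) - 1  # floor(log10(8^140))

print(side, diameter)            # 282 281
print(magnitude)                 # 126, i.e. 8^140 ≈ 10^126
```
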

Sure, not all nodes have 8 connections, only those "in the middle"; in our example graph that would be ~500 000 relationships. You have about 300 000, which is maybe 2 times fewer, so let's suppose the overall complexity, for an average distance of 70 (out of 140, a very relaxed lower estimate) and nodes having 4 relationships on average (down from 8; 80 000 * 4 = 320 000), to be
```
O(4 + 4^2 + ... + 4^70) = O(4^70) ≈ 10^42
```
One 1 GHz CPU, at roughly 1 000 000 steps per second, should be able to calculate this in:
```
~1 000 000 per second
10^42 == 10^36 * 1 000 000 -> 10^36 seconds
```
Let's suppose we have a cluster of 100 servers with 10 GHz CPUs, 1000 GHz in total. That's still 10^42 == 10^33 * 1 000 000 000 -> 10^33 seconds.
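
The same back-of-the-envelope runtime in Python, using exact integers so the powers of ten are not rounded away. The rates are the assumptions stated above (about 1 000 000 steps per second per 1 GHz), not measured Neo4j throughput:

```python
# Back-of-the-envelope runtime using the assumed rates above
# (not measured Neo4j throughput).
expansions = 4 ** 70                    # ~4 relationships per node, 70 hops
single_cpu_rate = 1_000_000             # ~1 000 000 steps per second on one 1 GHz CPU
cluster_rate = single_cpu_rate * 1_000  # 100 servers x 10 GHz = 1000 GHz in total


def order_of_magnitude(x: int) -> int:
    """floor(log10(x)) for a positive integer, computed exactly."""
    return len(str(x)) - 1


print(order_of_magnitude(expansions))                     # 42
print(order_of_magnitude(expansions // single_cpu_rate))  # 36 (seconds)
print(order_of_magnitude(expansions // cluster_rate))     # 33 (seconds)

seconds_per_year = 365 * 24 * 3600
print(f"{expansions / cluster_rate / seconds_per_year:.1e} years on the cluster")
```
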

I would suggest keeping away from allShortestPaths and looking only for the first path available (shortestPath). Using Gremlin instead of Cypher, it is possible to implement your own algorithms with some heuristics, so you can actually cut the time down to maybe seconds or less.
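
Looking only for the first path is essentially a breadth-first search that stops at the first hit. With a visited set, each node is expanded at most once, so the work is bounded by the number of nodes and relationships rather than by the number of paths, whereas returning all shortest paths can mean enumerating very many of them on a grid. A minimal Python sketch on an in-memory adjacency list; `grid_graph` and `first_shortest_path` are hypothetical names for illustration, not Neo4j or Gremlin API:

```python
from __future__ import annotations

from collections import deque


def grid_graph(width: int, height: int) -> dict[int, list[int]]:
    """Hypothetical test graph: ids assigned row by row, each node linked to its up to 8 neighbours."""
    adj: dict[int, list[int]] = {}
    for r in range(height):
        for c in range(width):
            adj[r * width + c] = [
                (r + dr) * width + (c + dc)
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0) and 0 <= r + dr < height and 0 <= c + dc < width
            ]
    return adj


def first_shortest_path(adj: dict[int, list[int]], start: int, goal: int) -> list[int] | None:
    """Breadth-first search that returns the first (shortest) path it finds, or None."""
    parent: dict[int, int | None] = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            cur: int | None = node
            while cur is not None:  # walk parent links back to start
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        for nxt in adj[node]:
            if nxt not in parent:   # never expand a node twice
                parent[nxt] = node
                queue.append(nxt)
    return None


if __name__ == "__main__":
    g = grid_graph(283, 283)        # ~80 000 nodes, as in the question
    path = first_shortest_path(g, 0, 283 * 283 - 1)
    print(len(path) - 1)            # 282 hops corner to corner
```

In Gremlin the same idea would be expressed with traversal steps and a dedup/visited filter; the Python version only shows the algorithm, not that API.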

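Example: traversing relationships in one direction only (e.g. `-[:RELATED]->` instead of `-[:RELATED]-`, since every pair is already linked both ways) gets this down to about 10^16 seconds.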
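
Because every pair is stored twice (A->B and B->A), an undirected pattern reaches each neighbour through two relationships per hop. A small Python sketch, assuming the two-way relationships are stored as separate directed edges; the edge list and `one_hop` helper are made-up illustrations:

```python
from __future__ import annotations

from collections import defaultdict

# Made-up example: every link is stored in both directions, as in the question.
links = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
edges = links + [(b, a) for a, b in links]

outgoing = defaultdict(list)
incoming = defaultdict(list)
for src, dst in edges:
    outgoing[src].append(dst)
    incoming[dst].append(src)


def one_hop(node: str, directed: bool) -> list[str]:
    """Relationships followed in one hop from node.

    An undirected pattern follows outgoing and incoming relationships, so a
    pair stored as A->B and B->A is reached twice; a directed one follows
    outgoing relationships only.
    """
    if directed:
        return list(outgoing[node])
    return outgoing[node] + incoming[node]


print(one_hop("C", directed=False))  # ['D', 'A', 'B', 'A', 'B', 'D']
print(one_hop("C", directed=True))   # ['D', 'A', 'B']
```

Halving the relationships followed per hop compounds over every hop of the search.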

An example heuristic: check the id of the node. The higher the difference node2.id - node1.id, the greater the actual distance (assuming the node creation order follows the layout, so nodes with similar ids are close together). In that case you can either skip the query or jump a few relations away with something like `MATCH (n1)-[:RELATED*5]->(q)-[:RELATED*]->(n2)` (`*5` means exactly 5 hops, `*..5` means up to 5), which should jump straight to nodes 5 hops away that are closer to n2, cutting the complexity down from 4^70 to 4^65. This only pays off if q is also constrained (for example by its id), because a fixed-length hop still expands every 5-hop path from n1. So if you can calculate the distance exactly from the node ids, you can even match `... [:RELATED*65] ...`, which cuts the complexity to 4^5, and that's just a matter of milliseconds for a CPU.
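
The id heuristic only works if ids encode position. A minimal Python sketch, assuming (hypothetically) that ids were assigned row by row on a square grid of known width and that each node links to its 8 neighbours; under that layout the hop distance follows directly from the ids. `estimated_hops` and `remaining_cost` are illustrative names, and `remaining_cost` just restates the 4^(d - skipped) estimate above:

```python
def estimated_hops(id1: int, id2: int, width: int) -> int:
    """Hop distance between two nodes on an 8-connected grid whose ids were assigned row by row.

    Hypothetical layout assumption: id = row * width + col. With diagonal moves
    allowed, the distance is the larger of the row and column differences.
    """
    r1, c1 = divmod(id1, width)
    r2, c2 = divmod(id2, width)
    return max(abs(r1 - r2), abs(c1 - c2))


def remaining_cost(total_hops: int, skipped_hops: int, branching: int = 4) -> int:
    """Estimated paths left to expand after jumping skipped_hops closer to the target."""
    return branching ** (total_hops - skipped_hops)


if __name__ == "__main__":
    width = 283
    d = estimated_hops(0, 70 * width + 70, width)
    print(d)                                # 70
    print(remaining_cost(d, 5) == 4 ** 65)  # True: skip 5 hops
    print(remaining_cost(d, 65))            # 1024: skip 65 hops, 4^5 left
```
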

It's possible I'm completely wrong here. It has been some time since I left school, and it would be nice to ask a mathematician (graph theory) to confirm this.