Adding millions of nodes to a Neo4j Spatial layer using Cypher and APOC

Posted by 半世苍凉 on 2020-01-04 06:50:28

Question


I have a data set of 3.8 million nodes and I'm trying to load all of them into Neo4j Spatial. The nodes are going into a simple point layer, so they have the required latitude and longitude properties. I've tried:

MATCH (d:pointnode) 
WITH collect(d) as pn 
CALL spatial.addNodes("point_geom", pn) yield count return count

But this just keeps spinning without anything happening. I've also tried the following (I ran it as a single line; it is split here only for readability):

CALL apoc.periodic.iterate("MATCH (d:pointnode) 
WITH collect(d) AS pnodes return pnodes",
"CALL spatial.addNodes('point_geom', pnodes) YIELD count return count", 
{batchSize:10000, parallel:false, listIterate:true})

But again there is a lot of spinning and the occasional Java heap error.

The final approach I tried was to use FME with the HTTP caller. This works, but it is exceptionally slow, so it doesn't scale well to millions of nodes.

Any advice or suggestions would be much appreciated. Would apoc.periodic.commit or apoc.periodic.rock_n_roll be a better choice than apoc.periodic.iterate?


Answer 1:


You have 3,800,000 nodes, you collect them into one list, and then you make a single call to add that whole list to the layer. That is going to take a while and eat loads of memory. apoc.periodic.iterate makes absolutely no difference here: because of the collect, the first statement returns a single row, so there is still only one call to spatial.addNodes and nothing to split into batches.

It may take a while, but why not add them node by node?

CALL apoc.periodic.iterate(
  "MATCH (d:pointnode) RETURN d",
  "CALL spatial.addNode('point_geom', d) YIELD node RETURN node",
  {batchSize:10000, parallel:false, listIterate:true})
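
To make the difference concrete, here is a minimal Python sketch of how the rows returned by the first statement get split into batches. It is only a rough model, not APOC's actual implementation; the `batches` helper and the 25,000 stand-in node ids are made up for illustration.

```python
def batches(rows, batch_size):
    """Split the rows of the first statement into batches (rough model only)."""
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]

# Hypothetical stand-in for the pointnode nodes (25,000 instead of 3.8 million).
node_ids = list(range(25_000))

# First statement ends in collect(d): ONE row that holds every node.
collected_rows = [node_ids]
# First statement ends in RETURN d: one row per node.
per_node_rows = node_ids

collected_batches = batches(collected_rows, 10_000)
per_node_batches = batches(per_node_rows, 10_000)

# One batch, whose single row still carries all 25,000 nodes.
print(len(collected_batches), len(collected_batches[0][0]))
# Three batches of at most 10,000 rows each.
print(len(per_node_batches), [len(b) for b in per_node_batches])
```

With the collected version the batch size never comes into play, because there is only one row to batch.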

Hope this helps (or at least explains why you are having issues).

Regards, Tom




Answer 2:


After a bit of trial and error, apoc.periodic.commit has led to a relatively quick solution (it is still going to take 2-3 hours):

call apoc.periodic.commit("match (n:pointnode) 
where not (n)-[:RTREE_REFERENCE]-() with n limit {limit} 
WITH collect(n) AS pnodes 
CALL spatial.addNodes('point_geom', pnodes) YIELD count return count",
{limit:1000})

It may be quicker with larger batch sizes.

EDIT: with a batch size of 5000 it takes 45 minutes.
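
For anyone wondering why the RTREE_REFERENCE filter matters: apoc.periodic.commit keeps re-running the statement until it returns 0, so each pass has to pick up only nodes that are not yet in the layer. Below is a rough Python model of that loop, scaled down to 3,800 made-up node ids; the `commit_loop` helper is hypothetical and only mimics the control flow, not the spatial index itself. The last two lines just count how many batches 3.8 million nodes need at each limit.

```python
from math import ceil

def commit_loop(node_ids, limit):
    """Rough model of the commit loop: repeat until a pass finds nothing to add."""
    indexed = set()
    working_passes = 0
    while True:
        # MATCH ... WHERE NOT already in the layer ... LIMIT limit
        batch = [n for n in node_ids if n not in indexed][:limit]
        if not batch:
            # The statement would return 0 here, which ends the loop.
            return working_passes, indexed
        # spatial.addNodes(...) marks these nodes as indexed.
        indexed.update(batch)
        working_passes += 1

passes, indexed = commit_loop(list(range(3_800)), limit=1_000)
print(passes, len(indexed))

# Number of batches needed for 3.8 million nodes at the two limits tried above.
batch_counts = {limit: ceil(3_800_000 / limit) for limit in (1_000, 5_000)}
print(batch_counts)
```

Without the filter, every pass in this model would select the same first nodes again and the loop would never finish. The batch counts are only one factor in the run time; the timings quoted above are the measured ones.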



Source: https://stackoverflow.com/questions/45904154/adding-millions-of-nodes-to-neo4j-spatial-layer-using-cypher-and-apoc
