Inserting a large number of nodes into Neo4j

我怕爱的太早我们不能终老, submitted on 2019-12-01 01:14:26

In case anyone else runs into this problem, I want to document what a coworker and I figured out to speed things up. First, a note or two about the data:

  • There were a large number of users; they accounted for roughly 30% of the nodes.
  • There were also a large number of hashtags, as people tend to hashtag just about anything.
  • Both users and hashtags had to be guaranteed unique.

Now that that's out of the way, on to the optimizations. First and foremost, make sure a transaction completes (is marked successful and finished) on every pass through the insert loop, instead of wrapping the whole loop in a single transaction. There were no real examples of this for us to look at, so initially the code looked like this (pseudocode):

Transaction begin
While(record.next()){
   parse record
   create unique user
   create unique hashtag
   create comment
   insert into graph
}
Transaction success
Transaction finish

This worked and finished relatively quickly for small datasets, but it didn't scale well, because every change stayed in one uncommitted transaction until the loop ended. So we looked at the purpose of each step and refactored the code to look like this:

While(record.next()){
   Transaction begin

   parse record
   create unique user
   create unique hashtag
   create comment
   insert into graph

   Transaction success
   Transaction finish
}
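The refactored loop can be sketched in Java roughly as follows. This is a minimal sketch, not the code we actually ran: FakeGraph and FakeTx are hypothetical in-memory stand-ins so the example runs without Neo4j, and the comma-separated record format (user,hashtag,comment) is assumed. In Neo4j 1.x embedded, the corresponding calls are graphDb.beginTx(), tx.success() and tx.finish(), with finish() placed in a finally block. Uniqueness of users and hashtags is not modeled here.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of the "one transaction per record" loop. FakeGraph and FakeTx are
 * hypothetical stand-ins mirroring Neo4j 1.x's beginTx()/success()/finish().
 */
public class PerRecordTx {

    /** Hypothetical transaction: writes become visible only if success() was called. */
    static final class FakeTx {
        private final FakeGraph graph;
        private final List<String> pending = new ArrayList<>();
        private boolean ok = false;

        FakeTx(FakeGraph graph) { this.graph = graph; }

        void write(String item) { pending.add(item); }

        void success() { ok = true; }

        /** Commits pending writes if marked successful, otherwise discards them. */
        void finish() {
            if (ok) {
                graph.committed.addAll(pending);
                graph.commits++;
            }
            pending.clear();
        }
    }

    /** Hypothetical in-memory graph that records committed writes and counts commits. */
    static final class FakeGraph {
        final List<String> committed = new ArrayList<>();
        int commits = 0;

        FakeTx beginTx() { return new FakeTx(this); }
    }

    /** Opens, fills and finishes a fresh transaction for every record. */
    static void load(FakeGraph graph, List<String> records) {
        for (String record : records) {
            FakeTx tx = graph.beginTx();
            try {
                String[] f = record.split(",");   // parse record: user,hashtag,comment
                tx.write("user:" + f[0]);
                tx.write("tag:" + f[1]);
                tx.write("comment:" + f[2]);
                tx.success();
            } finally {
                tx.finish();                       // always release, even if parsing fails
            }
        }
    }

    public static void main(String[] args) {
        FakeGraph graph = new FakeGraph();
        load(graph, List.of("alice,neo4j,hello", "bob,java,hi"));
        // prints "2 commits, 6 writes"
        System.out.println(graph.commits + " commits, " + graph.committed.size() + " writes");
    }
}
```

Because each transaction only ever holds one record's changes, the amount of uncommitted state stays small no matter how large the input is, and a malformed record rolls back only its own partial writes.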

This greatly sped things up, but it wasn't enough for my coworker. He found that Lucene indexes could be created on node properties and that the unique node factory could reference those indexes. This gave us another significant speed boost, so much so that we could insert 1,000,000 nodes in about 10 seconds without resorting to the batch inserter. Thanks to everyone for their help.

Why not create a local cache during the batch insert? You can use a Java Map with the name as the key and the node id (returned by the batch inserter) as the value.

Usually the simplest approach is to keep them in a HashMap. You won't have that many users and tags, after all.
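A minimal sketch of such a cache, assuming one map per node type (one for users, one for hashtags). NodeIdCache and its creator function are hypothetical names for illustration; the creator stands in for whatever batch inserter call creates the node and returns its long id, and is only invoked the first time a key is seen.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Get-or-create cache from a unique key (user name or hashtag) to the id of the
 * node already created for it. The creator is a hypothetical stand-in for the
 * batch inserter's node-creation call.
 */
public class NodeIdCache {
    private final Map<String, Long> ids = new HashMap<>();
    private final Function<String, Long> creator;

    NodeIdCache(Function<String, Long> creator) { this.creator = creator; }

    /** Returns the cached id, creating (and caching) a node only on first sight. */
    long getOrCreate(String key) {
        return ids.computeIfAbsent(key, creator);
    }

    int size() { return ids.size(); }

    public static void main(String[] args) {
        long[] next = {0};                                  // fake node-id sequence
        NodeIdCache users = new NodeIdCache(name -> next[0]++);
        long a = users.getOrCreate("alice");
        long b = users.getOrCreate("bob");
        long again = users.getOrCreate("alice");            // cache hit, no new node
        // prints "0 1 0 size=2"
        System.out.println(a + " " + b + " " + again + " size=" + users.size());
    }
}
```

Keeping a separate cache per node type avoids collisions between, say, a user and a hashtag with the same text. Since the map lives on the JVM heap, this only works while the distinct users and tags fit in memory, which is the assumption behind the suggestion above.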

You can also use the LuceneBatchInserterIndex and size its lookup cache with setCacheCapacity.

See: http://docs.neo4j.org/chunked/milestone/batchinsert.html#indexing-batchinsert
