Inserting large number of nodes into Neo4J

Submitted by 痴心易碎 on 2019-11-30 20:43:29

Question


I have a table stored in a typical MySQL database, and I've built a small parser tool in Java to parse it out and build a Neo4j database. This database will have ~40 million nodes, each with one or more edges (up to a maximum of 10 edges). The problem comes from the way I have to create certain nodes. There are user nodes, comment nodes, and hashtag nodes; the user nodes and hashtag nodes must each be unique. I'm using code from the following example to ensure uniqueness:

public Node getOrCreateUserWithUniqueFactory( String username, GraphDatabaseService graphDb )
{
    UniqueFactory<Node> factory = new UniqueFactory.UniqueNodeFactory( graphDb, "users" )
    {
        @Override
        protected void initialize( Node created, Map<String, Object> properties )
        {
            created.setProperty( "name", properties.get( "name" ) );
        }
    };

    return factory.getOrCreate( "name", username );
}

I have thought about using the batch inserter, but I haven't seen a way to check whether a node is unique while performing a batch insert. So my question is: what is the fastest way to insert all these nodes while still ensuring that they remain unique? Any help would, as always, be greatly appreciated.


Answer 1:


In case anyone else here runs into this problem, I want to document what a coworker and I were able to figure out in order to increase speed. First off, a note or two about the data:

  • There were a large number of users; they accounted for roughly 30% of the nodes
  • There were also a large number of hashtags, as people will tend to hash just about anything
  • Both of these had to be guaranteed unique

Now that that's out of the way, on to the optimizations. First and foremost, you need to ensure that your insert loop commits a transaction each time a record is inserted. There were no real examples of this for us to look at, so initially the code looked like this (pseudocode):

Transaction begin
While(record.next()){
   parse record
   create unique user
   create unique hashtag
   create comment
   insert into graph
}
Transaction success
Transaction finish

While this worked OK and finished relatively quickly for small datasets, it didn't scale well. So we looked at the purpose of each function and refactored the code to look like the following:

While(record.next()){
   Transaction begin

   parse record
   create unique user
   create unique hashtag
   create comment
   insert into graph

   Transaction success
   Transaction finish
}
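In the Neo4j 1.x embedded Java API of that era, the refactored loop above might look roughly like this. This is only a sketch: `parseRecord`, `createUniqueUser`, `createUniqueHashtag`, `createComment`, and the `RelTypes` relationship types are hypothetical helpers standing in for the real per-record work.

```java
// Sketch of the per-record transaction pattern with the Neo4j 1.x
// embedded API (tx.success()/tx.finish(), matching the pseudocode above).
// parseRecord(), createUniqueUser(), createUniqueHashtag(), createComment()
// and RelTypes are placeholders, not part of the original code.
while ( record.next() )
{
    Transaction tx = graphDb.beginTx();
    try
    {
        Record parsed = parseRecord( record );
        Node user = createUniqueUser( parsed, graphDb );
        Node hashtag = createUniqueHashtag( parsed, graphDb );
        Node comment = createComment( parsed, graphDb );

        // wire the nodes together, then commit just this record's work
        user.createRelationshipTo( comment, RelTypes.POSTED );
        comment.createRelationshipTo( hashtag, RelTypes.TAGGED );

        tx.success();
    }
    finally
    {
        tx.finish();
    }
}
```

Keeping each transaction small bounds the amount of uncommitted state Neo4j has to hold in memory, which is what makes the loop scale.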

This greatly sped things up, but it wasn't enough for my coworker. So he found that Lucene indexes could be created on node attributes and that we could reference those in the unique node factory. This gave us another significant speed boost, so much so that we could insert 1,000,000 nodes in ~10 seconds without resorting to the batch loader. Thanks to everyone for their help.




Answer 2:


Why not create a local cache during the batch insert? You can use a Java Map with the node's name as the key and the node id (from the batch inserter) as the value.
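A minimal sketch of that idea, with the batch inserter stubbed out as a function so the snippet stays self-contained: the hypothetical `createNode` stands in for `batchInserter.createNode(...)`, which returns a `long` node id.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Minimal sketch of a local uniqueness cache for a batch insert:
// map each unique name to the long id the batch inserter returned.
// The inserter is stubbed as a Function<String, Long> so this compiles
// without Neo4j on the classpath.
class NodeCache {
    private final Map<String, Long> idsByName = new HashMap<>();
    private final Function<String, Long> createNode; // stand-in for batchInserter.createNode(...)

    NodeCache(Function<String, Long> createNode) {
        this.createNode = createNode;
    }

    // The first call for a name creates the node and caches its id;
    // later calls return the cached id, guaranteeing uniqueness
    // without any index lookups.
    long getOrCreate(String name) {
        return idsByName.computeIfAbsent(name, createNode);
    }
}
```

The trade-off is memory: the map must hold every unique name for the duration of the import, which is fine for tens of millions of short strings on a reasonably sized heap.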




Answer 3:


Usually it is simplest to just keep them in a HashMap; you won't have that many distinct users and tags, after all.

You can also use the LuceneBatchInserterIndex and call setCacheCapacity.

see: http://docs.neo4j.org/chunked/milestone/batchinsert.html#indexing-batchinsert
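Based on the batch-insertion indexing docs linked above, the pattern looks roughly like this; the store path, cache size, and example name are illustrative, and the API shown is the Neo4j 1.x batch inserter.

```java
// Sketch of index-backed uniqueness with the Neo4j 1.x batch inserter
// (API as documented at the link above; path and numbers are illustrative).
BatchInserter inserter = BatchInserters.inserter( "target/neo4jdb-batchinsert" );
BatchInserterIndexProvider indexProvider = new LuceneBatchInserterIndexProvider( inserter );
BatchInserterIndex users = indexProvider.nodeIndex( "users", MapUtil.stringMap( "type", "exact" ) );
users.setCacheCapacity( "name", 1000000 ); // keep hot "name" lookups in memory

// getOrCreate against the index: look up first, create and index on a miss
IndexHits<Long> hits = users.get( "name", "alice" );
long userId;
if ( hits.hasNext() )
{
    userId = hits.next();
}
else
{
    userId = inserter.createNode( MapUtil.map( "name", "alice" ) );
    users.add( userId, MapUtil.map( "name", "alice" ) );
}

// index additions only become visible to get() after a flush
users.flush();

indexProvider.shutdown();
inserter.shutdown();
```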



Source: https://stackoverflow.com/questions/14970513/inserting-large-number-of-nodes-into-neo4j
