Neo4j: Inserting 7k nodes is slow (Spring Data Neo4j / SpringRestGraphDatabase)


Question


I'm building an application where my users can manage dictionaries. One feature is uploading a file to initialize or update the dictionary's content.

The part of the structure I'm focusing on to start with is Dictionary -[:CONTAINS]-> Word. Starting from an empty database (Neo4j 1.9.4, but I also tried 2.0.0M5), accessed via Spring Data Neo4j 2.3.1 in a distributed environment (therefore using SpringRestGraphDatabase, but testing against localhost), I'm trying to load 7k words into one dictionary. However, I can't get it done in less than 8-9 minutes on a Linux machine with a Core i7, 8 GB of RAM and an SSD drive (ulimit raised to 40000).

I've read lots of posts about loading/inserting performance over REST and have tried to apply the advice I found, but with no better luck. The BatchInserter tool doesn't seem to be a good option for me because of my application's constraints.

Can I hope to load 10k nodes in a matter of seconds rather than minutes?

Here is the code I came up with after all my reading:

import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Create the dictionary node with its properties in one call.
Map<String, Object> dicProps = new HashMap<String, Object>();
dicProps.put("locale", locale);
dicProps.put("category", category);
Dictionary dictionary = template.createNodeAs(Dictionary.class, dicProps);

// Create one Word node per entry and link it to the dictionary.
// Each createNodeAs/createRelationshipBetween call is a separate REST round trip.
Map<String, Object> wordProps = new HashMap<String, Object>();
Set<Word> words = readFile(filename);
for (Word gw : words) {
  wordProps.put("txt", gw.getTxt());
  Word w = template.createNodeAs(Word.class, wordProps);
  template.createRelationshipBetween(dictionary, w, Contains.class, "CONTAINS", true);
}

Answer 1:


I solved this kind of problem by creating a CSV file and then having Neo4j read it. The following steps are needed:

  1. Write a class that takes the input data and creates a CSV file from it (it can be one file per node kind, or you can even create a file that will be used to build the relationships); a Java sketch of this step follows the Cypher samples below.

  2. In my case I also created a servlet that allows Neo4j to read that file over HTTP.

  3. Create proper Cypher statements that read and parse that CSV file. Here are some samples of the ones I use (if you use Spring Data, also remember the labels):

    • simple one:

      load csv with headers from {fileUrl} as line 
         merge (:UserProfile:_UserProfile {email: line.email})
      
    • more complicated:

      load csv with headers from {fileUrl} as line 
           match (c:Calendar {calendarId: line.calendarId})
           merge (a:Activity:_Activity {eventId: line.eventId})
           on create set a.eventSummary = line.eventSummary,
                a.eventDescription = line.eventDescription,
                a.eventStartDateTime = toInt(line.eventStartDateTime),
                a.eventEndDateTime = toInt(line.eventEndDateTime),
                a.eventCreated = toInt(line.eventCreated),
                a.recurringId = line.recurringId
           merge (a)-[r:EXPORTED_FROM]->(c)
           return count(r)
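
As a sketch of step 1, this is roughly what the CSV-writing class could look like in Java. Word and getTxt() come from the question; the class name, the output path handling and the escape() helper are hypothetical, and the escaping is deliberately minimal:

import java.io.IOException;
import java.io.PrintWriter;
import java.util.Set;

public class WordCsvWriter {

    // Write one CSV file with a header row and one line per word.
    public static void write(Set<Word> words, String path) throws IOException {
        PrintWriter out = new PrintWriter(path, "UTF-8");
        try {
            out.println("txt");
            for (Word w : words) {
                out.println(escape(w.getTxt()));
            }
        } finally {
            out.close();
        }
    }

    // Hypothetical minimal escaping: quote the value and double any embedded quotes.
    private static String escape(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }
}

The servlet from step 2 then serves this file, and its URL is bound to the {fileUrl} parameter when the statement above is run.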
      



Answer 2:


Try the following:

  1. Use the native Neo4j API rather than spring-data-neo4j when performing batch operations.
  2. Commit in batches, e.g. every 500 words; a sketch follows the note below.

NOTE: There are certain properties (type metadata) added by SDN which will be missing when you use the native approach.
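
Below is a minimal sketch of both suggestions, assuming embedded access to the store through the native API and a dictionaryNode that has already been looked up (both assumptions; the question's setup is REST-based, but the batching idea carries over). It uses the Neo4j 1.9-era transaction API from the question, where transactions are closed with finish():

import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

GraphDatabaseService db = new GraphDatabaseFactory().newEmbeddedDatabase("/path/to/graph.db");
int batchSize = 500;
int i = 0;
Transaction tx = db.beginTx();
try {
    for (Word gw : words) {
        Node w = db.createNode();
        w.setProperty("txt", gw.getTxt());
        // SDN normally adds its type metadata here; with the native API you must
        // set such properties yourself if SDN needs to read the nodes back.
        dictionaryNode.createRelationshipTo(w, DynamicRelationshipType.withName("CONTAINS"));
        if (++i % batchSize == 0) { // commit every 500 words, then start a new transaction
            tx.success();
            tx.finish();
            tx = db.beginTx();
        }
    }
    tx.success();
} finally {
    tx.finish(); // commit (or roll back) the remainder
}

Committing every few hundred writes keeps each transaction small while avoiding one commit per node, which is what makes the one-call-per-word REST approach so slow.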



Source: https://stackoverflow.com/questions/19589687/neo4j-inserting-7k-nodes-is-slow-spring-data-neo4j-springrestgraphdatabase
