Question
I'm building an application where my users can manage dictionaries. One feature is uploading a file to initialize or update the dictionary's content.
The part of the structure I'm focusing on for a start is Dictionary -[:CONTAINS]-> Word.
Starting from an empty database (Neo4j 1.9.4, but I also tried 2.0.0M5), accessed via Spring Data Neo4j 2.3.1 in a distributed environment (therefore using SpringRestGraphDatabase, but testing against localhost), I'm trying to load 7k words into 1 dictionary. However, I can't get it done in less than 8 to 9 minutes on a Linux machine with a Core i7, 8 GB RAM and an SSD drive (ulimit raised to 40000).
I've read lots of posts about load/insert performance over REST and I've tried to apply the advice I found, but without better luck. The BatchInserter tool doesn't seem to be a good option for me due to my application constraints.
Can I hope to load 10k nodes in a matter of seconds rather than minutes?
Here is the code I came up with after all my reading:
// Create the dictionary node from a property map.
Map<String, Object> dicProps = new HashMap<String, Object>();
dicProps.put("locale", locale);
dicProps.put("category", category);
Dictionary dictionary = template.createNodeAs(Dictionary.class, dicProps);

// Create one Word node per entry and link it to the dictionary.
Map<String, Object> wordProps = new HashMap<String, Object>();
Set<Word> words = readFile(filename);
for (Word gw : words) {
    wordProps.put("txt", gw.getTxt());
    Word w = template.createNodeAs(Word.class, wordProps);
    template.createRelationshipBetween(dictionary, w, Contains.class, "CONTAINS", true);
}
Answer 1:
I solved this kind of problem by creating a CSV file and then having Neo4j read it. The following steps are needed:
- Write a class that takes the input data and generates a CSV file from it (one file per node kind; you can also create a separate file that will be used to build the relationships). See the sketch after the Cypher samples below.
- In my case I also created a servlet that allows Neo4j to fetch that file over HTTP.
- Write the Cypher statements that read and parse that CSV file. Here are some samples of the ones I use (if you use Spring Data, also remember the labels):
a simple one:
load csv with headers from {fileUrl} as line merge (:UserProfile:_UserProfile {email: line.email})
a more complicated one:
load csv with headers from {fileUrl} as line
match (c:Calendar {calendarId: line.calendarId})
merge (a:Activity:_Activity {eventId: line.eventId})
on create set a.eventSummary = line.eventSummary,
    a.eventDescription = line.eventDescription,
    a.eventStartDateTime = toInt(line.eventStartDateTime),
    a.eventEndDateTime = toInt(line.eventEndDateTime),
    a.eventCreated = toInt(line.eventCreated),
    a.recurringId = line.recurringId
merge (a)-[r:EXPORTED_FROM]->(c)
return count(r)
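For the first step, a minimal exporter sketch could look like the following. It is hypothetical: it reuses the Word class and the "txt" property from the question, while the class name, method name, file path and the naive quoting scheme are assumptions.

import java.io.IOException;
import java.io.PrintWriter;
import java.util.Set;

public class WordCsvExporter {
    // Writes one CSV row per word; the header name "txt" is what a
    // Cypher statement would reference as line.txt.
    public static void export(Set<Word> words, String path) throws IOException {
        PrintWriter out = new PrintWriter(path, "UTF-8");
        try {
            out.println("txt"); // header row
            for (Word w : words) {
                // naive escaping: wrap in quotes and double embedded quotes
                out.println("\"" + w.getTxt().replace("\"", "\"\"") + "\"");
            }
        } finally {
            out.close();
        }
    }
}

The {fileUrl} placeholder in the statements above is a Cypher parameter, so the URL of the exported file (served by the servlet from the second step) is passed in the parameter map alongside the statement at execution time. Note that load csv requires Neo4j 2.1 or later, so this approach implies upgrading from the versions mentioned in the question.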
Answer 2:
Try the following:
- Use the native Neo4j API rather than spring-data-neo4j when performing batch operations.
- Commit in batches, e.g. one transaction per 500 words, as sketched below.
NOTE: there are certain properties (the type metadata) added by SDN that will be missing when using the native approach.
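A minimal sketch of this suggestion, assuming an embedded GraphDatabaseService (Neo4j 1.9-era transaction API) and the Word type from the question; the class name, loadWords and BATCH_SIZE are illustrative:

import java.util.Set;

import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.RelationshipType;
import org.neo4j.graphdb.Transaction;

public class DictionaryLoader {
    private static final int BATCH_SIZE = 500;
    private static final RelationshipType CONTAINS =
            DynamicRelationshipType.withName("CONTAINS");

    // Creates one Word node per entry and commits every BATCH_SIZE writes,
    // rather than paying one round-trip per node and per relationship.
    void loadWords(GraphDatabaseService db, Node dictionary, Set<Word> words) {
        Transaction tx = db.beginTx();
        try {
            int counter = 0;
            for (Word gw : words) {
                Node w = db.createNode();
                w.setProperty("txt", gw.getTxt());
                dictionary.createRelationshipTo(w, CONTAINS);
                if (++counter % BATCH_SIZE == 0) {
                    tx.success(); // mark the batch for commit
                    tx.finish();  // commit and close (1.9-era API)
                    tx = db.beginTx();
                }
            }
            tx.success(); // commit the final partial batch
        } finally {
            tx.finish();
        }
    }
}

This sketch assumes an embedded database; with SpringRestGraphDatabase, each call in the question's loop is a separate HTTP round-trip, which is a large part of why the original code is slow.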
Source: https://stackoverflow.com/questions/19589687/neo4j-inserting-7k-nodes-is-slow-spring-data-neo4j-springrestgraphdatabase