Question
I am finding Neo4j slow to add nodes and relationships/arcs/edges when using the REST API via py2neo for Python. I understand that this is due to each REST API call executing as a single self-contained transaction.
Specifically, adding a few hundred pairs of nodes with relationships between them takes a number of seconds, running on localhost.
What is the best approach to significantly improve performance whilst staying with Python?
Would using bulbflow and Gremlin be a way of constructing a bulk insert transaction?
Thanks!
Answer 1:
There are several ways to do a bulk create with py2neo, each making only a single call to the server.
- Use the create method to build a number of nodes and relationships in a single batch (see the sketch after this list).
- Use a Cypher CREATE statement.
- Use the new WriteBatch class (just released this week) to manually make a batch of nodes and relationships (this is really just a manual version of the first option).
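For example, a minimal sketch of the first two options, assuming a py2neo version of that era (roughly 1.x), where create() accepts property dicts for nodes and (start, type, end) index tuples for relationships; the names and properties are illustrative:

from py2neo import neo4j

graph_db = neo4j.GraphDatabaseService("http://localhost:7474/db/data/")

# Option 1: one batched create() call -- each dict becomes a node, and
# the (0, "KNOWS", 1) tuple becomes a relationship between the nodes at
# batch indexes 0 and 1. Everything goes to the server in one request.
alice, bob, alice_knows_bob = graph_db.create(
    {"name": "Alice"},
    {"name": "Bob"},
    (0, "KNOWS", 1),
)

# Option 2: a single Cypher CREATE statement covering several entities.
# Run it through your py2neo version's Cypher facilities; the exact
# execution call varied across early releases.
query = "CREATE (a {name: 'Carol'})-[:KNOWS]->(b {name: 'Dave'})"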
If you have some code, I'm happy to look at it and make suggestions on performance tweaks. There are also quite a few tests you may be able to get inspiration from.
Cheers, Nige
Answer 2:
Neo4j's write performance is slow unless you are doing a batch insert.
The Neo4j batch importer (https://github.com/jexp/batch-import) is the fastest way to load data into Neo4j. It's a Java utility, but you don't need to know any Java because you're just running the executable. It handles typed data and indexes, and it imports from a CSV file.
To use it with Bulbs (http://bulbflow.com/) Models, use the model's get_bundle() method to get the data, index name, and index keys, which are prepared for insert, and then output the data to a CSV file. Or, if you don't want to model your data, just output your data from Python to a CSV file (see the sketch below).
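As a rough sketch of that last route, here is one way to dump plain node data from Python with the standard csv module. The batch importer's README describes the exact file layout it expects (tab-separated columns with typed headers, as far as I recall), so treat the column names and types below as illustrative assumptions:

import csv

# Illustrative node data; in practice this would come from your own
# records or from Bulbs model bundles.
students = [
    ("Alice", 2000),
    ("Bob", 2012),
]

# The batch importer reads tab-separated files; the header row carries
# the property names and (optionally) their types.
with open("nodes.csv", "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerow(["name:string", "year:int"])
    writer.writerows(students)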
Will that work for you?
Answer 3:
There are so many old answers to this question online that it took me forever to realize there's an import tool that comes with Neo4j. It's very fast and the best tool I was able to find.
Here's a simple example if we want to import student nodes:
bin/neo4j-import --into [path-to-your-neo4j-directory]/data/graph.db --nodes students
The students file contains data that looks like this, for example:
studentID:ID(Student),name,year:int,:LABEL
1111,Amy,2000,Student
2222,Jane,2012,Student
3333,John,2013,Student
Explanation:
- The header explains how the data below it should be interpreted.
- studentID is used as the unique node identifier, scoped to the ID space named Student (the ID(Student) part); since the column has a name, it is also stored as a property.
- name is of type string, which is the default.
- year is an integer.
- :LABEL is the label you want for these nodes; in this case it is "Student".
Here's the documentation for it: http://neo4j.com/docs/stable/import-tool-usage.html
Note: I realize the question specifically mentions Python, but another useful answer here is also a non-Python solution.
Answer 4:
Well, I myself needed massive performance from Neo4j, and I ended up doing the following things to improve graph performance.
- Ditched py2neo, since there were a lot of issues with it. Besides, it is very convenient to use the REST endpoint provided by Neo4j directly; just make sure to use request sessions (see the sketch after this list).
- Use raw Cypher queries for bulk insert, instead of any OGM (Object-Graph Mapper). That is crucial if you need a high-performance system.
- Performance was still not enough for my needs, so I ended up writing a custom system that merges 6-10 queries together using WITH * and UNION clauses. That improved performance by a factor of 3 to 5.
- Use a larger transaction size, with at least 1,000 queries per transaction.
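As a rough sketch of the session-plus-batching idea (not the answerer's exact code): Neo4j's transactional Cypher HTTP endpoint accepts many parameterised statements in one POST and commits them as a single transaction. The URL and the {name} parameter syntax below assume a 2.x/3.x-era server on localhost, with authentication omitted:

import requests

session = requests.Session()  # reuse one TCP connection across calls
url = "http://localhost:7474/db/data/transaction/commit"

# Pack many parameterised statements into a single request, so the
# whole batch is executed and committed in one round trip.
names = ["Amy", "Jane", "John"]
payload = {
    "statements": [
        {
            "statement": "CREATE (s:Student {name: {name}})",
            "parameters": {"name": name},
        }
        for name in names
    ]
}

response = session.post(url, json=payload)
response.raise_for_status()
print(response.json()["errors"])  # an empty list means everything committed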
Answer 5:
To insert a bulk of nodes into Neo4j at very high speed, use the Batch Inserter:
http://neo4j.com/docs/stable/batchinsert-examples.html
In my case I'm working in Java.
Source: https://stackoverflow.com/questions/12643662/fastest-way-to-perform-bulk-add-insert-in-neo4j-with-python