I want to insert a few billion nodes and relationships into Neo4j. Using "LOAD CSV" gets cancelled after 30 min by the browser (Chrome) because the working memory is overloa
Hope this helps.
Regards, Tom
Before you use the tool on gigantic datasets, here are a few things I just learned while importing millions of nodes in a few minutes (Neo4j Community Edition for Windows).
Regarding Neo4j import tips:
Don't use the web interface to import such big datasets; memory overload is inevitable.
Instead, use a programming language to interact with Neo4j (I recently used the official Python module, which is simple to learn, but you can do the same with good old Java).
Before the LOAD CSV clause, remember to write the USING PERIODIC COMMIT instruction so that big sets of data are committed in batches rather than in one huge transaction.
Before importing relationships from CSV, remember to use CREATE CONSTRAINT ON <...> ASSERT <...> IS UNIQUE for the key properties of your labels (the constraint also creates an index, so node lookups are fast). It has a huge impact on relationship creation.
Use MATCH(...), not CREATE(...), to look up the nodes in the relationship procedure. It avoids duplicate nodes.
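The import tips above can be sketched in Python roughly as follows. This is only an illustration under several assumptions: the `Person` label, the `id` and `name` properties, the `KNOWS` relationship type, the CSV file names and column names, and the batch size of 10000 are all made up for the example, and the Cypher syntax is the Neo4j 3.x form used in this answer (newer versions changed both the constraint syntax and periodic commit). The driver import path also depends on the driver version you have installed.

```python
# Sketch only: labels, properties, file names and batch size are hypothetical.
BATCH_SIZE = 10000


def build_import_statements(batch_size=BATCH_SIZE):
    """Return the Cypher statements in the order they should be run."""
    return [
        # 1. Unique constraint on the key property, created before any
        #    relationship import so that node lookups can use its index.
        "CREATE CONSTRAINT ON (p:Person) ASSERT p.id IS UNIQUE",
        # 2. Node import, committed every `batch_size` rows.
        "USING PERIODIC COMMIT {n} "
        "LOAD CSV WITH HEADERS FROM 'file:///persons.csv' AS row "
        "CREATE (:Person {{id: row.id, name: row.name}})".format(n=batch_size),
        # 3. Relationship import: MATCH the existing endpoint nodes instead
        #    of creating them again, then create only the relationship.
        "USING PERIODIC COMMIT {n} "
        "LOAD CSV WITH HEADERS FROM 'file:///knows.csv' AS row "
        "MATCH (a:Person {{id: row.from_id}}) "
        "MATCH (b:Person {{id: row.to_id}}) "
        "CREATE (a)-[:KNOWS]->(b)".format(n=batch_size),
    ]


def run_import(uri, user, password):
    """Run the statements against a Neo4j server (needs the neo4j driver)."""
    # Imported here so the statement builder works without the driver.
    # Older 1.x drivers may need `from neo4j.v1 import GraphDatabase`.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            for statement in build_import_statements():
                # session.run() executes in an auto-commit transaction,
                # which USING PERIODIC COMMIT requires.
                session.run(statement).consume()
    finally:
        driver.close()


if __name__ == "__main__":
    for statement in build_import_statements():
        print(statement)
```

Calling something like `run_import("bolt://localhost:7687", "neo4j", "<password>")` would then execute the three statements in order; the URI and credentials here are placeholders.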
Regarding Neo4j performance:
First of all: read the official Neo4j page for tuning performance: https://neo4j.com/docs/operations-manual/current/performance/
Set a proper memory configuration for your Windows machine: if necessary, manually configure the dbms.memory.pagecache.size parameter (in the neo4j.conf file).
Remember: the Java Virtual Machine is not a black box; you can improve its performance specifically for your application (by editing the neo4j-community.vmoptions file).
For example, you can set the maximum heap size for the JVM (the -Xmx parameter), and you can also set the -XX:+UseG1GC parameter to use the G1 Garbage Collector (high performance, suggested by Oracle for production environments) (https://docs.oracle.com/cd/E40972_01/doc.70/e40973/cnf_jvmgc.htm#autoId0)
Here are the custom lines from my neo4j.conf (just for reference; beware, this may be the wrong setup for your application):
dbms.memory.pagecache.size=3g
dbms.jvm.additional=-XX:+UseG1GC
dbms.jvm.additional=-XX:-OmitStackTraceInFastThrow
dbms.jvm.additional=-XX:+AlwaysPreTouch
dbms.jvm.additional=-XX:+UnlockExperimentalVMOptions
dbms.jvm.additional=-XX:+TrustFinalNonStaticFields
dbms.jvm.additional=-XX:+DisableExplicitGC
And my neo4j-community.vmoptions custom lines (again, just for reference):
-Xmx1024m
-XX:+UseG1GC
-XX:-OmitStackTraceInFastThrow
-XX:+AlwaysPreTouch
-XX:+UnlockExperimentalVMOptions
-XX:+TrustFinalNonStaticFields
-XX:+DisableExplicitGC
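As a quick sanity check of the two settings above, the arithmetic below adds up the page cache and the JVM heap and compares them with the 8 GB of RAM of the test machine described in this answer. The helper name `parse_size_mb` is made up for this sketch, and it only understands the `m` and `g` suffixes used here; it also ignores other JVM memory (thread stacks, metaspace and so on), so treat the result as a rough estimate.

```python
def parse_size_mb(value):
    """Convert a size such as '3g' or '1024m' to megabytes (sketch only)."""
    units = {"m": 1, "g": 1024}
    number, unit = value[:-1], value[-1].lower()
    return int(number) * units[unit]


page_cache_mb = parse_size_mb("3g")     # dbms.memory.pagecache.size=3g
heap_mb = parse_size_mb("1024m")        # -Xmx1024m
total_ram_mb = parse_size_mb("8g")      # RAM of the test notebook

# Whatever is not claimed by Neo4j stays available to Windows and other programs.
left_for_os_mb = total_ram_mb - page_cache_mb - heap_mb

print("Neo4j page cache + heap: {} MB".format(page_cache_mb + heap_mb))  # 4096 MB
print("Left for the OS and other processes: {} MB".format(left_for_os_mb))  # 4096 MB
```

If the second number gets close to zero or negative, the machine will start swapping and the import will slow down considerably.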
My test machine is a weak notebook equipped with a Core i3 (dual core), 8 GB of RAM, Windows 10 and Neo4j 3.2.1 Community Edition.
I'm able to import 7 million nodes in less than 3 minutes and 3.5 million relationships in less than 5 minutes (no recursive relationships).
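To put those figures in perspective for a dataset with billions of entries, here is a back-of-the-envelope calculation. It assumes, optimistically, that throughput stays constant as the database grows, which is not something I have measured; the function names are just for this sketch.

```python
def rate_per_second(items, minutes):
    """Average throughput in items per second."""
    return items / (minutes * 60.0)


def hours_for(items, rate):
    """Hours needed to import `items` at a constant `rate` (items/second)."""
    return items / rate / 3600.0


# Figures from the test above; "less than" means these rates are lower bounds.
node_rate = rate_per_second(7000000, 3)   # about 38,889 nodes/s
rel_rate = rate_per_second(3500000, 5)    # about 11,667 relationships/s

billion = 1000000000
print("Nodes/s: {:.0f}".format(node_rate))
print("Relationships/s: {:.0f}".format(rel_rate))
print("Hours per billion nodes: {:.1f}".format(hours_for(billion, node_rate)))          # 7.1
print("Hours per billion relationships: {:.1f}".format(hours_for(billion, rel_rate)))  # 23.8
```

So even on this weak notebook, a billion nodes would be a matter of hours rather than days, provided the rate holds.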
On a more capable machine, with a specifically crafted setup, Neo4j can do WAY better than this. Hope it helps.